Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Automate the generation of the examples #123

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Robsteranium
Copy link
Contributor

It'd be nice to use the new csvw and exec tasks from #120 to regenerate the examples. This will help keep them up-to-date and also serves as a test in it's own right (firstly whether they can be generated successfully and secondly because the examples are used by the test suite).

This PR provides an example script. This involves a few changes:

  • pipeline outputs are renamed as they otherwise clobber each other's "metadata.json" files
  • provision for having the csvw:url specified relatively so the checked-in examples don't contain references to absolute paths on the machine that generated them (see util/csvw-url)
  • introduce a basic script that calls the uberjar to generate the csvw for the regional-trade example

Outstanding issues for discussion

Does util/csvw-url work in a reasonable way?

csv2rdf expects the csvw:url to specified in one of three ways:

  • file:/absolute/path/input.csv
  • input.csv (relative to the location of the json metadata file)
  • file:./relative/path/input.csv (relative to JVM working dir - so you would need to call table2qb and csv2rdf from the same place)

If the output-directory parameter is specified absolutely then the first approach is taken, if relatively then the second. Arguably, the third (file:./relative/path/input.csv) would be more consistent (also having a "file" schema), however java.net.URI doesn't seem to like creating relative URIs for files (even though csv2rdf happily reads this).

Moreover explicit configuration might be preferred as a convention like this might be a bit magic/ mysterious.

In any case we ought to document this behaviour somewhere.

How should we call table2qb to generate the examples?

I think it's appealing to use the CLI to do this as means we're exercising that interface. Shell scripts mightn't be ideal, particularly once we extend this to all of the examples. Further we rely on developers remembering to run the scripts and commit the changes when the inputs to the examples change.

An alternative would be specify the calls as part of the test suite. It would be more convenient to specify lots of examples/ parameters with data structures in clojure. We can also hook into the test cycle so that the examples are regenerated every time the suite runs. We'd loose the checks we otherwise having on the CLI interface.

Further requirements for implementation

  • extend the generation tasks to all examples
  • generate turtle as well as csvw
  • ensure the re-generation runs before the test suite (on CI at least)

Adds a script to regenerate the csvw for this example
Changes some output filenames to avoid clobbering between pipelines
Introduces util/csvw-url function which allows csvw:urls to be specified relatively (i.e. csv file relative to location of json) so the examples don't have absolute paths
Minor consistency edits to tests
@@ -1,9 +1,9 @@
component_slug,component_attachment,component_property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is basically just an ordering change

Label,Notation,Parent Notation
Export,export,
Import,import,
label,notation,parent_notation,sort_priority,description,top_concept_of,has_top_concept,pref_label
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we see the benefit of doing this. We didn't keep this file up-to-date as these extra columns were introduced.

(let [metadata-file (example-csvw example-directory (str example-directory "-metadata.json"))
schema (util/read-json metadata-file)
part-number (case part-name
"dataset" 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordinal matching isn't particularly robust, but it's at least simpler than matching by e.g. the aboutURL template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant