How to make "provenance queries?" #30

Open
olyerickson opened this issue Dec 8, 2015 · 1 comment

Comments

@olyerickson

The title says it all: it is not clear how to make the provenance "queries" or script run reports discussed in Section 4 of "Retrospective Provenance Without a Runtime Provenance Recorder" (McPhillips et al.).

@tmcphillips
Member

What one currently does is export Datalog- or Prolog-compatible facts that describe the YW annotations extracted from the source files, the workflow model that YW constructs from these annotations, and any reconstructed retrospective provenance of the products of a script run. You then write queries over these facts in Datalog or Prolog.
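To make the idea concrete, here is a minimal Python sketch of what a Datalog-style lineage query over a set of facts amounts to. The fact name `derived_from` and the data item names are hypothetical, chosen only for illustration. They are not the predicates YW actually exports, and real queries would be written in Datalog or Prolog against the exported facts files.

```python
# Hypothetical facts of the form derived_from(output, input).
# These names are NOT the real YW export schema; they only illustrate
# the shape of a fact base that a provenance query runs against.
DERIVED_FROM = {
    ("corrected_image", "raw_image"),
    ("corrected_image", "calibration_image"),
    ("raw_image", "sample_spreadsheet"),
}


def upstream(item, facts=DERIVED_FROM):
    """Return every item that `item` transitively depends on.

    This mirrors the recursive Datalog rules:
        upstream(X, Y) :- derived_from(X, Y).
        upstream(X, Y) :- derived_from(X, Z), upstream(Z, Y).
    """
    found = set()
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for output, source in facts:
            # Follow each edge once; `found` also guards against cycles.
            if output == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found


if __name__ == "__main__":
    print(sorted(upstream("corrected_image")))
```

The point of the sketch is only that a lineage question reduces to a recursive rule over exported facts, which is exactly what a Datalog or Prolog engine evaluates for you.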

The src/main/resources/examples/simulate_data_collection/yw/xsb directory includes the three facts files for a run of the simulate_data_collection.py script from the paper you mention, a file with general rules for use in queries, and three files containing the provenance queries themselves (e.g. recon_queries.P contains the queries included in the paper). The run_queries.sh script runs all of these queries using XSB, and the run_queries.txt file contains the expected output. You can reproduce these results by installing XSB and running the script. Alternatively, you can do all of this using DLV with the analogous files in the dlv directory.
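If you want to check the reproduction mechanically, a small helper along these lines could do it. This is a sketch under assumptions: that run_queries.sh writes its results to standard output, that it can be invoked with `bash`, and that run_queries.txt is a verbatim capture of that output. The helper name `run_and_compare` is made up for this example.

```python
import subprocess


def run_and_compare(cmd, expected_path, cwd=None):
    """Run `cmd`, capture its stdout, and compare it with an expected file.

    `cmd` is an argument list such as ["bash", "run_queries.sh"].
    `expected_path` is resolved relative to the caller's working
    directory, not relative to `cwd`.
    Returns True when the captured output matches the file exactly.
    """
    result = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, check=True
    )
    with open(expected_path, encoding="utf-8") as f:
        expected = f.read()
    return result.stdout == expected


# Example call (assumes XSB is installed and you are at the repository root):
#
#   xsb_dir = "src/main/resources/examples/simulate_data_collection/yw/xsb"
#   ok = run_and_compare(
#       ["bash", "run_queries.sh"],
#       xsb_dir + "/run_queries.txt",
#       cwd=xsb_dir,
#   )
```

An exact string comparison is the strictest check; if the output turns out to differ only in whitespace or ordering, you may need to normalize both sides first.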

Eventually we'll probably want to build a query engine into YW itself so that these additional tool installations aren't required. We'll also want a streamlined query language of some kind so that writing new queries is a lot easier (and shorter). You can imagine such streamlined queries being replaced by their results during expansion of some kind of report template to produce a run report. Plenty of work to do!
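As a rough illustration of the report-template idea, the sketch below expands placeholders in a template by replacing each one with the result of a named query. Everything here is hypothetical: the `{{ name }}` placeholder syntax, the `expand_report` function, and the query names are invented for this example, and nothing like this exists in YW today.

```python
import re

# Matches placeholders such as {{ num_inputs }} (hypothetical syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_report(template, queries):
    """Replace each {{ name }} in `template` with the result of queries[name]().

    `queries` maps a placeholder name to a zero-argument callable that
    returns the query result. Unknown placeholders are left untouched so
    that missing queries are easy to spot in the rendered report.
    """
    def substitute(match):
        name = match.group(1)
        if name not in queries:
            return match.group(0)
        return str(queries[name]())

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Stand-in query; a real one would consult the provenance facts.
    report = expand_report(
        "This run read {{ num_inputs }} input files.",
        {"num_inputs": lambda: 3},
    )
    print(report)
```

In a real implementation the callables would be backed by a query engine over the exported facts rather than by hard-coded values.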
