CF Roadmap: Kicking off work on Provenance and Lineage theme #389
-
Does "provenance and lineage" mean discovery metadata?
-
No, this is focused on being able to inspect data and find out how it was made.
-
Would this possibly be a revival of work on ACDD? I've also come across W3C's PROV a few times.
-
Is it related to (The present discussion so far reads a bit like Twenty questions. 😃)
-
My understanding is that many of the mentioned issues and external resources have some overlap. I think that Daniel @erget captures it well:
At this general level the W3C's PROV that Andrew @DocOtak points at is the comprehensive resource. The abstract of the web page states:
For users requiring this kind of all-embracing machinery, maybe the best thing CF can do is to point at PROV in the conventions document, and by including a suitable string in the
Personally I think that the 2024 CF Workshop presentations by David Huard here, in the Uncertainty session, and by José Manuel Gutiérrez here, in the Statistical Processing session, offer a lot of food for thought. Many users do not need e.g. manual quality judgement, expert group voting, and much more, but still do need something more than current
I think that by enhancing the cell method machinery we can cover several existing use cases, but I also think that we need to come up with something new that complements cell methods to cover even more complex use cases and data manipulation/processing. I have for some time been playing with the idea of creating some kind of "pseudo-language", that is, something similar to what is now used for parametric vertical coordinates but with the freedom to also describe the equations. Well, these are still just some wild ideas without much substance behind them...
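For readers less familiar with the cell method machinery referred to above, here is a minimal Python sketch of how a CF `cell_methods` string already records an ordered list of processing steps. The function name `parse_cell_methods` and the example string are illustrative assumptions, and the parser deliberately ignores parenthesised comments and `where`/`over` clauses, so it is not a complete implementation of the CF syntax.

```python
import re

# One entry is "name: [name: ...] method", e.g. "time: maximum" or "lat: lon: mean".
# Assumption: no parenthesised comments and no "where"/"over"/"within" clauses.
_ENTRY = re.compile(r"((?:\w+:\s+)+)(\w+)")


def parse_cell_methods(text):
    """Return [(dimension_names, method), ...] in left-to-right order."""
    steps = []
    for names, method in _ENTRY.findall(text):
        # "lat: lon: " -> ("lat", "lon")
        dims = tuple(name.rstrip(":") for name in names.split())
        steps.append((dims, method))
    return steps


# Hypothetical attribute value: a maximum over time, followed by an area mean.
steps = parse_cell_methods("time: maximum area: mean")
for dims, method in steps:
    print(", ".join(dims), "->", method)
```

Seen this way, `cell_methods` is already a small lineage record for statistical operations, which is perhaps why extending it looks attractive for the simpler use cases.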
-
Hi folks, particularly @sethmcg and @pagecp - the survey has spoken. Unfortunately there's no appointment in the near term that fits for all of us, but @sethmcg we'll fill you in and hopefully you can join later. Anybody who wants to be on the invite, let me know and I'll add you. @pagecp you're already on it. It'll be 2024-11-27T13:00Z on Teams.
-
Hi folks, we had an appointment last year to roadmap what we want to do with provenance and lineage... In the end it was a veritable echo chamber - I was the only one there! 😱 Do we want to try it again? Mark your availability here by EOB on 12 Feb (next Wednesday) and we'll set a date.
-
We met yesterday (sorry @sethmcg for leaving you out in the dark, that was definitely not intentional and it's my fault we missed out on you 😥) and had a first meeting to discuss our intents. We are meeting again on 12 March at 16:00 CET - if you aren't on the invite and would like to be, let me know and I'll add you.
What use cases did we discuss?
We identified the following reasons why somebody using CF data may want to have provenance baked in:
Not all of these use cases need to be fulfilled for things to be useful. The W3C PROV standard provides tools that service all of them.
What do we propose incorporating into the CF Roadmap?
In this order:
Any design principles we want to propose?
We want to re-use PROV rather than adapt it or invent something, because using an existing standard has a lot of advantages. The issue that we see here is how to represent the provenance data in a way that works with CF so we don't have a multitude of implementations.
What are we doing between now and the next meeting?
I made some technical documentation available to the people in the meeting to mull over. This is related to a prototype implementation that we had at EUMETSAT and is meant for inspiration; it's not something that we could just take off the shelf and use. So we're thinking about
In particular, Lars kindly volunteered to attempt to provide an overview of uses of PROV in IPCC and other settings; this may or may not be ready at that point. Looking forward to seeing you all at the next meeting :)
-
We discussed the use case
I'd like to add that this use case could well cover my use case of recording the uncertainty of data used at different parts of the processing chain, assuming that each contributing dataset includes its own uncertainty description (which is not in any way a part of this discussion!).
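To make the idea of tracing contributing datasets through a processing chain a little more concrete, here is a minimal Python sketch. It assumes a lineage record shaped loosely like PROV-JSON (`entity`, `activity`, `used`, `wasGeneratedBy`). All `ex:` identifiers, the function name `contributing_entities` and the choice of JSON as the serialisation are illustrative assumptions; the thread has not settled on any representation.

```python
import json

# Hypothetical lineage: raw -> (calibrate) -> obs; obs + model -> (merge) -> merged.
# Structure loosely follows PROV-JSON; every "ex:" identifier is invented.
prov_doc = {
    "prefix": {"ex": "https://example.org/"},
    "entity": {"ex:raw": {}, "ex:obs": {}, "ex:model": {}, "ex:merged": {}},
    "activity": {"ex:calibrate": {}, "ex:merge": {}},
    "used": {
        "_:u1": {"prov:activity": "ex:calibrate", "prov:entity": "ex:raw"},
        "_:u2": {"prov:activity": "ex:merge", "prov:entity": "ex:obs"},
        "_:u3": {"prov:activity": "ex:merge", "prov:entity": "ex:model"},
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:obs", "prov:activity": "ex:calibrate"},
        "_:g2": {"prov:entity": "ex:merged", "prov:activity": "ex:merge"},
    },
}


def contributing_entities(doc, entity):
    """Return every entity that directly or indirectly fed into `entity`."""
    found = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        # Activities that generated the entity we are currently looking at.
        activities = {
            gen["prov:activity"]
            for gen in doc.get("wasGeneratedBy", {}).values()
            if gen["prov:entity"] == current
        }
        # Everything those activities used is an upstream contributor.
        for use in doc.get("used", {}).values():
            source = use["prov:entity"]
            if use["prov:activity"] in activities and source not in found:
                found.add(source)
                stack.append(source)
    return found


# The record serialises to a plain string, which is one conceivable way to
# carry it alongside the data (how and where is an open question here).
prov_string = json.dumps(prov_doc, sort_keys=True)
print(sorted(contributing_entities(prov_doc, "ex:merged")))
```

Once the contributing datasets are known, each one could be inspected for whatever uncertainty description it carries, which stays outside the provenance record itself.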
-
Topic for discussion
Do you have ideas around data provenance and lineage, and would you like to be involved in ensuring that these themes are represented in a way that's beneficial to the CF Roadmap? This is the discussion for you.
At this year's CF Workshop we divided the roadmap preparations into multiple themes. This is one such theme. As you can see from the table, right now we're working on a green field with a blue sky - nothing really more than two keywords to start with. Behind that are a whole bunch of ideas and opportunities to shape where we go with it.
I'm setting up a call for later this month. If you want to be involved, please indicate your availability in this survey by the end of 14 Nov 2024.
Topics:
From there we can agree how to organise the work moving forward.
If you're not available on those dates but still want to be involved, that's no problem - let me know in the comments or drop me a mail and I'll keep you looped in :)
@pagecp