dwc:preparations #26

Open
hollyel opened this issue Oct 24, 2018 · 3 comments
Labels
Occurrence used to denote issues related to terms in the DwC Occurrence class

Comments

@hollyel
Collaborator

hollyel commented Oct 24, 2018

http://rs.tdwg.org/dwc/terms/#preparations

I will add a list of distinct values for this term, pulled from GBIF for records where basisOfRecord = fossilSpecimen. Across all fossil data in GBIF, this term currently has about 86,000 unique values, and it is populated in only about 30% of fossilSpecimen records. A controlled vocabulary has been suggested for this term, but its definition does not provide enough detail to make that possible. The main problem for paleo is that we use this term for a broad range of information. In short, the information falls into:

  • Material Type
  • Prep work done
      ◦ By % complete or action
  • Anatomy/Morphology
  • Internal notations of work done (i.e., notations that cannot be used or understood outside the institution, like “PREP 1”)
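The two figures quoted above (number of unique values, share of fossil records that populate the term) can be computed from any set of occurrence records. Below is a minimal sketch, assuming the records are already loaded as dicts keyed by Darwin Core term names (`basisOfRecord`, `preparations`); the function name and sample records are hypothetical, not part of any GBIF tool. It normalizes basisOfRecord so that both the verbatim `FossilSpecimen` and an underscored uppercase form like `FOSSIL_SPECIMEN` match.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def summarize_preparations(records: Iterable[Mapping[str, str]]) -> Tuple[Counter, int, int]:
    """Count distinct preparations values among fossil-specimen records.

    Returns (value_counts, total_fossil_records, populated_records).
    """
    value_counts: Counter = Counter()
    total = 0
    populated = 0
    for rec in records:
        # Normalize "FossilSpecimen" / "FOSSIL_SPECIMEN" to "fossilspecimen".
        basis = rec.get("basisOfRecord", "").replace("_", "").lower()
        if basis != "fossilspecimen":
            continue
        total += 1
        value = (rec.get("preparations") or "").strip()
        if value:  # empty strings count as "not populated"
            populated += 1
            value_counts[value] += 1
    return value_counts, total, populated


# Hypothetical sample records; real input would come from a GBIF download.
sample = [
    {"basisOfRecord": "FossilSpecimen", "preparations": "cast"},
    {"basisOfRecord": "FossilSpecimen", "preparations": "PREP 1"},
    {"basisOfRecord": "FossilSpecimen", "preparations": "cast"},
    {"basisOfRecord": "FossilSpecimen", "preparations": ""},
    {"basisOfRecord": "PreservedSpecimen", "preparations": "skin"},
]
counts, total, populated = summarize_preparations(sample)
share = populated / total if total else 0.0
print(len(counts), "distinct values")
print(f"{share:.0%} of fossil records populated")
```

Raw strings are counted as-is here; collapsing variants such as case or spacing differences would be a separate normalization step.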

To better serve paleo data, we will need to establish a more detailed definition for this term and most likely propose new terms to cover other details (e.g., NMNH would like new terms for morphology/anatomy).

hollyel added the Occurrence label (used to denote issues related to terms in the DwC Occurrence class) Oct 24, 2018
@mjcollin

I can help with the basics of distinct values. Our API can generate them live, but only the top 5000:

http://search.idigbio.org/v2/summary/top/records?rq={%22basisofrecord%22:%22fossilspecimen%22}&top_fields=[%22data.dwc:preparations%22]&count=1000
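The query URL above packs a JSON record query (`rq`), a JSON list of fields (`top_fields`) and a `count` into the query string. A minimal sketch of building the same request programmatically, so the field or basisOfRecord filter can be swapped out; the helper names are hypothetical, the 5000 cap comes from the comment above, and the shape of the JSON response is not assumed here (it is returned unparsed beyond `json.load`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://search.idigbio.org/v2/summary/top/records"


def top_values_url(field: str, basis: str = "fossilspecimen", count: int = 1000) -> str:
    """Build a top-values summary URL for one field, filtered by basisofrecord."""
    if not 1 <= count <= 5000:  # the API only returns the top 5000 values
        raise ValueError("count must be between 1 and 5000")
    params = {
        "rq": json.dumps({"basisofrecord": basis}, separators=(",", ":")),
        "top_fields": json.dumps([field]),
        "count": count,
    }
    # urlencode percent-encodes the JSON (quotes, braces, brackets, colons).
    return BASE + "?" + urlencode(params)


def fetch_top_values(field: str, **kwargs) -> dict:
    """Fetch the summary and parse it as JSON (requires network access)."""
    with urlopen(top_values_url(field, **kwargs)) as resp:
        return json.load(resp)


url = top_values_url("data.dwc:preparations")
print(url)
```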

There's a github repo with some existing files:

https://github.com/tdwg/dwc-qa/tree/master/data

We had a workflow in place to generate them automatically, but it hasn't been set up to run permanently.

@hollyel
Collaborator Author

hollyel commented Oct 24, 2018

Thanks, @mjcollin! Yeah, I was inspired by the ones in the dwc-qa repo, but wanted to get an idea of what these lists look like for paleo occurrences specifically. The lists I have right now also break out the values by data publisher, so they need to be summarized again. (Having the data publisher helped with identifying patterns in the types of values.)
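Re-summarizing per-publisher lists amounts to summing counts per value while remembering which publishers use each value, so the publisher patterns are not lost. A minimal sketch, assuming each per-publisher list has been flattened into hypothetical `(publisher, value, count)` rows; the publisher names and counts below are placeholders, not real data.

```python
from collections import Counter, defaultdict

# Hypothetical per-publisher rows: (publisher, preparations value, record count).
rows = [
    ("Publisher A", "cast", 120),
    ("Publisher A", "PREP 1", 40),
    ("Publisher B", "cast", 15),
    ("Publisher B", "thin section", 8),
]


def combine_by_value(rows):
    """Sum counts across publishers and record which publishers use each value."""
    totals = Counter()
    publishers = defaultdict(set)
    for publisher, value, n in rows:
        totals[value] += n
        publishers[value].add(publisher)
    return totals, publishers


totals, publishers = combine_by_value(rows)
for value, n in totals.most_common():
    print(f"{value}\t{n}\t{len(publishers[value])} publisher(s)")
```

Keeping the publisher set alongside the total makes it easy to spot values, like internal codes, that only one publisher uses.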

@debpaul
Collaborator

debpaul commented Oct 24, 2018

Hello @hollyel, keep in mind that my hope is that we someday have a resource that would let you make these sorts of requests via an API, adding the limits you need. While these examples are distinct values (with counts) across the entire aggregated dataset, you want (and so do I) to limit them, for example to records where dwc:basisOfRecord=FossilSpecimen. And as you've suggested, you can break them down in other ways too, including by publisher, as you have done. Visualizing these will, I believe, go a long way toward engaging collectors, curators, and collection and data managers to work together, much as you are trying to do.
