Basic test CALeDNA record for DwC and ABCD #62

Open
gdadade opened this issue Mar 22, 2021 · 5 comments

@gdadade
Collaborator

gdadade commented Mar 22, 2021

I did a test mapping using the event core for one environmental sample record with 10 example identifications from GGBN's partner CALeDNA.

Thoughts for discussion:

  1. Besides the missing basisOfRecord in the Event core, the following important collection terms are also missing: collectionCode, catalogNumber.
  2. Which term should I use to provide the collectionObjectGUID? I guess materialSampleID, which is also not included in the Event core?
  3. Since I want to add scientific name information, I'd rather use the Identification extension, but unfortunately it cannot be used with the Event core. I also don't know whether this extension supports preferred=true for ALL identifications related to one event.
  4. I've used the Occurrence extension as suggested by GBIF, but many of its fields are redundant with the Event core. Moreover, if I want to provide collectionCode etc. here, I have to duplicate them for every scientific name, i.e. potentially a million times. To overcome this I wanted to use both the Occurrence and Identification extensions, but see point 3.
  5. Since this is a basic test, I have not included the Resource Relationship extension yet.
  6. How can I add a sequence for each scientific name? Since the identifications are based on BLAST, this is important information.
  7. Note: I have not used the DNA derived data extension yet, as all parameters needed for this test mapping already exist in the GGBN extensions.

dwca-caledna_test-v1.1.zip
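To make point 4 concrete, here is a minimal Python sketch of the star-schema layout, assuming one hypothetical sample with placeholder identifiers and taxa (not the actual CALeDNA values). Each Occurrence-extension row links back to the Event-core record through a `coreid` column (the column name is illustrative; meta.xml only declares its index), while the sample-level collection terms have to be copied into every row, so the duplication grows linearly with the number of taxa.

```python
import csv
import io

# Hypothetical Event-core record; its own fields would live once in event.txt.
event = {"eventID": "sample-1"}

# Hypothetical sample-level collection terms the Event core lacks (placeholders).
collection_terms = {"institutionCode": "INST", "collectionCode": "COLL",
                    "catalogNumber": "CAT-1", "basisOfRecord": "MaterialSample"}

# Hypothetical BLAST-based identifications for this one sample.
taxa = ["Taxon A", "Taxon B", "Taxon C"]


def occurrence_extension_rows(event, taxa, collection_terms):
    """Build Occurrence-extension rows for one Event-core record.

    Each row points back to its core record via 'coreid' (star schema),
    but the collection terms must be copied into every single row.
    """
    rows = []
    for i, name in enumerate(taxa, start=1):
        row = {"coreid": event["eventID"],
               "occurrenceID": f"{event['eventID']}:occ-{i}",
               "scientificName": name}
        row.update(collection_terms)  # duplicated once per taxon
        rows.append(row)
    return rows


def to_tsv(rows):
    """Serialise rows as the body of a tab-delimited extension file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = occurrence_extension_rows(event, taxa, collection_terms)
print(to_tsv(rows))
```

With a million taxa in one sample, the same four collection values would be written a million times.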

For comparison, see the ABCD file for the same test record:

  1. Basic collection object information is provided only once.
  2. Scientific names are mapped in the Identification class, where they belong.
  3. PreferredFlag is not used for identifications, so preferred = true for all (the default, supported by BioCASe and GGBN).
  4. Sequences can be added to each scientific name when using the GGBN extension.
  5. Note: neither the GGBN extension nor UnitAssociation is used in this basic mapping; we use ABCD 2.1 (an interim version for GGBN until ABCD 3.0 is ready for use). I can't upload XML files, so I zipped the ABCD file.

calednaabcdggbn.zip

@thomasstjerne
Collaborator

In DwC I would use the Occurrence core rather than the Event core, in order to be able to attach DNA sequences to the identifications through either the GGBN amplification extension or the DNA derived data extension (once it is available).
This will of course duplicate Event fields across Occurrence rows, but otherwise you can't link the DNA sequences.

In the Occurrence core file I would have the following terms:

  • occurrenceID
  • eventID
  • materialSampleID
  • institutionCode
  • collectionCode
  • basisOfRecord
  • catalogNumber
  • scientificName
  • phylum
  • class
  • order
  • family
  • country
  • countryCode
  • locality
  • minimumElevationInMeters
  • decimalLatitude
  • decimalLongitude
  • preparations
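The term list above could be declared in a Darwin Core Archive roughly as follows. This is a minimal Python sketch, assuming a tab-delimited `occurrence.txt` with one header row and using `occurrenceID` (column 0) as the core id; a real archive also needs the data rows and usually an `eml.xml` metadata file, both omitted here.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Terms proposed above for the Occurrence core, in column order.
OCCURRENCE_TERMS = [
    "occurrenceID", "eventID", "materialSampleID", "institutionCode",
    "collectionCode", "basisOfRecord", "catalogNumber", "scientificName",
    "phylum", "class", "order", "family", "country", "countryCode",
    "locality", "minimumElevationInMeters", "decimalLatitude",
    "decimalLongitude", "preparations",
]


def occurrence_header(terms):
    """Header line for a tab-delimited occurrence.txt."""
    return "\t".join(terms)


def meta_xml(terms, location="occurrence.txt"):
    """Build a minimal meta.xml that declares an Occurrence core.

    Column 0 (occurrenceID) doubles as the core record id.
    """
    archive = ET.Element("archive", {"xmlns": TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = location
    ET.SubElement(core, "id", {"index": "0"})
    for i, term in enumerate(terms):
        ET.SubElement(core, "field", {"index": str(i), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


print(occurrence_header(OCCURRENCE_TERMS))
print(meta_xml(OCCURRENCE_TERMS))
```

Because `eventID` is just another column here, all occurrences from one sample can carry the same value without any separate Event file.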

@gdadade
Collaborator Author

gdadade commented Apr 29, 2021

So if I were to use the same eventID for all "occurrences" that belong together, GBIF would recognize this as a sampling event? If so, why do we need an Event core then?
Still, this would mean duplicating the primary occurrence data a million times if I have a million taxa in one sample.

@thomasstjerne
Collaborator

Yes, that would be recognized as an event. Example

The reason we need the Event core is that we lose the information about parent events when flattening event data to Occurrences. In a future model richer than the star schema, we want to be able to model hierarchical events.

But for now you can only choose one core: either you want rich occurrences with DNA sequences, or you want to avoid data duplication and use the Event core.
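The trade-off can be illustrated with a small Python sketch, assuming a hypothetical two-level hierarchy (a site visit with two samples) and placeholder taxa. Flattening to one row per identification copies the parent's fields into every row; rows that share an `eventID` can still be regrouped into sampling events, but the parent event no longer exists as a record of its own. This only mimics the grouping idea and does not reproduce how GBIF actually indexes events.

```python
from collections import defaultdict

# Hypothetical event hierarchy: one site visit with two samples.
events = {
    "visit-1": {"parentEventID": None, "locality": "Site X"},
    "sample-1": {"parentEventID": "visit-1", "eventDate": "2017-05-01"},
    "sample-2": {"parentEventID": "visit-1", "eventDate": "2017-05-01"},
}

# Hypothetical identifications per sample.
identifications = {"sample-1": ["Taxon A", "Taxon B"], "sample-2": ["Taxon C"]}


def flatten(events, identifications):
    """Flatten to one Occurrence row per identification.

    Fields from the parent event are copied into each row, but the parent
    event itself does not survive as a record of its own.
    """
    rows = []
    for event_id, names in identifications.items():
        event = events[event_id]
        parent = events.get(event["parentEventID"], {})
        for name in names:
            row = {"eventID": event_id, "scientificName": name}
            row.update({k: v for k, v in parent.items() if k != "parentEventID"})
            row.update({k: v for k, v in event.items() if k != "parentEventID"})
            rows.append(row)
    return rows


def group_by_event(rows):
    """Regroup occurrence rows that share an eventID into sampling events."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["eventID"]].append(row["scientificName"])
    return dict(groups)


rows = flatten(events, identifications)
print(group_by_event(rows))
# {'sample-1': ['Taxon A', 'Taxon B'], 'sample-2': ['Taxon C']}
print(any(r["eventID"] == "visit-1" for r in rows))
# False: the parent visit is gone, only its copied locality remains
```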

@gdadade
Copy link
Collaborator Author

gdadade commented Apr 30, 2021

OK, thanks. If I click on "77 occurrences", it takes me to the occurrence search, but the parameter "event_id" disappears from the URL and all 30,091 occurrences are shown. Is this not yet implemented?

thomasstjerne added a commit to gbif/portal16 that referenced this issue May 3, 2021
@thomasstjerne
Collaborator

That was a bug in the portal - fixed now.
