What is best practice for assigning globally unique IDs? #13

Open
dennereed opened this issue Mar 8, 2017 · 17 comments

Comments

@dennereed
Contributor

What recommendation should we give for assigning globally unique ID numbers to fossil specimens? SESAR?

@DimEvil

DimEvil commented Mar 9, 2017

As long as they are globally unique, it's OK. At INBO we use this setup:
[occurrenceID] = N'INBO:VLINDERS:' + Right('000000000' + CONVERT(nvarchar(20),tMt.WRME_ID),8)

Where:
INBO is fixed (our institute)
VLINDERS is fixed (short name of the dataset)
tMt.WRME_ID is the unique ID of a record within the dataset

--> INBO:VLINDERS:00989254

(From other databases we have IDs like INBO:NBN:BFN0017900009ZWX, where BFN0017900009ZWX is unique within the database.)
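
The same prefix scheme can be sketched outside the database. This is only an illustration in Python, not INBO's actual code: the function name `build_occurrence_id` and the `width` parameter are assumptions. One deliberate difference from the SQL expression is that `Right(..., 8)` would silently truncate an ID longer than eight digits, whereas this sketch raises an error instead.

```python
def build_occurrence_id(institution: str, dataset: str, record_id: int, width: int = 8) -> str:
    """Join institution, dataset and a zero-padded record ID with colons.

    Hypothetical helper mirroring the INBO pattern described in the thread.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    padded = str(record_id).zfill(width)
    if len(padded) > width:
        # Refuse to truncate: two different records must never share an ID.
        raise ValueError(f"record_id {record_id} does not fit in {width} digits")
    return f"{institution}:{dataset}:{padded}"


print(build_occurrence_id("INBO", "VLINDERS", 989254))  # INBO:VLINDERS:00989254
```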

It's also possible to just generate unique IDs: https://www.guidgenerator.com/online-guid-generator.aspx
There are pros and cons to using generated unique IDs, mostly concerning human readability.
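
Generating such an ID does not require an online tool. As a minimal sketch, Python's standard `uuid` module can produce a random (version 4) UUID; the function name `new_occurrence_id` is an assumption for illustration.

```python
import uuid


def new_occurrence_id() -> str:
    """Return a random (version 4) UUID as a lowercase hyphenated string."""
    return str(uuid.uuid4())


# The value should be generated once and stored with the record,
# not regenerated on every export.
print(new_occurrence_id())
```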

@falkogloeckler

What recommendation should we give for assigning globally unique ID numbers to fossil specimens.

A recommendation would be to adopt an agreed standard, as the CETAF members did: http://cetaf.org/cetaf-stable-identifiers
See also our recent publication: https://doi.org/10.1093/database/bax003

@dennereed
Contributor Author

Dimitri's suggestion matches the recommended best practice outlined in DwC for occurrenceID in the absence of a guaranteed GUID, which is to concatenate institutionCode + collectionCode + catalogNumber. The downside is that there is no guarantee that the resulting ID is truly unique.
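
Because the concatenated triplet is not guaranteed to be unique, it may be worth checking a dataset for collisions before publishing. The following is a rough sketch under stated assumptions: records are plain dictionaries keyed by the three Darwin Core terms, the colon separator follows the INBO example above, and the function names and sample values are hypothetical.

```python
from collections import Counter


def triplet_id(record: dict) -> str:
    """Build institutionCode:collectionCode:catalogNumber for one record."""
    return ":".join(
        [record["institutionCode"], record["collectionCode"], record["catalogNumber"]]
    )


def find_duplicate_ids(records: list) -> list:
    """Return the triplet IDs that occur more than once, sorted."""
    counts = Counter(triplet_id(r) for r in records)
    return sorted(identifier for identifier, n in counts.items() if n > 1)


# Hypothetical records: the first two collide on the triplet.
sample = [
    {"institutionCode": "XYZ", "collectionCode": "PALEO", "catalogNumber": "1001"},
    {"institutionCode": "XYZ", "collectionCode": "PALEO", "catalogNumber": "1001"},
    {"institutionCode": "XYZ", "collectionCode": "PALEO", "catalogNumber": "1002"},
]
print(find_duplicate_ids(sample))  # ['XYZ:PALEO:1001']
```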

Falko's suggestion to follow CETAF means generating stable URIs for each specimen in accordance with W3 linked data best practice.

The question then becomes: what is the best recommendation to paleobiologists, from researchers to institutions, on how to establish reliable and persistent URIs? Can anyone out there comment, or does anyone have experience generating stable URIs for collections?

@hollyel
Collaborator

hollyel commented Mar 10, 2017

I think a guideline for a best practice is the best path to go down. It would be incredibly difficult to get everyone to use the same type of GUID. From my understanding, most institutions that are currently sharing data should already be generating GUIDs per record as they are required by some of the aggregators/portals.

At the NMNH we generate a GUID per specimen record and will soon be adding GUIDs to multimedia objects as well. We use the EZID resolving service with a UUID tail automatically generated in our collections management system (EMu). That string is then attached to the EZID shoulder that is specific to our museum name ID. The ID resolves through the EZID service, which bounces it back to an NMNH server.

Example: http://n2t.net/ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d
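
To make the structure of that example concrete, here is a small sketch that assembles such a URL from its parts. It is not NMNH's or EZID's code. The split into a name assigning authority number of `65665`, a one-character shoulder `3`, and a UUID tail is an assumption inferred from the example string, and `build_ark_url` is a hypothetical name.

```python
import uuid

# Resolver host taken from the example URL in the thread.
RESOLVER = "http://n2t.net"


def build_ark_url(naan: str, shoulder: str, tail: uuid.UUID) -> str:
    """Concatenate resolver, ARK label, authority number, shoulder and UUID tail."""
    return f"{RESOLVER}/ark:/{naan}/{shoulder}{tail}"


# Assumed decomposition of the example: NAAN 65665, shoulder "3", UUID tail.
tail = uuid.UUID("f693ef93-8ecc-4a3b-a376-fd0520be555d")
print(build_ark_url("65665", "3", tail))
# http://n2t.net/ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d
```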

@DimEvil

DimEvil commented Mar 10, 2017

It would indeed be great to make GUIDs always resolvable, but that is another issue. And we do not need everybody to use the same type of GUID (indeed practically impossible :) ). As long as the GUIDs are indeed unique, they can be used for simple data publishing.

So, I would recommend making the IDs globally unique either by using a series of prefixes followed by the unique record ID, or by using a GUID generator or GUID service. Also make sure these GUIDs are stored in the database, or at least that you can make the connection between the published record and the record in the database.

@debpaul
Collaborator

debpaul commented Mar 10, 2017

Hi @dennereed I note you say (in your first part of this ticket)...

globally unique ID numbers to fossil specimens.

Then you bring in

best recommendation to paleobiologists, from researchers to institutions, on how to establish reliable, and persistent URIs.

So, I think you get that there's a difference between generating a GUID and coming up with a URI, right? A URI is a string that is unique, hence it can act as a type of GUID. A URI may also "resolve" (be a URL), but doesn't have to. But GUIDs do not have to be URIs; they can be UUIDs, for example.

  • For projects that are using Symbiota, for instance, the software is generating a GUID in the form of a UUID for each record object in the database - to be used as the dwc:occurrenceID on export of data for sharing with aggregators.
  • Whether or not that occurrenceID ends up in a URI is different.
  • And then there's the question of whether you want a URI as your unique identifier pattern (rather than a naked UUID),
  • and whether you want said URI to resolve, as @hollyel suggests. You hint that this may be what you are looking for, since you mention URIs in your second comment.
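
The distinctions above (naked UUID, UUID wrapped in a URI, HTTP URI that might resolve, or some other unique string) can be illustrated with a rough classifier. This is a sketch only: the category labels and the function name `describe_identifier` are assumptions, and the check says nothing about whether an HTTP URI actually resolves.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)
URN_PREFIX = "urn:uuid:"


def describe_identifier(value: str) -> str:
    """Classify an identifier string by its form (labels are illustrative)."""
    lowered = value.lower()
    if UUID_RE.fullmatch(value):
        return "bare UUID"
    if lowered.startswith(URN_PREFIX) and UUID_RE.fullmatch(value[len(URN_PREFIX):]):
        return "UUID expressed as a URN"
    if lowered.startswith(("http://", "https://")):
        return "HTTP URI"
    return "other string"


for example in [
    "f693ef93-8ecc-4a3b-a376-fd0520be555d",
    "urn:uuid:f693ef93-8ecc-4a3b-a376-fd0520be555d",
    "http://n2t.net/ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d",
    "INBO:VLINDERS:00989254",
]:
    print(example, "->", describe_identifier(example))
```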

Some ideas to consider:

  • See: https://www.idigbio.org/wiki/index.php/Data_Ingestion_Guidance#Specimen_metadata_-_GUIDs_.2F_identifiers_.28occurrenceID.29
  • When assigning GUIDs to fossil specimens, I would first ask whether you mean in the field, as you discover new specimens, or in the museum (or both).
  • Best practice is to assign / associate them with the specimen/s in-the-field. You can take GUIDs ready-to-go into the field that can be associated with each find. This would be planned as part of a research trip - you'd discuss with the place you're planning to deposit the specimens - as to the identifiers (GUIDs) they prefer. So, this would need to be considered in any URI scheme you're looking to set up.
  • I assume with fossils, you also want to plan some sort of mechanism to record that a particular set of fossils was discovered at the same event, so you need an EventID, and the GUIDs for each fossil are then related to that EventID.
  • I assume you also have fossils in pieces, but from the same organism, and you're also going to want an ID to bind these all together. The people digitizing mammal data have similar issues - and may have ideas for you about best practices and use case issues for assigning GUIDs. @DerekSikes can you comment on @dennereed's question about assigning GUIDs, building URIs?
  • Another recommendation / best practice: never, ever re-use a GUID. If for some reason the fossil is lost, destroyed, or disappears, please do not take the GUID that was assigned to it and re-use it for another fossil.
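
Several of the ideas in this list (taking ready-made GUIDs into the field, tying each find to an EventID, and never re-using a GUID) can be sketched together. The class below is a hypothetical illustration, not an existing tool: the name `FieldIdRegistry`, its methods, and the sample event label are all assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FieldIdRegistry:
    """Toy registry: pre-mint GUIDs, assign them to finds, never reissue them."""

    unassigned: list = field(default_factory=list)
    assigned: dict = field(default_factory=dict)  # guid -> event ID
    retired: set = field(default_factory=set)

    def premint(self, count: int) -> None:
        """Create GUIDs ahead of the field trip."""
        self.unassigned.extend(str(uuid.uuid4()) for _ in range(count))

    def assign(self, event_id: str) -> str:
        """Attach the next unused GUID to a find made at the given event."""
        guid = self.unassigned.pop()
        self.assigned[guid] = event_id
        return guid

    def retire(self, guid: str) -> None:
        """Mark a GUID as retired (specimen lost or destroyed).

        The GUID keeps its event link and is never returned to the unused pool.
        """
        if guid not in self.assigned:
            raise KeyError(f"unknown GUID: {guid}")
        self.retired.add(guid)


registry = FieldIdRegistry()
registry.premint(3)
find_a = registry.assign("EVENT-2017-001")  # hypothetical event label
find_b = registry.assign("EVENT-2017-001")
registry.retire(find_a)
```

After these calls both finds still point to the same event, one GUID remains unused, and the retired GUID cannot be handed out again because it never re-enters `unassigned`.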

@debpaul
Collaborator

debpaul commented Mar 10, 2017

Note the EZID system that @hollyel suggests is also nice because if for some reason (and there will usually be one) the domain name changes, you can update your information in the EZID system and the resolver service will then make sure the "old" URIs resolve (find) the associated data in the new place.
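
The idea behind such a resolver can be shown with a toy lookup table. This is a sketch of the general principle only and does not use or imitate the EZID API: the class name `IdentifierResolver` and the `example.org` URLs are hypothetical.

```python
class IdentifierResolver:
    """Toy stand-in for a resolver service: stable identifier -> current URL."""

    def __init__(self) -> None:
        self._targets = {}

    def register(self, identifier: str, target_url: str) -> None:
        """Create or update the location an identifier points to."""
        self._targets[identifier] = target_url

    def resolve(self, identifier: str) -> str:
        """Return the current location for a stable identifier."""
        return self._targets[identifier]


resolver = IdentifierResolver()
ark = "ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d"

# Hypothetical original location of the specimen record.
resolver.register(ark, "https://old.example.org/specimens/f693ef93")
# The domain changes: only the resolver entry is updated, the identifier is not.
resolver.register(ark, "https://new.example.org/specimens/f693ef93")

print(resolver.resolve(ark))  # https://new.example.org/specimens/f693ef93
```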

@debpaul
Collaborator

debpaul commented Mar 10, 2017

Oh @dennereed, what is SESAR?

@dennereed
Contributor Author

@debpaul SESAR is the System for Earth Sample Registration. It's a service for generating GUIDs for geological specimens.

@debpaul
Collaborator

debpaul commented Mar 10, 2017

aha yes @dennereed , IGSN, this acronym I know :-)

@debpaul
Collaborator

debpaul commented Mar 10, 2017

I think the IGSN is very robust and may meet your needs very well. Does IGSN suit you as a researcher? Would paleo collections adopt it?

@dennereed
Contributor Author

Thanks everyone for the commentary, suggestions and ideas. I'll take a crack at summarizing this and adding it to the use_case_1 document. This is clearly an important issue that deserves extensive documentation, suggestions, and examples.

@dennereed
Contributor Author

Hi all. Just came across the TDWG GUID Applicability Statement, which provides very good guidance on this topic.

@debpaul
Collaborator

debpaul commented Mar 10, 2017

@dennereed
Contributor Author

dennereed commented Mar 10, 2017 via email

@debpaul
Collaborator

debpaul commented Mar 10, 2017

Both of those links work for me (but I'm logged in at iDigBio). I wonder if it's because they are technically deprecated. The first 404 you indicate - shows me our page - with a note that says: This Wiki is not current and the material is here for historical purposes. For info on GUIDs go here: https://www.idigbio.org/content/guid-guide-data-providers-0

@dennereed
Contributor Author

Created an FAQ wiki page on this topic at https://github.com/tdwg/paleo/wiki/Unique-Identifiers
