Protocol for data quality assurance checks #507

Open
PipBrewer opened this issue May 30, 2024 · 1 comment

Comments

@PipBrewer
Collaborator

PipBrewer commented May 30, 2024

We need to develop and write a protocol for quality assurance checks on the data in Specify. Every month, for each collection for which DaSSCo is generating data via mass digitisation, we should select a minimum of 5 Specify records and check them against the digiapp imports and the images. We should use an automated number generator to select the specific records to check. We should also keep a record of how many records from each collection we have checked and when, along with the number and type of issues we find. Please update the following document with the protocol: "Protocol for data quality assurance checks in Specify.docx", located at N:\SCI-SNM-DigitalCollections\DaSSCo\Admin and project management\Data tasks
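
As a starting point for the protocol, the record selection and the issue tally could be sketched roughly as follows. This is only an illustration: the function names, the shape of the input (a mapping from collection name to record IDs) and the layout of the check log are assumptions, not anything that exists in Specify or the digiapp.

```python
import random
from collections import Counter

MIN_SAMPLE = 5  # minimum records per collection per month, as stated above


def select_records_for_qa(records_by_collection, sample_size=MIN_SAMPLE, seed=None):
    """Randomly pick up to `sample_size` record IDs from each collection.

    `records_by_collection` is a hypothetical mapping of
    collection name -> iterable of record IDs created that month.
    """
    rng = random.Random(seed)
    selection = {}
    for collection, record_ids in records_by_collection.items():
        ids = sorted(record_ids)  # stable order so a given seed reproduces the draw
        k = min(sample_size, len(ids))  # small collections: check everything
        selection[collection] = rng.sample(ids, k)
    return selection


def summarise_checks(check_log):
    """Tally checked records and issue types per collection.

    `check_log` is a hypothetical iterable of
    (collection, record_id, issue_type) tuples; issue_type is None if the
    record was fine.
    """
    checked = Counter()
    issues = Counter()
    for collection, _record_id, issue_type in check_log:
        checked[collection] += 1
        if issue_type is not None:
            issues[(collection, issue_type)] += 1
    return checked, issues
```

Recording the seed and the date alongside each month's selection would make it possible to show later which records were drawn and when.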

@PipBrewer PipBrewer added the 1 priority 1 label May 30, 2024
@RebekkaML

Since Pip asked for any thoughts on this topic:

We also need to make sure the images exist and can be found. I suspect that a lot are still stuck in all kinds of error folders.
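
A rough sketch of such a check, assuming (hypothetically) that each image file is named after its barcode and that everything, including the error folders, sits under one root directory. The function name and the file extensions are placeholders.

```python
from pathlib import Path


def locate_images(barcodes, image_root, extensions=(".tif", ".tiff", ".jpg", ".jpeg")):
    """Map each barcode to the image files found for it under `image_root`.

    Assumes the file stem equals the barcode (an assumption, not a known
    naming rule). An empty list means no image was found at all; a path
    inside an error folder shows where a stuck image ended up.
    """
    found = {barcode: [] for barcode in barcodes}
    for path in Path(image_root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions and path.stem in found:
            found[path.stem].append(path)
    return found
```

Barcodes that map to an empty list, or only to paths inside an error folder, would be the ones to follow up.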

Some barcodes might not even be in Specify. There is no system to check that we actually scanned each barcode at the Herbarium. At the pinned insects station we barcode and image specimens in one go, so there we can check that we end up with the same number of barcodes and images. At the Herbarium, however, I suspect that some barcodes weren't scanned: I found one such case when working on the author sheet, and I also noticed during importing that the datasets sometimes have gaps of 1 barcode.
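
The gaps could be listed automatically, for example with something like the sketch below. It assumes that barcodes in one dataset share a prefix and end in a running number, which may not hold everywhere; the function name is made up for illustration.

```python
import re


def find_barcode_gaps(barcodes):
    """Return the numbers missing between the lowest and highest barcode seen.

    Assumes every barcode ends in a run of digits (e.g. a prefix followed by
    a serial number). Barcodes without trailing digits are ignored.
    """
    numbers = set()
    for barcode in barcodes:
        match = re.search(r"(\d+)$", barcode)
        if match:
            numbers.add(int(match.group(1)))

    ordered = sorted(numbers)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        gaps.extend(range(previous + 1, current))  # empty when consecutive
    return gaps
```

A gap does not prove that a specimen was missed (a label may simply have been discarded), so the output is a list of barcodes to look into rather than a list of errors.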

Other issues we should look out for are MSO / MOS, and whether they are connected and barcoded correctly, and whether the taxonomic information is correct. The latter had a lot of issues during importing (especially correctly recognizing hybrids, subspecies, varieties etc.), and I'm sure that we missed some of these cases.
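
For the taxonomy part, one possible aid is to flag names whose text contains a hybrid or infraspecific marker, so they can be compared by hand against the rank stored in Specify. The patterns below are a guess at common spellings, not a complete parser, and the function name is hypothetical.

```python
import re

# Assumed spellings of the markers; real data may use other variants.
RANK_MARKERS = {
    "hybrid": re.compile(r"(?:^|\s)(?:×|x\s)"),      # "Salix × rubens", "Salix x rubens"
    "subspecies": re.compile(r"\b(?:subsp|ssp)\."),  # "subsp." or "ssp."
    "variety": re.compile(r"\bvar\."),               # "var."
}


def detect_rank_markers(name):
    """Return the set of marker labels found in a taxon name string."""
    return {label for label, pattern in RANK_MARKERS.items() if pattern.search(name)}
```

Records where the detected markers disagree with the stored rank would be candidates for manual review.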

@beckerah beckerah self-assigned this Jan 10, 2025