Figure out how best to regularly clean up folder images from N drive #555

Open
beckerah opened this issue Dec 18, 2024 · 3 comments
Labels
AU For tasks relating specifically to AU data Tasks or issues related to dassco-data workflows

Comments

@beckerah
Contributor

Things to consider:

How do I figure out which images are folder images?
I can run the barcode-guid matching script. If it doesn't produce good enough results, I can also run the species-ocr barcode reader on the N drive images. That produces an output.json file, which I already have a script to parse.
I can grab the original output.json file from the AU computer and pull out the image names and barcodes. This might be useful for comparing against the barcode-guid matching db, to check that all barcodes were correctly interpreted and added to the db. It might be faster, though, to just run the images on the N drive through the species-ocr barcode reader.

I think it makes sense to move all folder images to a specified directory first. Then I can take a quick look to make sure they are actually folders before deleting them.
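
A rough sketch of the "which images are folders" step, assuming each image shares a filename stem with its tif/json/jpeg siblings and that the barcode-guid db can be dumped to a plain stem-to-barcode mapping. The function name `find_folder_candidates` and the `barcode_by_stem` mapping are made up for illustration and are not part of the existing scripts:

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the session folders really contain.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg"}


def find_folder_candidates(session_dir, barcode_by_stem):
    """Return the sorted image stems in session_dir that have no known barcode.

    barcode_by_stem is a hypothetical {filename stem: barcode} mapping,
    e.g. built from the barcode-guid db or a parsed output.json.
    """
    stems = {
        p.stem
        for p in Path(session_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    # A missing key, None, or an empty string all count as "no barcode found".
    return sorted(s for s in stems if not barcode_by_stem.get(s))
```

Anything this returns is only a candidate: an image with no barcode could still be a specimen whose barcode was unreadable, which is why the manual check before deletion matters.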

@beckerah beckerah self-assigned this Dec 18, 2024
@beckerah beckerah added AU For tasks relating specifically to AU data Tasks or issues related to dassco-data workflows labels Dec 18, 2024
@beckerah
Contributor Author

Draft skeleton workflow:

  1. Run barcode-guid matching script
  2. Compare barcode-guid db with species-web export for relevant dates
  3. If the discrepancy exceeds a set threshold (30?), run the species-ocr barcode reader on the images and update the barcode-guid db with any additional barcodes found
  4. Move all images with no found barcode into a temp folder (tifs and jsons in one temp folder, jpegs in another). Create a log file each time images are moved. (Temp folders are currently on N drive at N:\SCI-SNM-DigitalCollections\DaSSCo\MASTER_IMAGE_STORE\temp_au_folder_images)
  5. Check that the temp folders contain no specimen images. (If a specimen image is found, check whether its barcode is in the species-web export or on the incorrectly ID'd spreadsheet, then move all 3 associated files back to the original folder.)
  6. Delete all files left in the temp folders (tifs, jsons, jpegs)
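
Steps 3 to 5 could look roughly like the sketch below. This is an assumption-heavy draft, not the existing scripts: the function names, the CSV log layout, and reading "discrepancy" as the number of export barcodes missing from the db are all my own guesses, and the threshold of 30 is still the open question from step 3.

```python
import csv
import shutil
from datetime import datetime
from pathlib import Path

MASTER_SUFFIXES = {".tif", ".tiff", ".json"}  # go to one temp folder
JPEG_SUFFIXES = {".jpg", ".jpeg"}             # go to the other temp folder


def needs_ocr_pass(export_barcodes, db_barcodes, threshold=30):
    """Step 3 (assumed reading): too many export barcodes missing from the db."""
    return len(set(export_barcodes) - set(db_barcodes)) > threshold


def move_candidates(session_dir, stems, master_tmp, jpeg_tmp, log_dir):
    """Step 4: move files for the given stems and log every move to a CSV."""
    session_dir = Path(session_dir)
    master_tmp, jpeg_tmp, log_dir = Path(master_tmp), Path(jpeg_tmp), Path(log_dir)
    for d in (master_tmp, jpeg_tmp, log_dir):
        d.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"moved_{datetime.now():%Y%m%d_%H%M%S}.csv"
    wanted = set(stems)
    with log_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["source", "destination"])
        for path in sorted(session_dir.iterdir()):
            if not path.is_file() or path.stem not in wanted:
                continue
            suffix = path.suffix.lower()
            if suffix in MASTER_SUFFIXES:
                target = master_tmp / path.name
            elif suffix in JPEG_SUFFIXES:
                target = jpeg_tmp / path.name
            else:
                continue  # leave unrelated file types alone
            if target.exists():
                raise FileExistsError(target)  # never overwrite silently
            shutil.move(str(path), str(target))
            writer.writerow([str(path), str(target)])
    return log_path


def restore_from_log(log_path, stems):
    """Step 5: move all logged files for the given stems back to their source."""
    wanted = set(stems)
    restored = []
    with Path(log_path).open(newline="") as fh:
        for row in csv.DictReader(fh):
            src, dst = Path(row["source"]), Path(row["destination"])
            if src.stem in wanted and dst.exists():
                shutil.move(str(dst), str(src))
                restored.append(src)
    return restored
```

Keeping source and destination in the log means a specimen image spotted during the manual check can be put back with all of its associated files, and step 6 stays a separate, deliberate action rather than part of the move.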

@beckerah
Contributor Author

A thought: Would it be possible to have species-ocr move all folder images to a separate folder after uploading to species-web? If so, we could run the ingestion client on the original session folder, since only specimen images would be left there to be ingested.

@beckerah
Contributor Author

This relates to #567
