
Run this on Phenocam imagery #57

Closed
metazool opened this issue Dec 4, 2024 · 5 comments
Comments

@metazool
Collaborator

metazool commented Dec 4, 2024

A test case for this project is to be able to add new image collections and models and re-run the same pipeline and visualisation with minimum changes. Adding the flow cytometer (image) data was a useful demo of this but a completely different domain of images could be more useful and compelling.

The FDRI project has a line of work on Phenocam data collected by COSMOS-UK monitoring stations. There are 5 image samples per day from 50 stations, stretching back for some years. It's used to check weather conditions and animal incursions. FDRI core isn't planning much ML, rather looking at automating an existing manual-heavy workflow that profiles RGB values for a greenness index within a masked area of the image (a few method questions here!)
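For context, the greenness index in that workflow is typically the green chromatic coordinate, GCC = G / (R + G + B), averaged over a masked region of interest. A minimal numpy sketch of that calculation (the function and variable names are illustrative, not from the FDRI scripts):

```python
import numpy as np

def greenness_index(image: np.ndarray, mask: np.ndarray) -> float:
    """Mean green chromatic coordinate (GCC) inside a boolean mask.

    image: HxWx3 RGB array, mask: HxW boolean array.
    GCC = G / (R + G + B), a standard Phenocam greenness metric.
    """
    pixels = image[mask].astype(np.float64)  # N x 3 masked RGB values
    totals = pixels.sum(axis=1)
    valid = totals > 0                       # skip pure-black pixels (division by zero)
    return float((pixels[valid, 1] / totals[valid]).mean())

# Example: a 2x2 image where the single masked pixel is pure green
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (0, 255, 0)
roi = np.zeros((2, 2), dtype=bool)
roi[0, 0] = True
print(greenness_index(img, roi))  # 1.0
```

Profiling this per image over the timeseries is the manual-heavy part the FDRI workflow automates.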

Anyway it's interesting imagery with unexpected applications (timeseries of embeddings, e.g. from BioCLIP, that should track seasonal change and show anomalies; segmentation-classification, e.g. with CLIPSeg, for a "tell me about all the times a cow nibbled the sensor" view; possibly finer-detail segmentation-classification for plants, though the images probably aren't fine-grained enough for that)
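The embedding-timeseries idea can be prototyped before any model is wired in: given one embedding vector per day, flag days whose cosine similarity to a rolling baseline drops. A sketch on synthetic vectors (window size and threshold are placeholder choices, not tuned values):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_anomalies(embeddings: np.ndarray, window: int = 7, threshold: float = 0.9) -> list:
    """Indices of days whose embedding diverges from the mean of the
    previous `window` days (cosine similarity below `threshold`)."""
    flagged = []
    for i in range(window, len(embeddings)):
        baseline = embeddings[i - window:i].mean(axis=0)
        if cosine(embeddings[i], baseline) < threshold:
            flagged.append(i)
    return flagged

# Synthetic series: stable vectors with one obvious outlier at day 10
rng = np.random.default_rng(0)
series = np.tile([1.0, 0.0, 0.0], (20, 1)) + rng.normal(0, 0.01, (20, 3))
series[10] = [0.0, 1.0, 0.0]  # the "cow nibbled the sensor" day
print(flag_anomalies(series))
```

With real BioCLIP embeddings the baseline would drift smoothly with the seasons, so the threshold would need to be relative rather than fixed.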

The twists here are that:

  • they meaningfully come in pairs: each station has a north view and a south view, and the storage needs to reflect that linkage
  • the images have a fisheye effect; you might want to undistort them before feature extraction, as well as crop them into linked pairs. There's the defisheye package, and there might be something in scikit-image
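The core of the undistortion step is a radial inverse remap, which can be sketched in plain numpy without committing to a package yet. This assumes an equidistant fisheye model and an illustrative focal length; a real lens would need calibration (which is what defisheye's fov/pfov parameters approximate):

```python
import numpy as np

def undistort_fisheye(img: np.ndarray, f: float) -> np.ndarray:
    """Approximate equidistant-fisheye -> rectilinear remap by
    nearest-neighbour inverse mapping. `f` is an assumed focal
    length in pixels."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    r = np.hypot(dx, dy)                  # radius of each output pixel
    # Equidistant model: fisheye radius = f * theta, rectilinear radius = f * tan(theta)
    theta = np.arctan2(r, f)              # viewing angle of the rectilinear ray
    r_src = f * theta                     # where that ray landed on the fisheye image
    scale = np.divide(r_src, r, out=np.ones_like(r), where=r > 0)
    src_x = np.clip(np.rint(cx + dx * scale), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(cy + dy * scale), 0, h - 1).astype(int)
    return img[src_y, src_x]

# The centre is untouched; correction grows towards the edges
frame = np.arange(25, dtype=np.uint8).reshape(5, 5)
out = undistort_fisheye(frame, f=10.0)
print(out.shape)  # (5, 5)
```

In practice defisheye or a scikit-image `warp` with interpolation would replace this nearest-neighbour version.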

We might be able to either piggyback off or contribute to FDRI work on pipelines (possibly Argo Workflows), or in the shorter term they could use our Luigi setup for rapid prototyping. I'd like to be able to demo that.

@metazool
Collaborator Author

metazool commented Dec 4, 2024

See also #10 for BioCLIP - this has been on the back burner for ages; there is a WIP branch #37, and the prospect of cleaner approaches grows the longer we leave it. Wondering if you could use that model directly with thingsvision.

@metazool
Collaborator Author

An extra complication occurred to me while working through requests for data access, in preparation for looking at this as a contribution to the FDRI project next year. Right now our minimal API #53, which returns the responses of a range of models to images in a collection, accepts a URL rather than a POSTed encoded image; this keeps everything simple. The visualisation app does the same thing.

The URLs in this case are objects in a public-access object store in JASMIN. I don't know what practical or conceptual barriers there are to doing the same with Phenocam imagery. In theory it could be any http/s endpoint in front of a collection, even on-premises storage. It would be nice to keep avoiding the direct use of s3/boto3 libraries. Perhaps I'm just unreconstructed.
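Staying URL-based means image access is just an HTTP GET against whatever endpoint fronts the collection, with no object-store SDK in the code path. A stdlib-only sketch (the function name is illustrative):

```python
from urllib.request import urlopen

def fetch_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Fetch an image from any http(s) endpoint -- a public JASMIN
    object-store URL, on-premises storage, or anything else that
    answers a GET. No s3/boto3 required."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

# urllib also handles data: URLs, which is handy for tests without a network
payload = fetch_image_bytes("data:text/plain;base64,aGVsbG8=")
print(payload)  # b'hello'
```

The API and the visualisation app can share this one access path regardless of where a given collection lives.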

@metazool
Collaborator Author

We've got access to the internal storage for this now, as well as the Gitlab repository with processing scripts for the current workflow. Things I'd like to quickly try while we're at a very early stage:

  • defisheye for projecting the perspective
  • feature extraction with thingsvision
  • semantic segmentation to create labelled masks, possibly with DINOv2 (I've used CLIPSeg for this in the past)
  • other?
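Whichever segmentation model wins out (CLIPSeg, a DINOv2 head), the per-prompt output is typically a per-pixel score map, and collapsing a stack of those into one labelled mask is model-agnostic. A hedged numpy sketch, with made-up scores standing in for model output:

```python
import numpy as np

def labelled_mask(score_maps: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Collapse per-class score maps (C x H x W) into an integer label
    mask (H x W): 0 = background, k = 1-based index of the winning
    class, kept only where its score clears the threshold."""
    best = score_maps.argmax(axis=0)               # winning class per pixel
    confident = score_maps.max(axis=0) >= threshold
    return np.where(confident, best + 1, 0)

# Two 2x2 score maps, e.g. for the prompts "cow" and "sensor"
scores = np.array([[[0.9, 0.1], [0.2, 0.1]],
                   [[0.1, 0.8], [0.3, 0.2]]])
print(labelled_mask(scores))
```

The resulting integer masks are what would be stored alongside the images for the "all the times a cow nibbled the sensor" queries.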

@metazool metazool moved this from Todo to In Progress in Plankton data pipelines Dec 24, 2024
@metazool
Collaborator Author

Rather than start filling this up with non-plankton-specific code, I've created a new experimental repo.

Intending this for a simple two-step pipeline that defisheyes the images and uses them for feature extraction. Hoping it will show the value of constant iteration rather than reaching for a framework too quickly.
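The two-step shape can be mimicked in plain Python with Luigi-style "output file exists = done" semantics before any framework is involved. A sketch with stub transforms (the real steps would be the undistortion and embedding extraction; everything here is placeholder):

```python
import tempfile
from pathlib import Path

def run_step(input_path: Path, output_path: Path, transform) -> Path:
    """Luigi-style idempotence: skip the step if its output already exists."""
    if not output_path.exists():
        output_path.write_bytes(transform(input_path.read_bytes()))
    return output_path

def defisheye_step(raw: bytes) -> bytes:
    return raw  # placeholder: undistortion would go here

def feature_step(rectified: bytes) -> bytes:
    return str(len(rectified)).encode()  # placeholder: embedding extraction

workdir = Path(tempfile.mkdtemp())
(workdir / "frame.jpg").write_bytes(b"fake image bytes")
rectified = run_step(workdir / "frame.jpg", workdir / "rectified.jpg", defisheye_step)
features = run_step(rectified, workdir / "features.txt", feature_step)
print(features.read_text())
```

In Luigi the two steps would become Tasks whose `output()` targets encode the same skip-if-done behaviour.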

@metazool metazool moved this from In Progress to Done in Plankton data pipelines Jan 15, 2025
@metazool metazool closed this as completed by moving to Done in Plankton data pipelines Jan 15, 2025
@metazool
Collaborator Author

This made for a reproducible pipeline (apart from the data storage) and an internal demo recording. Interesting to see the results. This activity is parked for now until the FDRI Phenocam project has a start date, with a few learning experiences to apply back here:

  • take the timestamps out of the Luigi pipeline output status files
  • structure the sqlite database differently so there are two tables: one for embeddings+metadata, and one vec0 type that's just the embedding index
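The two-table layout can be sketched with stdlib sqlite3. In the real setup the index table would be a `vec0` virtual table from the sqlite-vec extension, which plain sqlite3 can't create, so a regular table stands in for it here; column names are illustrative:

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- embeddings + metadata: one row per image
    CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        station TEXT NOT NULL,
        captured_at TEXT NOT NULL,    -- ISO timestamp
        embedding BLOB NOT NULL       -- raw float32 bytes
    );
    -- index table: just the vectors, keyed back to images.id.
    -- With sqlite-vec this would instead be:
    --   CREATE VIRTUAL TABLE image_vecs USING vec0(embedding float[512]);
    CREATE TABLE image_vecs (
        rowid INTEGER PRIMARY KEY REFERENCES images(id),
        embedding BLOB NOT NULL
    );
""")
vec = struct.pack("3f", 0.1, 0.2, 0.3)
conn.execute("INSERT INTO images VALUES (1, 'COSMOS-01', '2025-01-15T12:00:00', ?)", (vec,))
conn.execute("INSERT INTO image_vecs VALUES (1, ?)", (vec,))
print(conn.execute(
    "SELECT station FROM images JOIN image_vecs ON image_vecs.rowid = images.id"
).fetchone())
```

Splitting metadata from the vector index keeps nearest-neighbour queries on the `vec0` table fast while the metadata table stays free to grow extra columns.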
