Run this on Phenocam imagery #57
See also #10 for BioCLIP. This has been on the back burner for ages; there is a WIP branch #37, and the prospect of cleaner approaches grows the longer we leave it. Wondering if you could use that model directly with thingsvision.
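For comparison, the direct route is straightforward: BioCLIP loads through open_clip from its Hugging Face hub ID, per the model card. A minimal sketch of embedding a single image (the file name is hypothetical; whether thingsvision can wrap the same checkpoint is the open question):

```python
# Minimal sketch: load BioCLIP via open_clip and embed one image.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
model.eval()

image = preprocess(Image.open("phenocam_frame.jpg")).unsqueeze(0)  # hypothetical file
with torch.no_grad():
    embedding = model.encode_image(image)
    # unit-normalise so embeddings can be compared by cosine similarity
    embedding = embedding / embedding.norm(dim=-1, keepdim=True)
print(embedding.shape)
```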
An extra complication occurred to me while working through requests for data access, in preparation for looking at this as a contribution to the FDRI project next year. Right now our minimal API #53, for returning the responses of a range of models to images in a collection, accepts a URL rather than a POSTed encoded image, which keeps everything simple. The visualisation app does the same. The URLs in this case are objects in a public-access object store in JASMIN. I don't know what practical or conceptual barriers there are to doing the same with Phenocam imagery; in theory it could be any http/s endpoint in front of a collection, even on-premises storage. It would be nice to keep avoiding direct use of the s3/boto3 libraries. Perhaps I'm just unreconstructed.
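To illustrate what keeping to plain http/s buys us, a sketch of the fetch-by-URL pattern with no S3 client involved; the JASMIN-style URL is hypothetical, and all the store needs to do is serve objects over http/s:

```python
# Minimal sketch: fetch an image by URL over plain https (no boto3),
# which works for a public object store bucket or any other endpoint.
import io
import requests
from PIL import Image

def fetch_image(url: str, timeout: int = 30) -> Image.Image:
    # No S3-specific client or credentials; just an HTTP GET.
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content)).convert("RGB")

# hypothetical object on a public-read store
img = fetch_image("https://example-tenancy.s3.jasmin.ac.uk/phenocam/site01/2024-06-01_1200.jpg")
```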
We've got access to the internal storage for this now, as well as the GitLab repository with processing scripts for the current workflow. Things I'd like to try quickly while we're at a very early stage:
Rather than start filling this up with non-plankton-specific code, I've created a new experimental repo. The intention is a simple two-step pipeline that defisheyes the images and then uses them for feature extraction. Hoping it will show the value of constant iteration rather than reaching for a framework too quickly.
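For the defisheye step, a minimal sketch using scikit-image's inverse warp to undo simple radial (barrel) distortion; the constant `k` is a hypothetical placeholder for a per-camera calibration:

```python
# Sketch of radial undistortion with scikit-image. For a barrel/fisheye
# lens k is typically negative: output pixels sample the source image at
# a shrunken radius, stretching the compressed periphery back out.
import numpy as np
from skimage import io, transform

def defisheye(image: np.ndarray, k: float = -0.3) -> np.ndarray:
    rows, cols = image.shape[:2]
    centre = np.array([cols, rows]) / 2.0   # warp uses (col, row) coordinates
    corner = np.linalg.norm(centre)         # centre-to-corner distance, so r <= 1

    def inverse_map(coords: np.ndarray) -> np.ndarray:
        # Map each output (undistorted) pixel to its source (distorted) pixel
        # using the simple radial model r_src = r_out * (1 + k * r_out**2).
        offset = coords - centre
        r = np.linalg.norm(offset, axis=1, keepdims=True) / corner
        return centre + offset * (1 + k * r**2)

    return transform.warp(image, inverse_map, mode="edge")

corrected = defisheye(io.imread("fisheye_frame.jpg"))  # hypothetical file
```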
This made for a reproducible pipeline (apart from the data storage) and an internal demo recording. It was interesting to see the results. This activity is parked for now until the FDRI Phenocam project has a start date, with a few learning experiences to apply back here (take the timestamps out of the Luigi pipeline output status files, and structure the sqlite database differently so there are two tables, one for embeddings+metadata and one …)
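On the first of those learnings, a minimal sketch of the fix (task name and paths are hypothetical): make the Luigi output target a pure function of the task parameters, with no timestamp, so Luigi recognises completed work on re-runs:

```python
import luigi

class ExtractFeatures(luigi.Task):
    collection = luigi.Parameter()
    model = luigi.Parameter()

    def output(self):
        # Deterministic path derived only from parameters. Embedding a
        # run timestamp here makes every invocation look like new work,
        # defeating Luigi's completeness check.
        return luigi.LocalTarget(f"status/{self.collection}_{self.model}.done")

    def run(self):
        # ... the real extraction step would go here ...
        with self.output().open("w") as out:
            out.write("ok")
```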
A test case for this project is to be able to add new image collections and models and re-run the same pipeline and visualisation with minimal changes. Adding the flow cytometer (image) data was a useful demo of this, but a completely different domain of images could be more useful and compelling.
The FDRI project has a line of work on Phenocam data collected by COSMOS-UK monitoring stations. There are 5 image samples per day from 50 stations, stretching back some years. The imagery is used to check weather conditions and animal incursions. FDRI core isn't planning much ML; rather, it's looking at automating an existing manual-heavy workflow that profiles RGB values for a greenness index within a masked area of the image (a few method questions here!)
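For reference, my understanding of the core calculation in that workflow is the green chromatic coordinate, GCC = G / (R + G + B), averaged over a region-of-interest mask. A sketch, with hypothetical file names and scikit-image used for I/O:

```python
import numpy as np
from skimage import io

def masked_gcc(image_path: str, mask_path: str) -> float:
    img = io.imread(image_path).astype(float)   # H x W x 3, RGB
    mask = io.imread(mask_path).astype(bool)    # H x W, True inside the ROI
    r, g, b = (img[..., i][mask] for i in range(3))
    total = r + g + b
    valid = total > 0                           # skip all-black pixels
    return float(np.mean(g[valid] / total[valid]))

print(masked_gcc("cosmos_site_1200.jpg", "cosmos_site_mask.png"))  # hypothetical files
```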
Anyway, it's interesting imagery with unexpected applications (timeseries of embeddings, e.g. from BioCLIP, that should track seasonal change and show anomalies; segmentation-classification, e.g. with CLIPSeg, for a "tell me about all the times a cow nibbled the sensor" view; possibly finer-detail segmentation-classification for plants, though the images probably aren't fine-grained enough for that)
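A hypothetical sketch of the first of those ideas: score each day's embedding against a trailing baseline, so dips in cosine similarity flag anomalies (snow, lens obstruction, a curious cow). The window length and unit-normalised inputs are assumptions:

```python
import numpy as np

def anomaly_scores(embeddings: np.ndarray, window: int = 14) -> np.ndarray:
    # embeddings: (n_days, dim), each row unit-normalised.
    # Returns the cosine similarity of each day to the mean of the
    # previous `window` days; low values are candidate anomalies.
    scores = np.full(len(embeddings), np.nan)
    for i in range(window, len(embeddings)):
        baseline = embeddings[i - window:i].mean(axis=0)
        baseline /= np.linalg.norm(baseline)
        scores[i] = float(embeddings[i] @ baseline)
    return scores
```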
Twists here are that:
- scikit-image …
We might be able to either piggyback off or contribute to FDRI work on pipelines (possibly Argo Workflows), or in the shorter term they could use our Luigi setup for rapid prototyping. I'd like to be able to demo that.