
Run this on Phenocam imagery #57

Closed
metazool opened this issue Dec 4, 2024 · 5 comments
Comments

@metazool
Collaborator

metazool commented Dec 4, 2024

A test case for this project is to be able to add new image collections and models and re-run the same pipeline and visualisation with minimum changes. Adding the flow cytometer (image) data was a useful demo of this but a completely different domain of images could be more useful and compelling.

The FDRI project has a line of work on Phenocam data collected by COSMOS-UK monitoring stations. There are 5 image samples per day from 50 stations, stretching back for some years. It's used to check weather conditions and animal incursions. FDRI core isn't planning much ML, rather looking at automating an existing manual-heavy workflow that profiles RGB values for a greenness index within a masked area of the image (a few method questions here!)
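For context, the greenness index in that workflow is typically the green chromatic coordinate, GCC = G / (R + G + B), averaged over a masked region of interest. A minimal numpy sketch of that calculation (the function and variable names are illustrative, not from the FDRI scripts):

```python
import numpy as np

def greenness_index(image: np.ndarray, mask: np.ndarray) -> float:
    """Mean green chromatic coordinate (GCC) inside a boolean mask.

    image: HxWx3 RGB array, mask: HxW boolean array.
    GCC = G / (R + G + B), a standard Phenocam greenness metric.
    """
    pixels = image[mask].astype(np.float64)  # N x 3 masked RGB values
    totals = pixels.sum(axis=1)
    valid = totals > 0                       # skip pure-black pixels (division by zero)
    return float((pixels[valid, 1] / totals[valid]).mean())

# Example: a 2x2 image where the single masked pixel is pure green
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (0, 255, 0)
roi = np.zeros((2, 2), dtype=bool)
roi[0, 0] = True
print(greenness_index(img, roi))  # 1.0
```

Profiling this per image over the timeseries is the manual-heavy part the FDRI workflow automates.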

Anyway it's interesting imagery with unexpected applications (timeseries of embeddings, e.g. from BioCLIP, that should track seasonal change and show anomalies; segmentation-classification, e.g. with CLIPSeg, for a "tell me about all the times a cow nibbled the sensor" view; possibly finer-detail segmentation-classification for plants, though the images probably aren't fine-grained enough for that)
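The embedding-timeseries idea can be prototyped before any model is wired in: given one embedding vector per day, flag days whose cosine similarity to a rolling baseline drops. A sketch on synthetic vectors (window size and threshold are placeholder choices, not tuned values):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_anomalies(embeddings: np.ndarray, window: int = 7, threshold: float = 0.9) -> list:
    """Indices of days whose embedding diverges from the mean of the
    previous `window` days (cosine similarity below `threshold`)."""
    flagged = []
    for i in range(window, len(embeddings)):
        baseline = embeddings[i - window:i].mean(axis=0)
        if cosine(embeddings[i], baseline) < threshold:
            flagged.append(i)
    return flagged

# Synthetic series: stable vectors with one obvious outlier at day 10
rng = np.random.default_rng(0)
series = np.tile([1.0, 0.0, 0.0], (20, 1)) + rng.normal(0, 0.01, (20, 3))
series[10] = [0.0, 1.0, 0.0]  # the "cow nibbled the sensor" day
print(flag_anomalies(series))
```

With real BioCLIP embeddings the baseline would drift smoothly with the seasons, so the threshold would need to be relative rather than fixed.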

The twists here are that:

  • they meaningfully come in pairs: each station has a north view and a south view, and the storage needs to reflect that linkage
  • the images have a fisheye effect; you might want to undistort them before feature extraction, as well as crop them into linked pairs. There's the defisheye package, and there might be something in scikit-image
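The core of the undistortion step is a radial inverse remap, which can be sketched in plain numpy without committing to a package yet. This assumes an equidistant fisheye model and an illustrative focal length; a real lens would need calibration (which is what defisheye's fov/pfov parameters approximate):

```python
import numpy as np

def undistort_fisheye(img: np.ndarray, f: float) -> np.ndarray:
    """Approximate equidistant-fisheye -> rectilinear remap by
    nearest-neighbour inverse mapping. `f` is an assumed focal
    length in pixels."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    r = np.hypot(dx, dy)                  # radius of each output pixel
    # Equidistant model: fisheye radius = f * theta, rectilinear radius = f * tan(theta)
    theta = np.arctan2(r, f)              # viewing angle of the rectilinear ray
    r_src = f * theta                     # where that ray landed on the fisheye image
    scale = np.divide(r_src, r, out=np.ones_like(r), where=r > 0)
    src_x = np.clip(np.rint(cx + dx * scale), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(cy + dy * scale), 0, h - 1).astype(int)
    return img[src_y, src_x]

# The centre is untouched; correction grows towards the edges
frame = np.arange(25, dtype=np.uint8).reshape(5, 5)
out = undistort_fisheye(frame, f=10.0)
print(out.shape)  # (5, 5)
```

In practice defisheye or a scikit-image `warp` with interpolation would replace this nearest-neighbour version.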

We might be able to either piggyback off or contribute to FDRI work on pipelines (possibly Argo Workflows), or in the shorter term they could use our Luigi setup for rapid prototyping. I'd like to be able to demo that.

@metazool
Collaborator Author

metazool commented Dec 4, 2024

See also #10 for BioCLIP - this has been on the back burner for ages; there is a WIP branch #37, and the prospect of cleaner approaches grows the longer we leave it. Wondering if you could use that model directly with thingsvision.

@metazool
Collaborator Author

An extra complication occurred to me while working through requests for data access, in preparation for looking at this as a contribution to the FDRI project next year. Right now our minimal API #53, which returns the responses of a range of models to images in a collection, accepts a URL rather than a POSTed encoded image; this keeps everything simple. The visualisation app does the same thing.

The URLs in this case are objects in a public-access object store in JASMIN. I don't know what practical or conceptual barriers there are to doing the same with Phenocam imagery. In theory it could be any http/s endpoint in front of a collection, even on-premises storage. It would be nice to keep avoiding the direct use of s3/boto3 libraries. Perhaps I'm just unreconstructed.
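Staying URL-based means image access is just an HTTP GET against whatever endpoint fronts the collection, with no object-store SDK in the code path. A stdlib-only sketch (the function name is illustrative):

```python
from urllib.request import urlopen

def fetch_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Fetch an image from any http(s) endpoint -- a public JASMIN
    object-store URL, on-premises storage, or anything else that
    answers a GET. No s3/boto3 required."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

# urllib also handles data: URLs, which is handy for tests without a network
payload = fetch_image_bytes("data:text/plain;base64,aGVsbG8=")
print(payload)  # b'hello'
```

The API and the visualisation app can share this one access path regardless of where a given collection lives.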

@metazool
Collaborator Author

We've got access to the internal storage for this now, as well as the Gitlab repository with processing scripts for the current workflow. Things I'd like to quickly try while we're at a very early stage:

  • defisheye for projecting the perspective
  • feature extraction with thingsvision
  • semantic segmentation to create labelled masks, possibly with DINOv2 (I've used CLIPSeg for this in the past)
  • other?
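Whichever segmentation model wins out (CLIPSeg, a DINOv2 head), the per-prompt output is typically a per-pixel score map, and collapsing a stack of those into one labelled mask is model-agnostic. A hedged numpy sketch, with made-up scores standing in for model output:

```python
import numpy as np

def labelled_mask(score_maps: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Collapse per-class score maps (C x H x W) into an integer label
    mask (H x W): 0 = background, k = 1-based index of the winning
    class, kept only where its score clears the threshold."""
    best = score_maps.argmax(axis=0)               # winning class per pixel
    confident = score_maps.max(axis=0) >= threshold
    return np.where(confident, best + 1, 0)

# Two 2x2 score maps, e.g. for the prompts "cow" and "sensor"
scores = np.array([[[0.9, 0.1], [0.2, 0.1]],
                   [[0.1, 0.8], [0.3, 0.2]]])
print(labelled_mask(scores))
```

The resulting integer masks are what would be stored alongside the images for the "all the times a cow nibbled the sensor" queries.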

@metazool metazool moved this from Todo to In Progress in Plankton data pipelines Dec 24, 2024
@metazool
Collaborator Author

Rather than start filling this up with non-plankton-specific code, I've created a new experimental repo.

Intending this for a simple two-step pipeline that defisheyes the images and uses them for feature extraction. Hoping it will show the value of constant iteration rather than reaching for a framework too quickly.
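The two-step shape can be mimicked in plain Python with Luigi-style "output file exists = done" semantics before any framework is involved. A sketch with stub transforms (the real steps would be the undistortion and embedding extraction; everything here is placeholder):

```python
import tempfile
from pathlib import Path

def run_step(input_path: Path, output_path: Path, transform) -> Path:
    """Luigi-style idempotence: skip the step if its output already exists."""
    if not output_path.exists():
        output_path.write_bytes(transform(input_path.read_bytes()))
    return output_path

def defisheye_step(raw: bytes) -> bytes:
    return raw  # placeholder: undistortion would go here

def feature_step(rectified: bytes) -> bytes:
    return str(len(rectified)).encode()  # placeholder: embedding extraction

workdir = Path(tempfile.mkdtemp())
(workdir / "frame.jpg").write_bytes(b"fake image bytes")
rectified = run_step(workdir / "frame.jpg", workdir / "rectified.jpg", defisheye_step)
features = run_step(rectified, workdir / "features.txt", feature_step)
print(features.read_text())
```

In Luigi the two steps would become Tasks whose `output()` targets encode the same skip-if-done behaviour.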

@metazool metazool moved this from In Progress to Done in Plankton data pipelines Jan 15, 2025
@metazool metazool closed this as completed by moving to Done in Plankton data pipelines Jan 15, 2025
@metazool
Collaborator Author

This made for a reproducible pipeline (apart from the data storage) and an internal demo recording. Interesting to see the results. This activity is parked for now until the FDRI Phenocam project has a start date, with a few learning experiences to apply back here:

  • take the timestamps out of the Luigi pipeline output status files
  • structure the sqlite database differently so there are two tables: one for embeddings+metadata, and one vec0 type that's just the embedding index
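The two-table layout can be sketched with stdlib sqlite3. In the real setup the index table would be a `vec0` virtual table from the sqlite-vec extension, which plain sqlite3 can't create, so a regular table stands in for it here; column names are illustrative:

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- embeddings + metadata: one row per image
    CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        station TEXT NOT NULL,
        captured_at TEXT NOT NULL,    -- ISO timestamp
        embedding BLOB NOT NULL       -- raw float32 bytes
    );
    -- index table: just the vectors, keyed back to images.id.
    -- With sqlite-vec this would instead be:
    --   CREATE VIRTUAL TABLE image_vecs USING vec0(embedding float[512]);
    CREATE TABLE image_vecs (
        rowid INTEGER PRIMARY KEY REFERENCES images(id),
        embedding BLOB NOT NULL
    );
""")
vec = struct.pack("3f", 0.1, 0.2, 0.3)
conn.execute("INSERT INTO images VALUES (1, 'COSMOS-01', '2025-01-15T12:00:00', ?)", (vec,))
conn.execute("INSERT INTO image_vecs VALUES (1, ?)", (vec,))
print(conn.execute(
    "SELECT station FROM images JOIN image_vecs ON image_vecs.rowid = images.id"
).fetchone())
```

Splitting metadata from the vector index keeps nearest-neighbour queries on the `vec0` table fast while the metadata table stays free to grow extra columns.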
