Add profile for STAC catalogs #55

Open
wants to merge 4 commits into
base: main
80 changes: 80 additions & 0 deletions federation/backends/stac_catalogs.md
@@ -0,0 +1,80 @@
# STAC Catalog profile

This document lists the requirements for a data provider that wants to make a dataset available as a
new collection in the openEO platform. More specifically, it defines the requirements for STAC catalogs. The STAC specification
is also used within openEO, and is a common choice in the EO community.

Contributor

Earth Observation (EO)


The requirements and guidelines in this document serve mostly as clarifications to STAC itself. They do not aim to contradict or overrule
the STAC specification, and if this is perceived to be the case, the STAC specification should get precedence. They may, however, further constrain
STAC in cases where it allows too much freedom.

## Scope

This profile aims to add a minimal set of requirements to STAC that is needed for a catalog to be compatible with openEO backends.
Requirements that are optional, for instance because they enhance efficiency, are indicated as such.

This profile is limited in scope to the most common ways in which raster EO datasets are organized. It does not aim to define guidelines
for very complex cases, or to cover every possible type of (EO) dataset.

### Dataset organization

1. STAC collections will be mapped to openEO collections.
2. All STAC products in the same collection have the same set of 'bands', which are defined at the collection level. This allows backends to look up the selected bands in a product.

Collaborator

I assume this means eo:bands in summaries?

Contributor Author

I was thinking more in terms of raster:bands, even though I didn't explicitly require that one yet. It's more about what we in openEO have in the standard 'bands' dimension.

Collaborator

@m-mohr m-mohr Jan 3, 2023

I see, makes sense.

We are currently discussing merging raster:bands and eo:bands anyway, so this might get a bit less messy in the future.

3. Assets in a product are raster files that contain a single band.

Contributor

This is not clear to me. What is meant by product? A single item? If that is stored as a GeoTIFF, it can also contain several bands.



### Raster format

1. Cloud-native, georeferenced raster formats shall be used. Common examples include (cloud-optimized) GeoTIFF and JPEG 2000.
2. Raster data is provided as regularly gridded data.
3. Georeferencing shall be based on standard projection systems.
4. Overviews or pyramids are optional; they enhance viewing performance but are not required for processing at full resolution.
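
As an illustration of the first requirement, a catalog could be checked by looking at the media type of each asset. This is only a sketch: the media type strings are the ones commonly used in STAC for GeoTIFF, COG and JPEG 2000, but the set of accepted types, the helper names and the example item are assumptions made for this example, not part of the profile.

```python
# Media types commonly used in STAC for these formats. Treating exactly this
# set as "supported" is an assumption made for this sketch.
SUPPORTED_MEDIA_TYPES = {
    "image/tiff; application=geotiff",
    "image/tiff; application=geotiff; profile=cloud-optimized",
    "image/jp2",
}


def normalize_media_type(media_type):
    """Lower-case a media type and normalize the spacing around ';'."""
    parts = [part.strip().lower() for part in media_type.split(";")]
    return "; ".join(part for part in parts if part)


def unsupported_assets(item):
    """Return the keys of assets whose 'type' is not in the supported set."""
    return [
        key
        for key, asset in item.get("assets", {}).items()
        if normalize_media_type(asset.get("type", "")) not in SUPPORTED_MEDIA_TYPES
    ]


# Hypothetical item with one raster asset and one thumbnail.
example_item = {
    "assets": {
        "B04": {
            "href": "https://example.com/data/B04.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        },
        "thumbnail": {
            "href": "https://example.com/data/thumb.png",
            "type": "image/png",
        },
    }
}

print(unsupported_assets(example_item))
```

A real check would probably only look at assets with a data role, so that thumbnails and metadata files are not reported.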

### Data access

1. STAC assets should contain a direct link to the raster file itself.
2. Asset links use the http(s) or S3 scheme.
3. [HTTP range](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests) requests shall be supported, so that parts of an asset can be requested.
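
To make the range requirement concrete, the sketch below builds (but does not send) a request for a byte range of an asset using only the Python standard library. The URL, the function name and the chosen byte window are hypothetical; a server that supports range requests is expected to answer such a request with status `206 Partial Content`.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # The Range header uses inclusive byte positions.
    last_byte = offset + length - 1
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last_byte}"}
    )


# Hypothetical asset URL; the first 16 KiB usually cover a GeoTIFF header.
request = build_range_request("https://example.com/data/B04.tif", 0, 16384)
print(request.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would perform the actual download of that byte range.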

### Authentication

Integration of federated data access can be made more complex if the data provider imposes complex authentication schemes.
Currently we aim for catalogs that allow the openEO backend to log in with a single account for all data access. A more complex
setup would be to make the login depend on the user making the openEO request.

1. The STAC API itself should preferably be publicly accessible, avoiding the need for login.
2. Asset links may require authentication.
3. One of the following authentication schemes should be supported:
    1. Basic authentication
    2. S3 authentication headers
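
For the single-account setup described above, Basic authentication amounts to sending one fixed header with every asset request. A minimal sketch, assuming the backend holds a username and password for the provider (the function name and the credentials are placeholders):

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for the single backend account.
headers = basic_auth_header("user", "pass")
print(headers)
```

Because Basic authentication only encodes the credentials, it should be used over https.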

## STAC metadata

Member

Another useful extension would be the raster one, specifically when working with data that needs to have rescale and offset applied https://github.com/stac-extensions/raster#raster-band-object.

In particular, this is essential for Sentinel-2 L2A to BOA conversion, as specified here: https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a-algorithms-products

### Data cube extension

For collection-level metadata, the datacube extension needs to be used to define the dimensions.
Please see <https://github.com/stac-extensions/datacube>.
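
As a rough illustration, collection-level metadata using the datacube extension could look like the Python dictionary below. The collection id, the extents and the band names are invented for this example; the field names (`cube:dimensions`, `type`, `axis`, `extent`, `reference_system`, `values`) follow the datacube extension.

```python
# Hypothetical collection with two spatial, one temporal and one bands dimension.
collection = {
    "type": "Collection",
    "id": "example-collection",
    "stac_extensions": [
        "https://stac-extensions.github.io/datacube/v2.2.0/schema.json"
    ],
    "cube:dimensions": {
        "x": {"type": "spatial", "axis": "x", "extent": [-180, 180], "reference_system": 4326},
        "y": {"type": "spatial", "axis": "y", "extent": [-90, 90], "reference_system": 4326},
        "t": {"type": "temporal", "extent": ["2020-01-01T00:00:00Z", None]},
        "bands": {"type": "bands", "values": ["B04", "B08"]},
    },
}


def dimension_names_by_type(stac_collection):
    """Group the names of the declared dimensions by their 'type'."""
    grouped = {}
    for name, dimension in stac_collection.get("cube:dimensions", {}).items():
        grouped.setdefault(dimension["type"], []).append(name)
    return grouped


print(dimension_names_by_type(collection))
```

A helper like `dimension_names_by_type` (a name chosen here, not an existing API) shows how a backend might discover which dimensions a collection declares.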


### Projection information
openEO backends require knowledge about the projection system, which should be provided at the collection level if it is the same for all products.
Collections with multiple projections, for instance based on UTM, can specify it at product level.
The [proj:epsg](https://github.com/stac-extensions/projection#item-properties-or-asset-fields) property is used to specify the projection.
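
The lookup order described here can be sketched as follows. How the collection-level value is stored is not specified in this document, so it is passed in as a plain argument; the function name and the example values are assumptions.

```python
def resolve_epsg(item, collection_epsg=None):
    """Return the EPSG code for an item, preferring the product-level value.

    `collection_epsg` stands for a collection-wide default; where exactly it
    comes from in the collection metadata is left open in this sketch.
    """
    item_epsg = item.get("properties", {}).get("proj:epsg")
    return item_epsg if item_epsg is not None else collection_epsg


# A UTM-based collection: each product carries its own code.
utm_item = {"properties": {"proj:epsg": 32631}}
# A single-projection collection: products rely on the collection default.
plain_item = {"properties": {}}

print(resolve_epsg(utm_item), resolve_epsg(plain_item, collection_epsg=3035))
```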

Collaborator

From other implementations I know that they need two of proj:transform, proj:shape, proj:bbox. This is not required in Platform, right?

Contributor Author

It would actually be very helpful, but not a lot of catalogs have this info in the metadata.


### Product geometry

The product geometry specifies the footprint of the data in the raster file. It may be used by the backend for spatial filtering,
so any pixel outside of the footprint may not be loaded by the backend. The geometry is provided in EPSG:4326, but may be interpreted in the

Collaborator

This is invalid in STAC and contradicts the "if this is perceived to be the case, the STAC specification should get precedence". The geometry must always be EPSG:4326, but you could provide a geometry and bbox in the native projection via proj:geometry and proj:bbox.

Contributor Author

But the geometry does specify the footprint in EPSG:4326, right? Or is the part about the interpretation wrong?
Maybe I should add that proj:geometry can be used as a way to provide a less ambiguous geometry?

Collaborator

Yes, the geometry must be EPSG:4326. proj:geometry can be used to specify a geometry in the native CRS (bbox works the same way). What I'm confused about is "but may be interpreted in the product projection system." What does this mean or imply? I guess this should be dropped and replaced with a mention of the proj: fields?

product projection system.
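
A coarse version of such spatial filtering can be sketched with bounding boxes only. The example assumes a GeoJSON `Polygon` footprint in EPSG:4326 that does not cross the antimeridian; the function names and coordinates are made up, and a real backend would likely use a geometry library for an exact test.

```python
def polygon_bbox(geometry):
    """Return (west, south, east, north) of a GeoJSON Polygon's exterior ring."""
    if geometry.get("type") != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    exterior = geometry["coordinates"][0]
    xs = [point[0] for point in exterior]
    ys = [point[1] for point in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_intersect(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical product footprint and two query extents.
footprint = {
    "type": "Polygon",
    "coordinates": [[[5.0, 50.0], [6.0, 50.0], [6.0, 51.0], [5.0, 51.0], [5.0, 50.0]]],
}
overlapping_query = (5.5, 50.5, 7.0, 52.0)
distant_query = (10.0, 10.0, 11.0, 11.0)

print(bboxes_intersect(polygon_bbox(footprint), overlapping_query))
print(bboxes_intersect(polygon_bbox(footprint), distant_query))
```

A bounding-box test can only rule products out; a product that passes may still have no valid pixels inside the requested area.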

### Common properties
openEO allows collections to be filtered by property, just like STAC catalogs. This means that most property filters are forwarded 'as-is' to the STAC catalog.
It helps openEO users a lot if collections use the same property names wherever possible. The STAC extensions define such common properties.

These are some often used examples:
- [eo:cloud_cover](https://github.com/stac-extensions/eo#eocloud_cover) for cloud cover percentage
- [sat:orbit_state](https://github.com/stac-extensions/sat#satorbit_state) for orbit direction (ascending/descending)
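
To show what forwarding a property filter could look like, the sketch below builds the JSON body of a STAC API item search. It assumes that the catalog implements the STAC API Query extension (operators such as `lte` and `eq`); the function name, collection id and threshold are hypothetical, and a backend may translate filters differently.

```python
def build_search_body(collection_id, max_cloud_cover, orbit_state):
    """Build a STAC API search body that filters on two common properties."""
    return {
        "collections": [collection_id],
        "query": {
            "eo:cloud_cover": {"lte": max_cloud_cover},
            "sat:orbit_state": {"eq": orbit_state},
        },
    }


# Hypothetical collection id and filter values.
body = build_search_body("example-collection", 20, "descending")
print(body)
```

The same body works against any catalog that uses these common property names, which is the benefit this section argues for.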

Contributor

I think it would make sense to add/link to the things we already collected here: https://github.com/Open-EO/openeo-stac-extensions