
STAC API integration #111

Open
wants to merge 31 commits into base: main
Conversation

mo-dkrz
Contributor

@mo-dkrz mo-dkrz commented Nov 22, 2024

Hi @antarcticrainforest
This PR implements initial STAC API integration in the databrowser and replaces aiohttp with httpx across all available CRUD operations in the databrowser core. Note that it only produces a STAC API from user searches in the databrowser; it does not make the STAC API service the search system of the databrowser.
I wanted to demonstrate the prototype of the STAC integration on the dev machine, but it needs the bbox field in Solr, and the Solr instance on the dev machine doesn't have it yet. So we first need to get the crawler MR done and crawl the data; then we will be able to play around with the pre-release version.

But first and foremost, we need to get the freva-service PR done to be able to keep up with the data crawler. Could you please review that first? As soon as you have finished reviewing the freva-service PR, I will submit another PR in Freva to provide the freva-dev docker image, which I need for the data crawler pipeline.

@antarcticrainforest
Member

Can I already look at this?

@mo-dkrz
Contributor Author

mo-dkrz commented Dec 5, 2024

Can I already look at this?

@antarcticrainforest yes please. Then we will be able to fix the issues in smaller chunks

Member

@antarcticrainforest antarcticrainforest left a comment


I've had a first look at this, but I still need to check and try out how this works in practice. That is only going to happen in February, though.

I have a few general comments that I just post here:

  1. container: Instead of using a private container, I think the stac server service can be integrated into the existing freva-rest server container. Since we already start various services in that container, I think it doesn't hurt to start another one.

  2. Please make sure that all of the infrastructure development/deployment stays in line with being able to set up the whole server with conda-forge.

  3. I think the stac catalogue creation should happen in a class of its own. That way the Solr class becomes less busy, and if we add the functionality of various backends at some point in the future, it will be easier to maintain and understand.

  4. I also think that other institutions might potentially use this. You can either use generic sources like freva version bla bla, as we do in the intake catalogue creation, or use more specific metadata which has to be passed via config (I am not in favour of that option, as it makes the already existing configuration more complex).
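The separation suggested in point 3 could look roughly like this; a minimal sketch where all class and method names (`STACCatalogueBuilder`, `add_result`, `build`) are illustrative, not taken from the PR:

```python
# Sketch of the suggested split: STAC catalogue creation lives in its own
# class, and the Solr search class only hands results over to it.
class STACCatalogueBuilder:
    """Collects search results and assembles them into a STAC-like collection."""

    def __init__(self, collection_id: str) -> None:
        self.collection_id = collection_id
        self.items: list[dict] = []

    def add_result(self, result: dict) -> None:
        # The Solr class would call this once per search hit.
        self.items.append(result)

    def build(self) -> dict:
        # A real implementation would emit pystac objects instead of dicts.
        return {"id": self.collection_id, "items": self.items}


builder = STACCatalogueBuilder("freva-demo")
builder.add_result({"file": "/data/example.nc"})
collection = builder.build()
```

With a shape like this, a future intake or other backend could get its own builder class without touching the Solr search code.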

stac-api:
  networks:
    - freva-rest
  image: ghcr.io/mo-dkrz/stac-fastapi-os:v2.2.9


What's this? Did you have to adjust the image? If so, I think you want to make this official in some form, e.g. publish it on Docker Hub.


.. http:get:: /api/freva-nextgen/databrowser/stac-collection/(str:flavour)/(str:uniq_key)

This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.


Suggested change
This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.
This endpoint transforms Freva databrowser search results into a dynamic SpatioTemporal Asset Catalog (STAC) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.

stream_zarr=False,
**(parse_cli_args(search_keys or [])),
)
print(result.stac_collection())


Wouldn't it make more sense to add a "--filename" option and save the catalogue to disk if that option was given? You could do the same for the intake CLI.

Comment on lines 129 to 132
username = "stac"

# Password associated with the STAC API server for authentication.
password = "secret"


I don't think that's a good idea; you shouldn't set default usernames and passwords.

Comment on lines 467 to 468
encoded_username = urllib.parse.quote(self.stacapi_user)
encoded_password = urllib.parse.quote(self.stacapi_password)


I think you don't want to do that. For example, if I used the password My!name when setting up the server, which should be totally valid, then this quoting would turn it into My%21name, which is a totally different password.

Instead you want to make sure that the password doesn't contain characters that have a special meaning in HTTP URLs: :/?#[]@%
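The effect can be reproduced directly; the `validate_password` helper below is a hypothetical sketch of the check-instead-of-encode approach, not code from this PR:

```python
import urllib.parse

# Percent-encoding silently changes the credential:
assert urllib.parse.quote("My!name") == "My%21name"

# Hypothetical alternative: reject reserved characters at configuration time
# instead of re-encoding the secret later.
RESERVED = set(":/?#[]@%")

def validate_password(password: str) -> str:
    bad = RESERVED & set(password)
    if bad:
        raise ValueError(
            f"password contains HTTP-reserved characters: {''.join(sorted(bad))}"
        )
    return password

validate_password("My!name")  # '!' is fine, the secret stays untouched
```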

Comment on lines 446 to 451
self.spatial_extent = {
    "minx": float("inf"),
    "miny": float("inf"),
    "maxx": float("-inf"),
    "maxy": float("-inf"),
}
Member

Wouldn't it be better to just put -180 to 180 and -90 to 90?

roles=["metadata"],
media_type="application/json",
),
"download-zarr": pystac.Asset(
Member

Nitpick: can you give this a different name? Like zarr-access or something.

logger.error("PUT request failed: %s", response.text)
response_data = {}
except Exception as error:
logger.error("Connection to %s failed: %s", url, error)
Member

Could you make this a logger.exception? Then we can see the stack trace.
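For illustration, the difference between the two calls can be seen by capturing the log output (the buffer and logger name here are arbitrary):

```python
import io
import logging

# Route log records into a buffer so the difference is visible.
buffer = io.StringIO()
logger = logging.getLogger("stac-demo")
logger.addHandler(logging.StreamHandler(buffer))

try:
    raise ConnectionError("server unreachable")
except Exception as error:
    logger.error("Connection failed: %s", error)      # message only
    logger.exception("Connection failed: %s", error)  # message plus stack trace

output = buffer.getvalue()
assert output.count("Connection failed") == 2
assert "Traceback (most recent call last)" in output  # only logger.exception adds this
```

Inside an `except` block, `logger.exception` logs at ERROR level and automatically appends the active traceback, so no call-site change beyond the method name is needed.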

"""
Create a STAC Item from a result dictionary.

Args:
Member

Nitpick: can you stick to numpy doc style?

raise HTTPException(status_code=413, detail="Result stream too big.")
collection_id = f"freva-{str(uuid.uuid4())}"
background_tasks.add_task(solr_search.init_stac_collection, request, collection_id)
return {
Member

Just a thought: how about creating a redirect response instead of displaying a message?

@mo-dkrz
Copy link
Contributor Author

mo-dkrz commented Jan 31, 2025

Dear @antarcticrainforest, the static catalog has also been added to this PR. That means we can get the STAC catalog as a tar.gz file from Freva, alongside the STAC collection service. The prototype on the dev machine is now operational, for the CORDEX dataset only; you can try out both bbox search and the STAC catalogs there. The front-end is also more or less done. But redirecting from the dynamic STAC API to the STAC browser still has some minor issues that I'm trying to solve: it currently redirects to localhost:8085, where localhost has to be replaced with the dev machine's host address to see the result. I might have one or two more commits left here to fix this.

Comment on lines +622 to +629
help=(
"Special search facet to refine/subset search results by spatial "
"extent. This can be a string representation of a bounding box. "
"The bounding box has to follow the format ``min_lon,min_lat by "
"max_lon,max_lat``. Valid strings are ``-10,10 by -10,10`` to "
"``0,5 by 0,5``. **Note**: You don't have to give the full string "
"format to subset the bounding box ``min_lon,min_lat`` etc are "
"also valid."
Member

I think you want to stick to the "community" standard: (lon_min, lon_max, lat_min, lat_max).

Comment on lines +86 to +87
.. note:: Longitude values must be between -180 and 180, latitude values
between -90 and 90.
Member

I think we can deal with that on a code level.
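Handling it on the code level could look like this; `parse_bbox` and the comma-separated input format are illustrative assumptions, using the lon_min, lon_max, lat_min, lat_max ordering discussed in this review:

```python
# Hypothetical parser that enforces the valid ranges in code rather than
# only documenting them; the function name and input format are assumptions.
def parse_bbox(text: str) -> tuple[float, float, float, float]:
    lon_min, lon_max, lat_min, lat_max = (float(p) for p in text.split(","))
    if not -180 <= lon_min <= lon_max <= 180:
        raise ValueError("longitudes must be ordered and within [-180, 180]")
    if not -90 <= lat_min <= lat_max <= 90:
        raise ValueError("latitudes must be ordered and within [-90, 90]")
    return lon_min, lon_max, lat_min, lat_max

bbox = parse_bbox("-10,10,-10,10")
```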

Member

min_lon, max_lon, min_lat, max_lat = bounds

def normalize_lon(lon):
    # Convert a longitude from 0-360 to -180 to 180 if necessary.
    if lon > 180:
        return lon - 360
    if lon < -180:
        return lon + 360
    return lon

# Normalize longitudes
new_min_lon = normalize_lon(min_lon)
new_max_lon = normalize_lon(max_lon)

@@ -37,7 +37,7 @@ dependencies = [
"watchfiles",
"xarray",
"xpublish",
"zarr",
"zarr==2.18.2",
Member

zarr<3?

# IMPORTANT: wait for the background task to start.
# Otherwise the client will get a 404 error and
# has to reload the page to see the STAC collection.
await asyncio.sleep(1)
Member

Is there a better way to determine if the background task has started?
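One possible alternative to the fixed sleep is to have the background task signal readiness explicitly; a sketch with illustrative names (`create_collection`, `ready`, `handler`), not the PR's actual code:

```python
import asyncio

async def create_collection(ready: asyncio.Event) -> None:
    # Stands in for the real STAC collection creation work.
    await asyncio.sleep(0)
    ready.set()  # signal that the collection can now be served

async def handler() -> str:
    ready = asyncio.Event()
    asyncio.create_task(create_collection(ready))
    # Wait until the task reports readiness instead of sleeping a fixed 1 s.
    await asyncio.wait_for(ready.wait(), timeout=5)
    return "collection ready"

result = asyncio.run(handler())
```

Whether this maps cleanly onto FastAPI's BackgroundTasks (which run after the response is sent) is a separate question; the point is only that an explicit readiness signal avoids guessing a sleep duration.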

f"{request.base_url}/stac/{collection_id}"

collection_id = f"Dataset-{(f'{flavour}-{str(uuid.uuid4())}')[:18]}"
if stac_dynamic:
Member

For my taste this if block is too big. Could you move the code that is within this if block into a function? Or at least the run_stac_creation part?
