
STAC API integration #111

Open
wants to merge 31 commits into base: main
Conversation

mo-dkrz
Contributor

@mo-dkrz mo-dkrz commented Nov 22, 2024

Hi @antarcticrainforest
This PR implements initial STAC API integration in the databrowser and replaces aiohttp with httpx across all available CRUD operations in the databrowser core. Note that it only produces a STAC API from user searches in the databrowser; it does not make the STAC API service the search system of the databrowser.
I wanted to demonstrate the prototype of the STAC integration on the dev machine, but it needs the bbox field in Solr, and the Solr instance on the dev machine doesn't have it yet. So we first need to get the crawler MR done and crawl the data; then we will be able to play around with the pre-release version.

But first and foremost, we need to get the freva-service PR done to be able to keep up with the data crawler. Could you please review that first? As soon as you have finished reviewing the freva-service PR, I will submit another PR in Freva to provide the freva-dev docker image, which I need for the data crawler pipeline.

@antarcticrainforest
Member

Can I already look at this?

@mo-dkrz
Contributor Author

mo-dkrz commented Dec 5, 2024

Can I already look at this?

@antarcticrainforest yes please. Then we will be able to fix the issues in smaller chunks

Member

@antarcticrainforest antarcticrainforest left a comment


I've had a first look at this, but I still need to check and try out how this works in practice. That is only going to happen in February, though.

I have a few general comments that I just post here:

  1. container: Instead of using a private container, I think the stac server service can be integrated into the existing freva-rest server container. Since we already start various services in that container, I think it doesn't hurt to start another one.

  2. Please make sure that all of the infrastructure development/deployment stays in line with being able to set up the whole server with conda-forge.

  3. I think the stac catalogue creation should happen in a class of its own. That way the Solr class becomes less busy, and if we add the functionality of various backends at some point in the future, it will be easier to maintain and understand.

  4. I also think that other institutions might potentially use this. You can either use generic sources like freva version bla bla, as we do in the intake catalogue creation, or use more specific metadata which has to be passed via config (I am not in favour of that option, as it makes the already existing configuration more complex).
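The separation suggested in point 3 could look roughly like this; a minimal sketch where all class and method names (`STACCatalogueBuilder`, `add_result`, `build`) are illustrative, not taken from the PR:

```python
# Sketch of the suggested split: STAC catalogue creation lives in its own
# class, and the Solr search class only hands results over to it.
class STACCatalogueBuilder:
    """Collects search results and assembles them into a STAC-like collection."""

    def __init__(self, collection_id: str) -> None:
        self.collection_id = collection_id
        self.items: list[dict] = []

    def add_result(self, result: dict) -> None:
        # The Solr class would call this once per search hit.
        self.items.append(result)

    def build(self) -> dict:
        # A real implementation would emit pystac objects instead of dicts.
        return {"id": self.collection_id, "items": self.items}


builder = STACCatalogueBuilder("freva-demo")
builder.add_result({"file": "/data/example.nc"})
collection = builder.build()
```

With a shape like this, a future intake or other backend could get its own builder class without touching the Solr search code.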

stac-api:
  networks:
    - freva-rest
  image: ghcr.io/mo-dkrz/stac-fastapi-os:v2.2.9


What's this? Did you have to adjust the image? If so, I think you want to make this official in some form, e.g. publish it on Docker Hub.


.. http:get:: /api/freva-nextgen/databrowser/stac-collection/(str:flavour)/(str:uniq_key)

This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.


Suggested change
This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.
This endpoint transforms Freva databrowser search results into a dynamic SpatioTemporal Asset Catalog (STAC) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata.

stream_zarr=False,
**(parse_cli_args(search_keys or [])),
)
print(result.stac_collection())


Wouldn't it make more sense to add a "--filename" option and save the catalogue to disk if that option was given? You could do the same for the intake CLI.

Comment on lines 129 to 132
username = "stac"

# Password associated with the STAC API server for authentication.
password = "secret"


I don't think that's a good idea; you shouldn't set default usernames and passwords.

Comment on lines 467 to 468
encoded_username = urllib.parse.quote(self.stacapi_user)
encoded_password = urllib.parse.quote(self.stacapi_password)


I think you don't want to do that. For example, if I used the password My!name when setting up the server, which should be totally valid, then this quoting would turn it into My%21name, which is a totally different password.

Instead you want to make sure that the password doesn't contain characters that have a special meaning in HTTP URLs: :/?#[]@%
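The effect can be reproduced directly; the `validate_password` helper below is a hypothetical sketch of the check-instead-of-encode approach, not code from this PR:

```python
import urllib.parse

# Percent-encoding silently changes the credential:
assert urllib.parse.quote("My!name") == "My%21name"

# Hypothetical alternative: reject reserved characters at configuration time
# instead of re-encoding the secret later.
RESERVED = set(":/?#[]@%")

def validate_password(password: str) -> str:
    bad = RESERVED & set(password)
    if bad:
        raise ValueError(
            f"password contains HTTP-reserved characters: {''.join(sorted(bad))}"
        )
    return password

validate_password("My!name")  # '!' is fine, the secret stays untouched
```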

Comment on lines 446 to 451
self.spatial_extent = {
    "minx": float("inf"),
    "miny": float("inf"),
    "maxx": float("-inf"),
    "maxy": float("-inf"),
}
Member

Wouldn't it be better to just put -180 to 180 and -90 to 90?

roles=["metadata"],
media_type="application/json",
),
"download-zarr": pystac.Asset(
Member

Nitpick: can you give this a different name? Like zarr-access or something.

logger.error("PUT request failed: %s", response.text)
response_data = {}
except Exception as error:
logger.error("Connection to %s failed: %s", url, error)
Member

Could you make this a logger.exception? Then we can see the stack trace.
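For illustration, the difference between the two calls can be seen by capturing the log output (the buffer and logger name here are arbitrary):

```python
import io
import logging

# Route log records into a buffer so the difference is visible.
buffer = io.StringIO()
logger = logging.getLogger("stac-demo")
logger.addHandler(logging.StreamHandler(buffer))

try:
    raise ConnectionError("server unreachable")
except Exception as error:
    logger.error("Connection failed: %s", error)      # message only
    logger.exception("Connection failed: %s", error)  # message plus stack trace

output = buffer.getvalue()
assert output.count("Connection failed") == 2
assert "Traceback (most recent call last)" in output  # only logger.exception adds this
```

Inside an `except` block, `logger.exception` logs at ERROR level and automatically appends the active traceback, so no call-site change beyond the method name is needed.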

"""
Create a STAC Item from a result dictionary.

Args:
Member

Nitpick: can you stick to numpy doc style?

raise HTTPException(status_code=413, detail="Result stream too big.")
collection_id = f"freva-{str(uuid.uuid4())}"
background_tasks.add_task(solr_search.init_stac_collection, request, collection_id)
return {
Member

Just a thought: how about creating a redirect response instead of displaying a message?

@mo-dkrz
Copy link
Contributor Author

mo-dkrz commented Jan 31, 2025

Dear @antarcticrainforest, the static catalog has also been added to this PR. That means we can get the STAC catalog as a tar.gz file from Freva, alongside the STAC collection service. The prototype on the dev machine is now operational, for the CORDEX dataset only; you can try out both bbox search and the STAC catalogs there. The front-end is also more or less done. But redirecting from the dynamic STAC API to the STAC browser still has some minor issues that I'm trying to solve: it currently redirects to localhost:8085, where localhost has to be replaced with the dev machine's host address to see the result. I might have one or two more commits left here to fix this.

Comment on lines +622 to +629
help=(
"Special search facet to refine/subset search results by spatial "
"extent. This can be a string representation of a bounding box. "
"The bounding box has to follow the format ``min_lon,min_lat by "
"max_lon,max_lat``. Valid strings are ``-10,10 by -10,10`` to "
"``0,5 by 0,5``. **Note**: You don't have to give the full string "
"format to subset the bounding box ``min_lon,min_lat`` etc are "
"also valid."
Member

I think you want to stick to the "community" standard: (lon_min, lon_max, lat_min, lat_max).

Comment on lines +86 to +87
.. note:: Longitude values must be between -180 and 180, latitude values
between -90 and 90.
Member

I think we can deal with that on a code level.
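Handling it on the code level could look like this; `parse_bbox` and the comma-separated input format are illustrative assumptions, using the lon_min, lon_max, lat_min, lat_max ordering discussed in this review:

```python
# Hypothetical parser that enforces the valid ranges in code rather than
# only documenting them; the function name and input format are assumptions.
def parse_bbox(text: str) -> tuple[float, float, float, float]:
    lon_min, lon_max, lat_min, lat_max = (float(p) for p in text.split(","))
    if not -180 <= lon_min <= lon_max <= 180:
        raise ValueError("longitudes must be ordered and within [-180, 180]")
    if not -90 <= lat_min <= lat_max <= 90:
        raise ValueError("latitudes must be ordered and within [-90, 90]")
    return lon_min, lon_max, lat_min, lat_max

bbox = parse_bbox("-10,10,-10,10")
```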

Member

min_lon, max_lon, min_lat, max_lat = bounds

def normalize_lon(lon):
    # Convert a longitude from 0-360 to -180 to 180 if necessary.
    if lon > 180:
        return lon - 360
    if lon < -180:
        return lon + 360
    return lon

# Normalize longitudes
new_min_lon = normalize_lon(min_lon)
new_max_lon = normalize_lon(max_lon)

@@ -37,7 +37,7 @@ dependencies = [
"watchfiles",
"xarray",
"xpublish",
"zarr",
"zarr==2.18.2",
Member

zarr<3?

# IMPORTANT: wait for the background task to start.
# Otherwise the client will get a 404 error and
# has to reload the page to see the STAC collection.
await asyncio.sleep(1)
Member

Is there a better way to determine if the background task has started?
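One possible alternative to the fixed sleep is to have the background task signal readiness explicitly; a sketch with illustrative names (`create_collection`, `ready`, `handler`), not the PR's actual code:

```python
import asyncio

async def create_collection(ready: asyncio.Event) -> None:
    # Stands in for the real STAC collection creation work.
    await asyncio.sleep(0)
    ready.set()  # signal that the collection can now be served

async def handler() -> str:
    ready = asyncio.Event()
    asyncio.create_task(create_collection(ready))
    # Wait until the task reports readiness instead of sleeping a fixed 1 s.
    await asyncio.wait_for(ready.wait(), timeout=5)
    return "collection ready"

result = asyncio.run(handler())
```

Whether this maps cleanly onto FastAPI's BackgroundTasks (which run after the response is sent) is a separate question; the point is only that an explicit readiness signal avoids guessing a sleep duration.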

f"{request.base_url}/stac/{collection_id}"

collection_id = f"Dataset-{(f'{flavour}-{str(uuid.uuid4())}')[:18]}"
if stac_dynamic:
Member

For my taste this if block is too big. Could you move the code that is within this if block into a function? Or at least the run_stac_creation part?
