-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STAC API integration #111
base: main
Are you sure you want to change the base?
STAC API integration #111
Conversation
Can I already look at this? |
@antarcticrainforest yes please. Then we will be able to fix the issues in smaller chunks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a first look at this. But I still need check and try out how this works in practice. This is only going to happen in February though.
I have a few general comments that I just post here:
-
container: Instead of using a private container I think stac server service can be integrated into the existing freva-rest server container. Since we already start various services in the container, I think it doesn't hurt to start another one.
-
Please make sure that any of the infrastructure development/deployment is in line with being able to setup the whole server with conda-forge.
-
I think the stac catalogue creation should happen in a class on it's own. That way the Solr class becomes less busy and if we add at some point in the future the functionality of adding various backends it'll be easier to maintain and understand.
-
Also think that other institutions might be potentially using this. You can either use generic sources like freva version bla bla as we do in the intake catalogue creation or use more specific metadata which has to be passed via config (I am not in favour for that option as it makes the already existing configuration more complex).
dev-env/docker-compose.yaml
Outdated
stac-api: | ||
networks: | ||
- freva-rest | ||
image: ghcr.io/mo-dkrz/stac-fastapi-os:v2.2.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this? Did you have to adjust the image? If so, I think you want to make this some kind of official. Like publish it on dockerhub.
docs/source/databrowser/APIRef.rst
Outdated
|
||
.. http:get:: /api/freva-nextgen/databrowser/stac-collection/(str:flavour)/(str:uniq_key) | ||
|
||
This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint transforms Freva databrowser search results into a dynamic STAC (SpatioTemporal Asset Catalog) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata. | |
This endpoint transforms Freva databrowser search results into a dynamic SpatioTemporal Asset Catalog (STAC) Collection. STAC is an open standard for geospatial data cataloguing, enabling consistent discovery and access of climate datasets, satellite imagery and spatiotemporal data. It provides a common language for describing geospatial information and related metadata. |
stream_zarr=False, | ||
**(parse_cli_args(search_keys or [])), | ||
) | ||
print(result.stac_collection()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to add a "--filename" option and save the catalogue to disk if the --filename option was given. You could do the same for the intake-cli
username = "stac" | ||
|
||
# Password associated with the STAC API server for authentication. | ||
password = "secret" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's a good idea. You shouldn't set default usernames and passwords
freva-rest/src/freva_rest/config.py
Outdated
encoded_username = urllib.parse.quote(self.stacapi_user) | ||
encoded_password = urllib.parse.quote(self.stacapi_password) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't want to do that, For example, if I used the password My!name
for setting up the server, which should be totally valid, then this quoting would result in My%21name
. Which is a totally different password.
Instead you want to make sure that the password doesn't contain characters that aren't interpreted as characters by HTTP: :/?#[]@%
self.spatial_extent = { | ||
"minx": float("inf"), | ||
"miny": float("inf"), | ||
"maxx": float("-inf"), | ||
"maxy": float("-inf"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to just put
-180;180 and -90;90
roles=["metadata"], | ||
media_type="application/json", | ||
), | ||
"download-zarr": pystac.Asset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: can you give this a different name? Like zarr-access
or something.
logger.error("PUT request failed: %s", response.text) | ||
response_data = {} | ||
except Exception as error: | ||
logger.error("Connection to %s failed: %s", url, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this an logger.exception
then we can see the stack trace.
""" | ||
Create a STAC Item from a result dictionary. | ||
|
||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: can you stick to bumpy doc style?
raise HTTPException(status_code=413, detail="Result stream too big.") | ||
collection_id = f"freva-{str(uuid.uuid4())}" | ||
background_tasks.add_task(solr_search.init_stac_collection, request, collection_id) | ||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought. How about creating a redirect response instead of displaying a message ?
…llection and sub catalog demands
Dear @antarcticrainforest static catalog also has been added to this PR. It means we can get the STAC catalog as a |
…ame for the sake of findability
…ivering a non-404 content before backround task
help=( | ||
"Special search facet to refine/subset search results by spatial " | ||
"extent. This can be a string representation of a bounding box. " | ||
"The bounding box has to follow the format ``min_lon,min_lat by " | ||
"max_lon,max_lat``. Valid strings are ``-10,10 by -10,10`` to " | ||
"``0,5 by 0,5``. **Note**: You don't have to give the full string " | ||
"format to subset the bounding box ``min_lon,min_lat`` etc are " | ||
"also valid." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to stick to "community" standard. (lon_min,lon_max,lat_min,lat_max)
.. note:: Longitude values must be between -180 and 180, latitude values | ||
between -90 and 90. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thnk we can deal with that on a code level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min_lon, max_lon, min_lat, max_lat = bounds
def normalize_lon(lon):
# Convert a longitude from 0-360 to -180 to 180 if necessary.
if lon > 180:
return lon - 360
if lon < -180:
return lon + 360
return lon
# Normalize longitudes
new_min_lon = normalize_lon(min_lon)
new_max_lon = normalize_lon(max_lon)
@@ -37,7 +37,7 @@ dependencies = [ | |||
"watchfiles", | |||
"xarray", | |||
"xpublish", | |||
"zarr", | |||
"zarr==2.18.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zarr<3?
# IMPORTANT: wait for the background task to start. | ||
# Otherwise the client will get a 404 error and | ||
# has to reload the page to see the STAC collection. | ||
await asyncio.sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to determine if the background task has started?
f"{request.base_url}/stac/{collection_id}" | ||
|
||
collection_id = f"Dataset-{(f'{flavour}-{str(uuid.uuid4())}')[:18]}" | ||
if stac_dynamic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my taste this if block is too big. Could you add the code that is withing this if block into a function? Or at least the run_stac_creation
function.
Hi @antarcticrainforest
This PR implements initial STAC API integration in the Databrowser and replaces aiohttp with httpx across all avaiable CRUDs in core of databrowser
But this is only producing the STAC API from user searches in databroswer, not considering STAC-API service as the search system of databrowser.
I liked to demonstrate the prototype of the STAC integration on the dev machine, but since we needed the
bbox
field on Solr and on the Solr instance of dev machine, currently we don't have it, so we need to first get the crawler MR done, crawl it, and then we will be able to play around with the pre-release version.But first and foremost, we need to get the freva-service PR done to be able to keep up with data crawler. Could you please first review that? As soon as you finished the review of freva-service PR, I submit another PR in Freva to have the
freva-dev
docker image which I need for the data Crawler pipeline.