Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add documentation for the readme #8

Merged
merged 1 commit into from
Feb 26, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 31 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,38 @@
# stac-rucio
A python package for managing STAC asset data locality when those assets are managed by a RUCIO server.
A python package for managing STAC asset locality amongst a list of Rucio Storage Elements.

The package is intended to work by consuming the results of queries made against a stac catalogue. For a given stac search,
a list of items are returned, for this list of items, the stac_rucio package supports creating replicas,
and creating replication rules.

The stac_rucio package additionally comes with a modifier function to be used with the pystac_client,
the modifier will check rucio for any existing replicas across the available RSEs, available replicas will
be listed within an asset using the alternate assets stac extension.

## Installation

PyPI support coming.

## Usage

## Development
## Example

A notebook example is available [here](./examples/demo_example.ipynb).

In the image below, we'll consinder a stac item which has no replicas or rules in rucio, but our
user wants to bring the data into RSE #2. For this to work, we require a Non-Deterministic
RSE ( ND-RSE ) to be configured amongst the available RSEs. A ND-RSE is a storage element in Rucio with no
physical storage attached. This storage element is a pragmatic solution for "registering" data in Rucio where
that data remains located at external http endpoints. Data will only be replicated in this example, when
the user creates a replication rule to get data from RSE #1 to RSE #2 ( but this data never actually resides at
RSE #1 ).

![Stac Rucio](stac_rucio.png "Stac Rucio")

To register the data for B01, our user needs to create a replica in the ND-RSE for this item if it does not
already exist. For STAC collections supported by a Rucio server in this way, it would be best to create these replicas
ahead of time, as they will always be required, and will always be registered at the same ND-RSE. When replicas are created,
the file size needs to be provided in bytes, as well as the adler32 checksum.

When the replica exists, the user needs to create a replication rule. The replication rule will instruct Rucio to copy
data from RSE #1 to RSE #2. As the data does not actually exist at RSE #1, it will be downloaded from the original http
endpoint exposed by the stac catalogue.
20 changes: 4 additions & 16 deletions examples/demo_example.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -25,21 +25,9 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'stac_rucio' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 6\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mpystac_client\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Client\n\u001b[1;32m 3\u001b[0m \u001b[38;5;66;03m# STAC client connection, with a modifier for updating items on the fly.\u001b[39;00m\n\u001b[1;32m 4\u001b[0m eodc \u001b[38;5;241m=\u001b[39m Client\u001b[38;5;241m.\u001b[39mopen(\n\u001b[1;32m 5\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttps://stac.eodc.eu/api/v1\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m----> 6\u001b[0m modifier\u001b[38;5;241m=\u001b[39m\u001b[43mstac_rucio\u001b[49m\u001b[38;5;241m.\u001b[39mrucio_item\n\u001b[1;32m 7\u001b[0m )\n",
"\u001b[0;31mNameError\u001b[0m: name 'stac_rucio' is not defined"
]
}
],
"outputs": [],
"source": [
"from pystac_client import Client\n",
"\n",
Expand Down Expand Up @@ -128,7 +116,7 @@
"outputs": [],
"source": [
"# Create replicas carries out the initial registration of the target asset at the specified RSE.\n",
"stac_rucio.create_replicas(items=november_list, rse=\"EODC-DATA\", target=\"mfcover\")"
"stac_rucio.create_replicas(items=november_list, rse=\"EODC-DATA\", targets=[\"mfcover\"])"
]
},
{
Expand All @@ -138,7 +126,7 @@
"outputs": [],
"source": [
"# Create replication rules generates the rules that instruct rucio that a request has been made to copy data from one RSE to another.\n",
"stac_rucio.create_replication_rules(items=november_list, dst_rse=\"DESY-DCACHE\")"
"stac_rucio.create_replication_rules(items=november_list, dst_rse=\"DESY-DCACHE\", targets=[\"mfcover\"])"
]
},
{
Expand Down
Binary file added stac_rucio.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.