Document partial product download #36

Merged
71 changes: 64 additions & 7 deletions datasets/storage-clients.mdx
@@ -18,7 +18,6 @@ To download data products from the Copernicus Data Space after querying them via

The following code snippet demonstrates how to query and download Copernicus data using the Tilebox Python SDK.

<CodeGroup>
```python Python {4,9-13,27}
from pathlib import Path

@@ -53,6 +52,7 @@ print("Contents: ")
for content in downloaded_data.iterdir():
    print(f" - {content.relative_to(downloaded_data)}")
```

```plaintext Output
Downloaded granule: S2A_MSIL2A_20240801T002611_N0511_R102_T58WET_20240819T170544.SAFE to data/Sentinel-2/MSI/L2A/2024/08/01/S2A_MSIL2A_20240801T002611_N0511_R102_T58WET_20240819T170544.SAFE
Contents:
@@ -65,7 +65,39 @@ Contents:
- rep_info
- S2A_MSIL2A_20240801T002611_N0511_R102_T58WET_20240819T170544-ql.jpg
```
</CodeGroup>

### Partial product downloads

If you only need some of the files that make up a product, you can restrict your download to just that subset: list the available objects with `list_objects`, filter the list, and pass the remaining objects to `download_objects`.

For example, a Sentinel-2 L2A product contains many files, such as metadata, spectral bands at several resolutions, masks, and quicklook images. The following example downloads only the blue (B02), green (B03), and near-infrared (B08) bands at 10 m resolution.

```python Python {6,17}
# Assumes `datasets` and `storage_client` are set up as in the previous example
collection = datasets.open_data.copernicus.sentinel2_msi.collections()["S2A_S2MSI2A"]
s2_data = collection.load(("2024-08-01", "2024-08-02"), show_progress=True)
selected = s2_data.isel(time=0)  # download the first granule in the given time range

objects = storage_client.list_objects(selected)
print(f"Granule {selected.granule_name.item()} consists of {len(objects)} individual objects.")

# Only select specific objects to download
want_products = ["B02_10m", "B03_10m", "B08_10m"]
objects = [obj for obj in objects if any(prod in obj for prod in want_products)]  # remove all other objects
print(f"Downloading {len(objects)} objects.")
for obj in objects:
    print(f" - {obj}")

# Finally, download the selected data
downloaded_data = storage_client.download_objects(selected, objects)
```

```plaintext Output
Granule S2A_MSIL2A_20240801T002611_N0511_R102_T58WET_20240819T170544.SAFE consists of 95 individual objects.
Downloading 3 objects.
- GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B02_10m.jp2
- GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B03_10m.jp2
- GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B08_10m.jp2
```
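The substring check above is the simplest filter. If you also want the selection to pin down the image folder and file type, a glob pattern can express all of that in one place. This is a minimal sketch using Python's standard `fnmatch.fnmatchcase`, not part of the Tilebox SDK. The object list is illustrative: the first three names come from the output above, and the remaining entries are typical Sentinel-2 L2A file names added only for contrast. The pattern itself is an assumption about how you might want to select files.

```python
from fnmatch import fnmatchcase

# Illustrative object names: the first three appear in the output above,
# the others are typical Sentinel-2 L2A entries added for contrast
objects = [
    "GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B02_10m.jp2",
    "GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B03_10m.jp2",
    "GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B08_10m.jp2",
    "GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R10m/T58WET_20240801T002611_B04_10m.jp2",
    "GRANULE/L2A_T58WET_A047575_20240801T002608/IMG_DATA/R20m/T58WET_20240801T002611_B02_20m.jp2",
    "MTD_MSIL2A.xml",
]

# Match band files B02, B03 and B08 inside the R10m image folder only.
# fnmatchcase is case-sensitive and does not normalize path separators,
# so it behaves the same on every operating system.
pattern = "*/IMG_DATA/R10m/*_B0[238]_10m.jp2"
wanted = [obj for obj in objects if fnmatchcase(obj, pattern)]

for obj in wanted:
    print(f" - {obj}")
```

In practice, `objects` would come from `storage_client.list_objects(selected)`, and `wanted` would be passed to `storage_client.download_objects(selected, wanted)`.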

## Alaska Satellite Facility (ASF)

@@ -79,7 +111,6 @@ You can create an ASF account in the [ASF Vertex Search Tool](https://search.asf

The following code snippet demonstrates how to query and download ASF data using the Tilebox Python SDK.

<CodeGroup>
```python Python {4,9-13,27}
from pathlib import Path

@@ -125,7 +156,6 @@ Contents:
- E2_71629_STD_L0_F183.000.nul
- E2_71629_STD_L0_F183.000.ldr
```
</CodeGroup>

### Further reading

@@ -150,11 +180,10 @@ Contents:

### Accessing Umbra data

No account is needed to access Umbra data. All data is provided under a Creative Commons Attribution license (CC BY 4.0), so you can use it freely as long as you give appropriate credit.

The following code snippet demonstrates how to query and download Umbra data using the Tilebox Python SDK.

<CodeGroup>
```python Python {4,9,23}
from pathlib import Path

@@ -196,5 +225,33 @@ Contents:
- 2024-01-05-01-53-37_UMBRA-07_GEC.tif
- 2024-01-05-01-53-37_UMBRA-07_CSI.tif
```
</CodeGroup>

### Partial product downloads

If you only need some of the files that make up an Umbra data point, you can restrict your download in the same way: list the available objects with `list_objects`, filter the list, and pass the remaining objects to `download_objects`.

The following example downloads only the metadata file of a data point.

```python Python {6,15}
# Assumes `datasets` and `storage_client` are set up as in the previous example
collection = datasets.open_data.umbra.sar.collections()["SAR"]
umbra_data = collection.load(("2024-01-05", "2024-01-06"), show_progress=True)
selected = umbra_data.isel(time=0)  # download the first data point in the given time range

objects = storage_client.list_objects(selected)
print(f"Data point {selected.granule_name.item()} consists of {len(objects)} individual objects.")

# Only select the metadata file to download
objects = [obj for obj in objects if "METADATA" in obj]  # remove all other objects
print(f"Downloading {len(objects)} object.")
print(objects)

# Finally, download the selected data
downloaded_data = storage_client.download_objects(selected, objects)
```

```plaintext Output
Data point 2024-01-05-01-53-37_UMBRA-07 consists of 6 individual objects.
Downloading 1 object.
['2024-01-05-01-53-37_UMBRA-07_METADATA.json']
```
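Filtering on a name fragment such as `"METADATA"` is the simplest option. To select by file type instead, `str.endswith` accepts a tuple of suffixes, so a single check can keep, for instance, only the GeoTIFF files. This is a minimal sketch under that assumption. `select_by_suffix` is a hypothetical helper, not part of the Tilebox SDK, and the list below only holds object names that appear in the Umbra outputs above, not all six objects of the data point.

```python
# Object names as they appear in the Umbra outputs above (an excerpt of the
# six objects of this data point; the remaining ones are not shown there)
objects = [
    "2024-01-05-01-53-37_UMBRA-07_GEC.tif",
    "2024-01-05-01-53-37_UMBRA-07_CSI.tif",
    "2024-01-05-01-53-37_UMBRA-07_METADATA.json",
]


def select_by_suffix(objects: list[str], suffixes: tuple[str, ...]) -> list[str]:
    """Hypothetical helper: keep only object names ending in one of `suffixes`."""
    # str.endswith accepts a tuple and returns True if any of the suffixes matches
    return [obj for obj in objects if obj.endswith(suffixes)]


geotiffs = select_by_suffix(objects, (".tif",))
print(geotiffs)
```

In practice, `objects` would come from `storage_client.list_objects(selected)`, and the filtered list is passed to `storage_client.download_objects(selected, ...)` as in the example above.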