Add python pages (#4)
* Add python pages

* remove partially updated files to fix CI

* lfs is not supported by mintlify
corentinmusard authored Aug 19, 2024
1 parent ef39f71 commit 8af0ae8
Showing 13 changed files with 1,138 additions and 40 deletions.
8 changes: 8 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,8 @@
{
  "editor.formatOnSave": true,
  "editor.rulers": [120],
  "files.insertFinalNewline": true,
  "[mdx]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}
13 changes: 13 additions & 0 deletions assets/logos/python.svg
Binary file added assets/xarray/xarray-datastructure.png
45 changes: 12 additions & 33 deletions mint.json
@@ -37,12 +37,7 @@
"navigation": [
{
"group": "Get Started",
"pages": [
"introduction",
"console",
"quickstart",
"authentication"
]
"pages": ["introduction", "console", "quickstart", "authentication"]
},
{
"group": "SDKs",
@@ -57,56 +52,40 @@
"sdks/python/xarray",
"sdks/python/async",
"sdks/python/geometries",
"sdks/python/api-reference"
{
"group": "API Reference",
"icon": "book",
"pages": ["sdks/python/api-reference/datasets", "sdks/python/api-reference/workflows"]
}
]
},
{
"group": "Go",
"icon": "golang",
"pages": [
"sdks/go/introduction"
]
"pages": ["sdks/go/introduction"]
}
]
},
{
"group": "Datasets",
"pages": [
"datasets/introduction",
"datasets/timeseries",
"datasets/collections",
"datasets/loading-data"
]
"pages": ["datasets/introduction", "datasets/timeseries", "datasets/collections", "datasets/loading-data"]
},
{
"group": "Workflows",
"pages": [
"workflows/introduction",
{
"group": "Concepts",
"pages": [
"workflows/tasks",
"workflows/jobs",
"workflows/task-runners",
"workflows/clusters"
]
"pages": ["workflows/tasks", "workflows/jobs", "workflows/task-runners", "workflows/clusters"]
},
"workflows/caching",
{
"group": "Observability",
"pages": [
"workflows/tracing",
"workflows/logging",
"workflows/axiom"
]
"pages": ["workflows/tracing", "workflows/logging", "workflows/axiom"]
},
{
"group": "Near-Real Time",
"pages": [
"workflows/recurring-tasks",
"workflows/cron-triggers",
"workflows/storage-event-triggers"
]
"pages": ["workflows/recurring-tasks", "workflows/cron-triggers", "workflows/storage-event-triggers"]
}
]
}
@@ -121,4 +100,4 @@
"github": "https://github.com/tilebox",
"linkedin": "https://www.linkedin.com/company/tilebox-io"
}
}
}
6 changes: 6 additions & 0 deletions prettier.config.js
@@ -0,0 +1,6 @@
/** @type {import("prettier").Config} */
const config = {
printWidth: 120,
};

module.exports = config;
7 changes: 5 additions & 2 deletions quickstart.mdx
@@ -28,6 +28,7 @@ If you prefer to work locally on your device, the steps below help you get started
<img src="/assets/console/api-keys-light.png" alt="Tilebox Console" className="dark:hidden" />
<img src="/assets/console/api-keys-dark.png" alt="Tilebox Console" className="hidden dark:block" />
</Frame>

</Step>
<Step title="Query data">
Use the datasets client to query data from a dataset.
@@ -36,15 +37,16 @@ If you prefer to work locally on your device, the steps below help you get started
from tilebox.datasets import Client

client = Client(token="YOUR_TILEBOX_API_KEY")
# select an open data dataset

# select an Opendata dataset
datasets = client.datasets()
dataset = datasets.open_data.asf.sentinel2_msi

# and load data from a collection in a given time range
collection = dataset.collection("S2A_S2MSI1C")
data_january_2022 = collection.load(("2022-01-01", "2022-02-01"))
```

</Step>
<Step title="Run a workflow task">
Use the workflows client to create and submit a task.
@@ -70,6 +72,7 @@ If you prefer to work locally on your device, the steps below help you get started
<Note>
For this snippet to work you need to have a cluster already created. Check out the guide on [clusters](/workflows/clusters) to learn how to create one.
</Note>

</Step>
<Step title="Explore further">
Check out the following guides to learn more about the individual modules that make up Tilebox:
242 changes: 242 additions & 0 deletions sdks/python/async.mdx
@@ -0,0 +1,242 @@
---
title: Async support
description: In this section we look at async support within the Tilebox datasets Python client.
icon: rotate
---

Tilebox offers a standard synchronous API by default, but also gives you the option of an async client if you need it.

The synchronous datasets client is great for data exploration in interactive environments like Jupyter notebooks.
The asynchronous datasets client is great for building production-ready applications that need to scale.

Async is a concurrency model that can be far more efficient than multi-threading for I/O-bound workloads, and can
provide significant performance benefits.
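This is not specific to Tilebox; a minimal standard-library sketch (with `asyncio.sleep` standing in for real I/O) shows where the benefit comes from: concurrent awaits overlap their waiting time.

```python
import asyncio
import time

async def fake_request(i: int) -> int:
    # stand-in for an I/O-bound operation, e.g. a network request
    await asyncio.sleep(0.1)
    return i

async def main() -> list[int]:
    # all three "requests" wait concurrently, so the total time is
    # roughly that of a single request (~0.1s), not the sum (~0.3s)
    return await asyncio.gather(*(fake_request(i) for i in range(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"{results} in {elapsed:.2f}s")
```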

## Switching to an async datasets client

Typically all you need to do is swap out your import statement of the `Client` and you're good to go. Check out
the example below to see how that's done.

<CodeGroup>
```python Python (Sync)
from tilebox.datasets import Client

# this client is sync
client = Client()
```
```python Python (Async)
from tilebox.datasets.aio import Client

# this client is async
client = Client()
```

</CodeGroup>

Once you have switched to the async client, you can use the `async` and `await` keywords to make your code async.
Check out the examples below to see how that works for a few common operations.

<CodeGroup>

```python Python (Sync)
# Listing datasets
datasets = client.datasets()

# Listing collections
dataset = datasets.open_data.asf.sentinel1_sar
collections = dataset.collections()

# Collection information
collection = collections["Sentinel-1A"]
info = collection.info()
print(f"Data for {collection.name} is available for {info.availability}")

# Loading data
data = collection.load(("2022-05-01", "2022-06-01"), show_progress=True)

# Finding a specific datapoint
datapoint_uuid = "01811c8f-0928-e6f5-df34-364cfa8a86e8"
datapoint = collection.find(datapoint_uuid)
```

```python Python (Async)
# Listing datasets
datasets = await client.datasets()

# Listing collections
dataset = datasets.open_data.asf.sentinel1_sar
collections = await dataset.collections()

# Collection information
collection = collections["Sentinel-1A"]
info = await collection.info()
print(f"Data for {collection.name} is available for {info.availability}")

# Loading data
data = await collection.load(("2022-05-01", "2022-06-01"), show_progress=True)

# Finding a specific datapoint
datapoint_uuid = "01811c8f-0928-e6f5-df34-364cfa8a86e8"
datapoint = await collection.find(datapoint_uuid)
```

</CodeGroup>

<Note>
Async concurrency is also supported in Jupyter notebooks or similar interactive environments. You can even use
`await some_async_call()` as the output of a code cell.
</Note>

## Benefits

The main benefit of using an async client is that you can run requests concurrently, which improves performance.
This is especially useful when you are loading data from different collections.
Check out the example below to see how that works.

## Example: Fetching data concurrently

The following example fetches data from different collections.
In the synchronous example, it fetches the data sequentially, whereas in the async example it fetches the data concurrently.
This means that the async approach is faster for such use cases.

<CodeGroup>

```python Python (Sync)
# example: fetching data sequentially

import time
from tilebox.datasets import Client
from tilebox.datasets.timeseries import RemoteTimeseriesDatasetCollection # for type hinting

client = Client()
datasets = client.datasets()
collections = datasets.open_data.asf.sentinel1_sar.collections()

def stats_for_2020(collection: RemoteTimeseriesDatasetCollection) -> None:
    """Fetch data for 2020 and print the number of data points that were loaded."""
    data = collection.load(("2020-01-01", "2021-01-01"), show_progress=True)
    n = data.sizes['time'] if 'time' in data.sizes else 0
    print(f"There are {n} datapoints in {collection.name} for 2020.")

start = time.time()

# for each collection
for name in collections:
# fetch the data, print the number of datapoints and then continue to the next collection
stats_for_2020(collections[name])

end = time.time()
print(f"Fetching data took {end - start:.2f} seconds")
```

```python Python (Async)
# example: fetching data concurrently

import asyncio
import time
from tilebox.datasets.aio import Client
from tilebox.datasets.timeseries import RemoteTimeseriesDatasetCollection # for type hinting

client = Client()
datasets = await client.datasets()
collections = await datasets.open_data.asf.sentinel1_sar.collections()

async def stats_for_2020(collection: RemoteTimeseriesDatasetCollection) -> None:
    """Fetch data for 2020 and print the number of data points that were loaded."""
    data = await collection.load(("2020-01-01", "2021-01-01"), show_progress=True)
    n = data.sizes['time'] if 'time' in data.sizes else 0
    print(f"There are {n} datapoints in {collection.name} for 2020.")

start = time.time()

# initiate all requests concurrently
requests = [stats_for_2020(collections[name]) for name in collections]
# and then wait for all to finish in parallel before continuing
await asyncio.gather(*requests)

end = time.time()
print(f"Fetching data took {end - start:.2f} seconds")
```

</CodeGroup>

The output is shown below. As you can see, the async approach is 5 seconds faster. If you have `show_progress` enabled,
the progress bars are updated concurrently. In this example the second collection contains less data than the first one,
so it finishes first.

<CodeGroup>

```txt Python (Sync)
Fetching data: 100% |██████████████████████████████ [00:13<00:00, 207858 datapoints, 3.91 MB/s]
There are 207858 datapoints in Sentinel-1A for 2020.
Fetching data: 100% |██████████████████████████████ [00:11<00:00, 179665 datapoints, 4.39 MB/s]
There are 179665 datapoints in Sentinel-1B for 2020.
Fetching data took 25.34 seconds
```

```txt Python (Async)
Fetching data: 100% |██████████████████████████████ [00:19<00:00, 207858 datapoints, 2.21 MB/s]
Fetching data: 100% |██████████████████████████████ [00:17<00:00, 179665 datapoints, 2.94 MB/s]
There are 179665 datapoints in Sentinel-1B for 2020.
There are 207858 datapoints in Sentinel-1A for 2020.
Fetching data took 20.12 seconds
```

</CodeGroup>

## Supported async environments

The Tilebox Datasets Python client supports either `asyncio` or `trio` as an async backend.
It auto-detects which of those two to use.

### AsyncIO

AsyncIO is Python's [built-in library](https://docs.python.org/3/library/asyncio.html) for writing concurrent
code with the async/await syntax.

```python
import asyncio
from tilebox.datasets.aio import Client

async def main():
    client = Client()
    datasets = await client.datasets()
    print(datasets)

asyncio.run(main())
```

### Trio

Trio is an [alternative async library](https://trio.readthedocs.io/en/stable/), designed around the
[principles of structured concurrency](https://en.wikipedia.org/wiki/Structured_concurrency).

```python
import trio
from tilebox.datasets.aio import Client

async def main():
    client = Client()
    datasets = await client.datasets()
    print(datasets)

trio.run(main)
```

### AnyIO

AnyIO is an [asynchronous networking and concurrency library](https://anyio.readthedocs.io/en/stable/) that works on
top of either asyncio or trio. The Tilebox Datasets Python client is written using `anyio`, so it can be used with
either `asyncio` or `trio`.

```python
import anyio
from tilebox.datasets.aio import Client

async def main():
    client = Client()
    datasets = await client.datasets()
    print(datasets)

anyio.run(main, backend="asyncio")
```