Add time series dataset pages (#6)
corentinmusard authored Aug 21, 2024
1 parent 34f81fd commit 06ffc3b
Showing 9 changed files with 884 additions and 5 deletions.
2 changes: 1 addition & 1 deletion api-reference/datasets/loading-data.mdx
@@ -90,7 +90,7 @@ first_50 = await collection.load(meta_data.time[:50], skip_data=False)
</ParamField>

<ParamField path="skip_data" type="bool">
-If `True`, the response only contain the [datapoint metadata](/timeseries/timeseries-data) without the actual dataset
+If `True`, the response only contain the [datapoint metadata](/datasets/timeseries) without the actual dataset
specific fields. Defaults to `False`.
</ParamField>

2 changes: 1 addition & 1 deletion api-reference/datasets/loading-datapoint.mdx
@@ -33,7 +33,7 @@ data = await collection.find(
</ParamField>

<ParamField path="skip_data" type="bool">
-If True, the response only contain the [Metadata fields](/timeseries/datasets#common-fields) without the actual
+If True, the response only contain the [Metadata fields](/datasets/timeseries#common-fields) without the actual
dataset specific fields. Defaults to `False`.
</ParamField>

2 changes: 1 addition & 1 deletion api-reference/workflows/cancelling-job.mdx
@@ -6,7 +6,7 @@ icon: chart-gantt

The execution of a job can be cancelled by calling the `cancel` method of the `JobClient` instance.

-If after cancelling a job you want to resume it, you can [retry](/workflows/api-reference#retrying-a-job) it to undo the cancellation.
+If after cancelling a job you want to resume it, you can [retry](/api-reference/workflows/retrying-job) it to undo the cancellation.

<RequestExample>

171 changes: 171 additions & 0 deletions datasets/collections.mdx
@@ -0,0 +1,171 @@
---
title: Collections
description: Learn about Time Series Dataset Collections
---

Collections group data points from the same dataset. They are useful for representing
a logical grouping of data points that are commonly queried together. For example, if a dataset
contains data from an instrument that is on board several satellites, you may want to group the data
points from each satellite into its own collection.

## Overview

Here is a quick overview of the API for listing and accessing collections, which is covered on this page.
Usage examples for different use cases are provided below.

| Method | API Reference | Description |
| --------------------- | ---------------------------------------------------------------------- | --------------------------------------------- |
| `dataset.collections` | [Listing collections](/api-reference/datasets/listing-collection) | List all available collections for a dataset. |
| `dataset.collection` | [Accessing a collection](/api-reference/datasets/accessing-collection) | Access an individual collection by its name. |
| `collection.info`     | [Collection information](/api-reference/datasets/collection-info)      | Request information about a collection.       |

Check out the examples below for some common use-cases when working with collections. The examples
assume that you have already [created a client](/datasets/introduction#creating-a-datasets-client) and
[listed the available datasets](/api-reference/datasets/listing-datasets).

<CodeGroup>

```python Python (Sync)
from tilebox.datasets import Client

client = Client()
datasets = client.datasets()
```
```python Python (Async)
from tilebox.datasets.aio import Client

client = Client()
datasets = await client.datasets()
```

</CodeGroup>

## Listing collections

Each dataset has a list of collections associated with it. You can list the collections for a dataset using the
`collections` method on the dataset object.

<CodeGroup>

```python Python (Sync)
dataset = datasets.open_data.asf.sentinel1_sar
collections = dataset.collections()
print(collections)
```

```python Python (Async)
dataset = datasets.open_data.asf.sentinel1_sar
collections = await dataset.collections()
print(collections)
```

</CodeGroup>

```txt Output
{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points),
'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)}
```

The `collections` variable is a dictionary whose keys are the collection names and whose values are
the collection objects. Each collection within a dataset has a unique name. When listing collections, you
can also request the `availability` of each collection, which is the time range for which data points
are available in the collection. This is useful for determining which collections contain data points for a specific
time range. Availability is requested by default; you can also pass `availability=True` to the `collections` method explicitly.

You can also request the number of data points in each collection by passing `count=True` to the `collections`
method.

<CodeGroup>

```python Python (Sync)
dataset = datasets.open_data.asf.sentinel1_sar
collections = dataset.collections(availability=True, count=True)
print(collections)
```

```python Python (Async)
dataset = datasets.open_data.asf.sentinel1_sar
collections = await dataset.collections(availability=True, count=True)
print(collections)
```

</CodeGroup>

```txt Output
{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points),
'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)}
```
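
Because each entry in the listing carries a time range, finding the collections relevant to a query interval comes down to an interval-overlap check. The sketch below illustrates the idea with plain `datetime` tuples. The `availability` dictionary and the `overlapping_collections` helper are hypothetical stand-ins built from the output above, not part of the Tilebox API; how the range is read from a real collection object is left as an assumption.

```python
from datetime import datetime, timezone

# Hypothetical stand-in: availability ranges copied from the output above,
# keyed by collection name. A real collection object may expose these differently.
availability = {
    "Sentinel-1A": (
        datetime(2014, 6, 15, 3, 44, 43, tzinfo=timezone.utc),
        datetime(2022, 12, 31, 23, 57, 59, tzinfo=timezone.utc),
    ),
    "Sentinel-1B": (
        datetime(2016, 9, 26, 0, 2, 34, tzinfo=timezone.utc),
        datetime(2021, 12, 23, 6, 53, 8, tzinfo=timezone.utc),
    ),
}


def overlapping_collections(ranges, start, end):
    """Return the names of collections whose range overlaps [start, end]."""
    return [
        name
        for name, (first, last) in ranges.items()
        if first <= end and start <= last
    ]


# Only Sentinel-1A has data in 2022; both satellites cover 2018.
in_2022 = overlapping_collections(
    availability,
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 12, 31, tzinfo=timezone.utc),
)
in_2018 = overlapping_collections(
    availability,
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    datetime(2018, 12, 31, tzinfo=timezone.utc),
)
print(in_2022)
print(in_2018)
```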

## Accessing individual collections

If you have already listed the collections for a dataset using `dataset.collections()`, you can access a
specific collection by indexing the resulting dictionary with the collection's name.
You can then use the `info()` method on the collection object to get information
(name, availability, and count) about the collection.

<CodeGroup>

```python Python (Sync)
collections = dataset.collections()
sat1 = collections["Sat-1"]
collection_info = sat1.info(availability=True, count=True)
print(collection_info)
```

```python Python (Async)
collections = await dataset.collections()
sat1 = collections["Sat-1"]
collection_info = await sat1.info(availability=True, count=True)
print(collection_info)
```

</CodeGroup>

```txt Output
Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points)
```

You can also access a specific collection directly by using the `collection` method on the dataset object.
This has the advantage that you do not have to list all collections first.

<CodeGroup>

```python Python (Sync)
sat1 = dataset.collection("Sat-1")
collection_info = sat1.info(availability=True, count=True)
print(collection_info)
```

```python Python (Async)
sat1 = dataset.collection("Sat-1")
collection_info = await sat1.info(availability=True, count=True)
print(collection_info)
```

</CodeGroup>

```txt Output
Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points)
```

## Errors you may encounter

### NotFoundError

If you try to access a collection with a name that does not exist, a `NotFoundError` is raised. For example:

<CodeGroup>

```python Python (Sync)
dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection Sat-X'
```

```python Python (Async)
await dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection Sat-X'
```

</CodeGroup>
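
One way to handle this is to catch the error and return a fallback value instead of letting it propagate. The sketch below is self-contained and uses hypothetical stand-ins (`NotFoundError`, `FakeDataset`, `info_or_none`) defined in the snippet itself. With the real client you would catch the `NotFoundError` raised by the library, whose import path is not shown on this page.

```python
class NotFoundError(Exception):
    """Stand-in for the error raised by the real client (hypothetical)."""


class FakeDataset:
    """Hypothetical stand-in that mimics looking up a collection by name."""

    def __init__(self, names):
        self._names = set(names)

    def collection_info(self, name):
        if name not in self._names:
            raise NotFoundError(f"No such collection {name}")
        return f"Collection {name}"


def info_or_none(dataset, name):
    """Return collection info, or None if the collection does not exist."""
    try:
        return dataset.collection_info(name)
    except NotFoundError:
        return None


fake_dataset = FakeDataset(["Sat-1", "Sat-2"])
print(info_or_none(fake_dataset, "Sat-1"))
print(info_or_none(fake_dataset, "Sat-X"))
```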

## Summary

Great, now you know how to list and access collections. Next you can look at [how to query data points from a collection](/datasets/loading-data).
123 changes: 122 additions & 1 deletion datasets/introduction.mdx
@@ -3,4 +3,125 @@ title: Introduction
description: Learn about Tilebox Datasets
---

-Testing
As the name suggests, time series datasets are datasets in which each data point is associated with a timestamp.
This is a common format for data collected over time, such as satellite data.

This section covers:

- [Which time series datasets are available](/datasets/timeseries#listing-datasets) and how to list them
- [Which common fields](/datasets/timeseries#common-fields) all time series datasets share
- [What collections are](/datasets/collections) and how to access them
- [How to access data](/datasets/loading-data) from a collection for a given time interval

<Note>
  If you want to quickly look up the name of an API method or the meaning of a specific parameter, [check out the
  complete time series API Reference](/api-reference/datasets/).
</Note>

## Terminology

Here are some terms used throughout this section.

- **Data points**: time series data points are the individual entities that make up a dataset. Each data point is associated with a timestamp.
Each data point consists of a set of fixed [metadata fields](/datasets/timeseries#common-fields) as well as individual fields that are defined on a dataset level.
- **Datasets**: time series datasets are a container for individual data points. All data points in a time series dataset share the same data type, so all
data points in a dataset share the same set of fields.
- **Collections**: Collections are a way of grouping data points within a dataset. They are useful for representing a logical grouping of data points that are commonly queried together.
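
To make the relationship between these terms concrete, here is a small illustrative model. The `DataPoint` class and the `group_by_collection` helper are hypothetical and exist only for this example; they are not the types used by the `tilebox-datasets` package.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataPoint:
    """Illustrative data point: a timestamp plus dataset-specific fields."""

    time: datetime
    collection: str  # name of the collection the data point belongs to
    fields: tuple = ()  # placeholder for dataset-specific values


def group_by_collection(data_points):
    """Group the data points of one dataset by collection name."""
    grouped = defaultdict(list)
    for point in data_points:
        grouped[point.collection].append(point)
    return dict(grouped)


# A "dataset" here is simply a list of data points sharing the same set of fields.
example_dataset = [
    DataPoint(datetime(2021, 1, 1, tzinfo=timezone.utc), "Sat-1"),
    DataPoint(datetime(2021, 1, 2, tzinfo=timezone.utc), "Sat-2"),
    DataPoint(datetime(2021, 1, 3, tzinfo=timezone.utc), "Sat-1"),
]
example_collections = group_by_collection(example_dataset)
print({name: len(points) for name, points in example_collections.items()})
```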

## Creating a datasets Client

Prerequisites

- You've [installed](/sdks/python/installation) the `tilebox-datasets` package
- You've [created](/authentication) a Tilebox API key

With the prerequisites out of the way, you can now create a client instance to start interacting with your Tilebox Datasets.

<CodeGroup>

```python Python (Sync)
from tilebox.datasets import Client

client = Client(token="YOUR_TILEBOX_API_KEY")
```
```python Python (Async)
from tilebox.datasets.aio import Client

client = Client(token="YOUR_TILEBOX_API_KEY")
```

</CodeGroup>

Alternatively, you can set the `TILEBOX_API_KEY` environment variable to your API key and instantiate the client
without passing the `token` argument. The client automatically picks up the environment variable and uses it to authenticate with the API.

<CodeGroup>

```python Python (Sync)
from tilebox.datasets import Client

# requires a TILEBOX_API_KEY environment variable
client = Client()
```
```python Python (Async)
from tilebox.datasets.aio import Client

# requires a TILEBOX_API_KEY environment variable
client = Client()
```

</CodeGroup>
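
If you rely on the environment variable, it can help to check that it is set before creating the client, so that a missing key surfaces as a clear message at startup. This is only a sketch: `read_api_key` is a hypothetical helper, not part of the Tilebox SDK, and only the `TILEBOX_API_KEY` variable name comes from the text above.

```python
import os


def read_api_key(environ=os.environ):
    """Return the Tilebox API key from the environment or raise a clear error."""
    key = environ.get("TILEBOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("TILEBOX_API_KEY is not set")
    return key


# Demonstrated with explicit dictionaries so the example runs anywhere.
print(read_api_key({"TILEBOX_API_KEY": "example-key"}))

try:
    read_api_key({})
except RuntimeError as error:
    print(error)
```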

Tilebox Datasets offers a standard synchronous API by default, but also gives you the option of an async client if you need it.

The synchronous client is great for data exploration in interactive environments like Jupyter notebooks.
The asynchronous client is great for building production ready applications that need to scale. To find out more
about the differences between the two clients, check out the [Async support](/sdks/python/async) page.
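
The scaling benefit of an async client generally comes from overlapping I/O-bound requests. The sketch below shows that general pattern with `asyncio.gather`. The `fetch_info` coroutine is a hypothetical stand-in that simulates a network call with `asyncio.sleep`; it does not use the Tilebox client.

```python
import asyncio
import time


async def fetch_info(name, delay=0.1):
    """Hypothetical stand-in for an awaitable API request."""
    await asyncio.sleep(delay)  # simulates network latency
    return f"info for {name}"


async def main():
    names = ["Sat-1", "Sat-2", "Sat-3"]
    started = time.perf_counter()
    # All three requests wait concurrently instead of one after another.
    results = await asyncio.gather(*(fetch_info(name) for name in names))
    elapsed = time.perf_counter() - started
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)
print(f"finished in about {elapsed:.2f}s")
```

Inside a Jupyter notebook an event loop is already running, so there you would `await main()` directly instead of calling `asyncio.run`.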

### Exploring datasets

Now that you have a client instance, you can start exploring the available datasets. An easy way to do this
is to [list all datasets](/api-reference/datasets/listing-datasets) and then use the autocomplete capability
of your IDE or Jupyter notebook.

<CodeGroup>

```python Python (Sync)
datasets = client.datasets()
datasets. # trigger autocomplete here to get an overview of the available datasets
```

```python Python (Async)
datasets = await client.datasets()
datasets. # trigger autocomplete here to get an overview of the available datasets
```

</CodeGroup>

### Errors you might encounter

#### AuthenticationError

`AuthenticationError` is raised when the client is unable to authenticate with the Tilebox API. This can happen when
the provided API key is invalid or expired. Instantiating a client with an invalid API key does not raise an error
immediately; the error is raised only when you make a request to the API.

<CodeGroup>

```python Python (Sync)
client = Client(token="invalid-key") # runs without error
datasets = client.datasets() # raises AuthenticationError
```

```python Python (Async)
client = Client(token="invalid-key") # runs without error
datasets = await client.datasets() # raises AuthenticationError
```

</CodeGroup>

## Next steps

- [Accessing datasets](/datasets/timeseries)
- [Async support](/sdks/python/async)
- [Working with Xarray](/sdks/python/xarray)