Add time series dataset pages (#6)

tilebox · Aug 21, 2024 · 06ffc3b
1 parent 34f81fd
commit 06ffc3b
Showing 9 changed files with 884 additions and 5 deletions.
diff --git a/api-reference/datasets/loading-data.mdx b/api-reference/datasets/loading-data.mdx
@@ -90,7 +90,7 @@ first_50 = await collection.load(meta_data.time[:50], skip_data=False)
 </ParamField>
 
 <ParamField path="skip_data" type="bool">
-  If `True`, the response only contain the [datapoint metadata](/timeseries/timeseries-data) without the actual dataset
+  If `True`, the response only contain the [datapoint metadata](/datasets/timeseries) without the actual dataset
   specific fields. Defaults to `False`.
 </ParamField>
 

diff --git a/api-reference/datasets/loading-datapoint.mdx b/api-reference/datasets/loading-datapoint.mdx
@@ -33,7 +33,7 @@ data = await collection.find(
 </ParamField>
 
 <ParamField path="skip_data" type="bool">
-  If True, the response only contain the [Metadata fields](/timeseries/datasets#common-fields) without the actual
+  If True, the response only contain the [Metadata fields](/datasets/timeseries#common-fields) without the actual
   dataset specific fields. Defaults to `False`.
 </ParamField>
 

diff --git a/api-reference/workflows/cancelling-job.mdx b/api-reference/workflows/cancelling-job.mdx
@@ -6,7 +6,7 @@ icon: chart-gantt
 
 The execution of a job can be cancelled by calling the `cancel` method of the `JobClient` instance.
 
-If after cancelling a job you want to resume it, you can [retry](/workflows/api-reference#retrying-a-job) it to undo the cancellation.
+If after cancelling a job you want to resume it, you can [retry](/api-reference/workflows/retrying-job) it to undo the cancellation.
 
 <RequestExample>
 

diff --git a/datasets/collections.mdx b/datasets/collections.mdx
@@ -0,0 +1,171 @@
+---
+title: Collections
+description: Learn about Time Series Dataset Collections
+---
+
+Collections are a way of grouping together data points from the same dataset. They are useful for representing
+a logical grouping of data points that are commonly queried together. For example, if you have a dataset
+that contains data from a specific instrument which is onboard different satellites, you may want to group the data
+points from each satellite together into a collection.
+
+## Overview
+
+Here is a quick overview of the API for listing and accessing collections covered on this page.
+Usage examples for different use cases follow below.
+
+| Method                | API Reference                                                          | Description                                   |
+| --------------------- | ---------------------------------------------------------------------- | --------------------------------------------- |
+| `dataset.collections` | [Listing collections](/api-reference/datasets/listing-collection)      | List all available collections for a dataset. |
+| `dataset.collection`  | [Accessing a collection](/api-reference/datasets/accessing-collection) | Access an individual collection by its name.  |
+| `collection.info`     | [Collection information](/api-reference/datasets/collection-info)      | Request information about a collection.       |
+
+Check out the examples below for some common use cases when working with collections. The examples
+assume that you have already [created a client](/datasets/introduction#creating-a-datasets-client) and
+[listed the available datasets](/api-reference/datasets/listing-datasets).
+
+<CodeGroup>
+
+    ```python Python (Sync)
+    from tilebox.datasets import Client
+
+    client = Client()
+    datasets = client.datasets()
+    ```
+    ```python Python (Async)
+    from tilebox.datasets.aio import Client
+
+    client = Client()
+    datasets = await client.datasets()
+    ```
+
+</CodeGroup>
+
+## Listing collections
+
+Each dataset has a list of collections associated with it. You can list the collections for a dataset using the
+`collections` method on the dataset object.
+
+<CodeGroup>
+
+  ```python Python (Sync)
+  dataset = datasets.open_data.asf.sentinel1_sar
+  collections = dataset.collections()
+  print(collections)
+  ```
+
+  ```python Python (Async)
+  dataset = datasets.open_data.asf.sentinel1_sar
+  collections = await dataset.collections()
+  print(collections)
+  ```
+
+</CodeGroup>
+
+```txt Output
+{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points),
+ 'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)}
+```
+
+The `collections` variable is a dictionary whose keys are the collection names and whose values are
+the collection objects. Each collection within a dataset has a unique name. When listing collections, you
+can optionally request the `availability` of each collection, which is the time range for which data points
+are available in that collection. This is useful for determining which collections contain data points for a specific
+time range. Availability is requested by default; you can also pass `availability=True` to the `collections` method explicitly.
+
+You can also request the number of data points in each collection by passing `count=True` to the `collections`
+method.
+
+<CodeGroup>
+
+  ```python Python (Sync)
+  dataset = datasets.open_data.asf.sentinel1_sar
+  collections = dataset.collections(availability=True, count=True)
+  print(collections)
+  ```
+
+  ```python Python (Async)
+  dataset = datasets.open_data.asf.sentinel1_sar
+  collections = await dataset.collections(availability=True, count=True)
+  print(collections)
+  ```
+
+</CodeGroup>
+
+```txt Output
+{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points),
+ 'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)}
+```
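The availability interval can be used to pick the collections relevant to a query window on the client side. The following is a minimal sketch of that check and is not part of the Tilebox SDK: it assumes the availability of each collection has already been extracted into plain `(start, end)` datetime pairs (values copied from the output above), and `collections_covering` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Availability intervals copied from the example output above.
# Assumption: they have been extracted into plain (start, end) tuples;
# this sketch does not call the Tilebox API.
availability = {
    "Sentinel-1A": (
        datetime(2014, 6, 15, 3, 44, 43, tzinfo=timezone.utc),
        datetime(2022, 12, 31, 23, 57, 59, tzinfo=timezone.utc),
    ),
    "Sentinel-1B": (
        datetime(2016, 9, 26, 0, 2, 34, tzinfo=timezone.utc),
        datetime(2021, 12, 23, 6, 53, 8, tzinfo=timezone.utc),
    ),
}


def collections_covering(availability, start, end):
    """Return the names of collections whose availability overlaps [start, end]."""
    return [
        name
        for name, (first, last) in availability.items()
        if first <= end and start <= last  # two closed intervals overlap
    ]


# Hypothetical query window: all of 2022
query_start = datetime(2022, 1, 1, tzinfo=timezone.utc)
query_end = datetime(2022, 12, 31, tzinfo=timezone.utc)
matching = collections_covering(availability, query_start, query_end)
print(matching)  # ['Sentinel-1A']
```

Only Sentinel-1A matches here because the Sentinel-1B availability ends in December 2021, before the query window starts.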
+
+## Accessing individual collections
+
+If you have already listed the collections for a dataset using `dataset.collections()`, you can access a
+specific collection by indexing the resulting dictionary with the collection's name.
+You can then use the `info()` method on the collection object to get information
+(name, availability, and count) about the collection.
+
+<CodeGroup>
+
+  ```python Python (Sync)
+  collections = dataset.collections()
+  sat1 = collections["Sat-1"]
+  collection_info = sat1.info(availability=True, count=True)
+  print(collection_info)
+  ```
+
+  ```python Python (Async)
+  collections = await dataset.collections()
+  sat1 = collections["Sat-1"]
+  collection_info = await sat1.info(availability=True, count=True)
+  print(collection_info)
+  ```
+
+</CodeGroup>
+
+```txt Output
+Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points)
+```
+
+You can also access a specific collection directly with the `collection` method on the dataset object.
+This avoids having to list all collections first.
+
+<CodeGroup>
+
+  ```python Python (Sync)
+  sat1 = dataset.collection("Sat-1")
+  collection_info = sat1.info(availability=True, count=True)
+  print(collection_info)
+  ```
+
+  ```python Python (Async)
+  sat1 = dataset.collection("Sat-1")
+  collection_info = await sat1.info(availability=True, count=True)
+  print(collection_info)
+  ```
+
+</CodeGroup>
+
+```txt Output
+Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points)
+```
+
+## Errors you may encounter
+
+### NotFoundError
+
+If you try to access a collection with a name that does not exist, a `NotFoundError` is raised. For example:
+
+<CodeGroup>
+
+```python Python (Sync)
+dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection Sat-X'
+```
+
+```python Python (Async)
+await dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection Sat-X'
+```
+
+</CodeGroup>
+
+## Summary
+
+Great, now you know how to list and access collections. Next you can look at [how to query data points from a collection](/datasets/loading-data).
diff --git a/datasets/introduction.mdx b/datasets/introduction.mdx
@@ -3,4 +3,125 @@ title: Introduction
 description: Learn about Tilebox Datasets
 ---
 
-Testing
+As the name suggests, time series datasets are datasets in which each data point is associated with a timestamp.
+This is a common format for data collected over time, such as satellite data.
+
+This section covers:
+
+- [Which time series datasets are available](/datasets/timeseries#listing-datasets) and how to list them
+- [Which common fields](/datasets/timeseries#common-fields) all time series datasets share
+- [What collections are](/datasets/collections) and how to access them
+- [How to access data](/datasets/loading-data) from a collection for a given time interval
+
+<Note>
+  If you want to quickly look up the name of an API method or the meaning of a specific parameter, [check out the
+  complete time series API Reference](/api-reference/datasets/).
+</Note>
+
+## Terminology
+
+Here are some terms used throughout this section.
+
+- **Data points**: Time series data points are the individual entities that make up a dataset. Each data point is associated with a timestamp
+  and consists of a fixed set of [metadata fields](/datasets/timeseries#common-fields) as well as individual fields that are defined at the dataset level.
+- **Datasets**: Time series datasets are containers for individual data points. All data points in a time series dataset share the same data type, and therefore
+  the same set of fields.
+- **Collections**: Collections are a way of grouping data points within a dataset. They are useful for representing a logical grouping of data points that are commonly queried together.
+
+## Creating a datasets Client
+
+Prerequisites
+
+- You've [installed](/sdks/python/installation) the `tilebox-datasets` package
+- You've [created](/authentication) a Tilebox API key
+
+With the prerequisites out of the way, you can now create a client instance to start interacting with your Tilebox Datasets.
+
+<CodeGroup>
+
+    ```python Python (Sync)
+    from tilebox.datasets import Client
+
+    client = Client(token="YOUR_TILEBOX_API_KEY")
+    ```
+    ```python Python (Async)
+    from tilebox.datasets.aio import Client
+
+    client = Client(token="YOUR_TILEBOX_API_KEY")
+    ```
+
+</CodeGroup>
+
+As an alternative, you can set the `TILEBOX_API_KEY` environment variable to your API key and instantiate the client
+without passing the `token` argument. The client then picks up the environment variable automatically and uses it to authenticate with the API.
+
+<CodeGroup>
+
+    ```python Python (Sync)
+    from tilebox.datasets import Client
+
+    # requires a TILEBOX_API_KEY environment variable
+    client = Client()
+    ```
+    ```python Python (Async)
+    from tilebox.datasets.aio import Client
+
+    # requires a TILEBOX_API_KEY environment variable
+    client = Client()
+    ```
+
+</CodeGroup>
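When relying on the environment variable, it can help to check that it is set before creating a client, so that a missing key surfaces with a clear message. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper and not part of the Tilebox SDK, and how the client itself behaves when the variable is missing is not covered here.

```python
import os


def require_api_key(env=os.environ, name="TILEBOX_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating a client")
    return key


# Usage sketch (assumes the variable is set in your shell):
# client = Client(token=require_api_key())
```

Passing the mapping in as a parameter keeps the helper easy to test with a plain dictionary instead of the real environment.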
+
+Tilebox Datasets offers a standard synchronous API by default, but also gives you the option of an async client if you need it.
+
+The synchronous client is great for data exploration in interactive environments like Jupyter notebooks.
+The asynchronous client is great for building production-ready applications that need to scale. To find out more
+about the differences between the two clients, check out the [Async support](/sdks/python/async) page.
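To illustrate why an async client can help applications scale, the sketch below issues several requests concurrently with `asyncio.gather`. It is an assumption-laden illustration rather than Tilebox code: `fetch_info` is a hypothetical stand-in for an awaitable API call such as `await collection.info()`, and it only simulates latency with `asyncio.sleep`.

```python
import asyncio


async def fetch_info(name):
    """Hypothetical stand-in for an awaitable API call such as `await collection.info()`."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"info for {name}"


async def fetch_all(names):
    # Start all requests at once and wait for them together;
    # results come back in the same order as `names`.
    return await asyncio.gather(*(fetch_info(name) for name in names))


results = asyncio.run(fetch_all(["Sentinel-1A", "Sentinel-1B"]))
print(results)  # ['info for Sentinel-1A', 'info for Sentinel-1B']
```

In a Jupyter notebook an event loop is already running, so you would typically write `await fetch_all(...)` directly instead of calling `asyncio.run`.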
+
+### Exploring datasets
+
+Now that you have a client instance, you can start exploring the datasets that are available. An easy way to do this
+is to [list all datasets](/api-reference/datasets/listing-datasets) and then use the autocomplete capability
+of your IDE or Jupyter notebook.
+
+<CodeGroup>
+
+```python Python (Sync)
+datasets = client.datasets()
+datasets. # trigger autocomplete here to get an overview of the available datasets
+```
+
+```python Python (Async)
+datasets = await client.datasets()
+datasets. # trigger autocomplete here to get an overview of the available datasets
+```
+
+</CodeGroup>
+
+### Errors you might encounter
+
+#### AuthenticationError
+
+`AuthenticationError` is raised when the client is unable to authenticate with the Tilebox API. This can happen when
+the provided API key is invalid or expired. Instantiating a client with an invalid API key does not raise an error
+immediately; the error is raised only once you make a request to the API.
+
+<CodeGroup>
+
+```python Python (Sync)
+client = Client(token="invalid-key") # runs without error
+datasets = client.datasets() # raises AuthenticationError
+```
+
+```python Python (Async)
+client = Client(token="invalid-key") # runs without error
+datasets = await client.datasets() # raises AuthenticationError
+```
+
+</CodeGroup>
+
+## Next steps
+
+- [Accessing datasets](/datasets/timeseries)
+- [Async support](/sdks/python/async)
+- [Working with Xarray](/sdks/python/xarray)