Improve dataset pages
lukasbindreiter committed Dec 4, 2024
1 parent c203d13 commit db0f780
Showing 11 changed files with 560 additions and 674 deletions.
119 changes: 37 additions & 82 deletions datasets/collections.mdx
@@ -1,79 +1,42 @@
---
title: Collections
description: Learn about Time Series Dataset Collections
description: Learn about time series dataset collections
icon: layer-group
---

Collections are a way of grouping together data points from the same dataset. They are useful for representing
a logical grouping of data points that are commonly queried together. For example, if you have a dataset
that contains data from a specific instrument which is onboard different satellites, you may want to group the data
points from each satellite together into a collection.
Collections group data points within a dataset. They help represent logical groupings of data points that are commonly queried together. For example, if your dataset includes data from a specific instrument on different satellites, you can group the data points from each satellite into a collection.

## Overview

Here is a quick overview of the API for listing and accessing collections which is covered in this page.
Some usage examples for different use-cases are provided below.
This section provides a quick overview of the API for listing and accessing collections. Below are some usage examples for different scenarios.

| Method | API Reference | Description |
| --------------------- | ---------------------------------------------------------------------- | --------------------------------------------- |
| `dataset.collections` | [Listing collections](/api-reference/datasets/listing-collection) | List all available collections for a dataset. |
| `dataset.collection` | [Accessing a collection](/api-reference/datasets/accessing-collection) | Access an individual collection by its name. |
| `collection.info` | [Collection information](/api-reference/datasets/collection-info) | Request data information for a collection. |
| Method | API Reference | Description |
| --------------------- | ------------------------------------------------------------------------------- | --------------------------------------------- |
| `dataset.collections` | [Listing collections](/api-reference/tilebox.datasets/Dataset.collections) | List all available collections for a dataset. |
| `dataset.collection` | [Accessing a collection](/api-reference/tilebox.datasets/Dataset.collection) | Access an individual collection by its name. |
| `collection.info` | [Collection information](/api-reference/tilebox.datasets/Collection.info) | Request information about a collection. |

Check out the examples below for some common use-cases when working with collections. The examples
assume that you have already [created a client](/datasets/introduction#creating-a-datasets-client) and
[listed the available datasets](/api-reference/datasets/listing-datasets).
Refer to the examples below for common use cases when working with collections. These examples assume that you have already [created a client](/datasets/introduction#creating-a-datasets-client) and [listed the available datasets](/api-reference/tilebox.datasets/Client.datasets).

<CodeGroup>
```python Python
from tilebox.datasets import Client

```python Python
from tilebox.datasets import Client

client = Client()
datasets = client.datasets()
```

client = Client()
datasets = client.datasets()
```
</CodeGroup>

## Listing collections

Each dataset has a list of collections associated with it. You can list the collections for a dataset using the
`collections` method on the dataset object.
To list the collections for a dataset, use the `collections` method on the dataset object.

<CodeGroup>

```python Python
dataset = datasets.open_data.copernicus.landsat8_oli_tirs
collections = dataset.collections()
print(collections)
```

</CodeGroup>

```plaintext Output
{'L1GT': Collection L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC],
'L1T': Collection L1T: [2013-03-26T09:33:19.763 UTC, 2020-08-24T03:21:50.000 UTC],
'L1TP': Collection L1TP: [2013-03-24T00:25:55.457 UTC, 2024-08-19T12:58:20.229 UTC],
'L2SP': Collection L2SP: [2015-01-01T07:53:35.391 UTC, 2024-08-12T12:52:03.243 UTC]}
```python Python
dataset = datasets.open_data.copernicus.landsat8_oli_tirs
collections = dataset.collections()
print(collections)
```

The `collections` variable is a dictionary, where the keys are the names of the collections and the values are
the collection objects themselves. Each collection within a dataset has a unique name. When listing collections, you
can optionally also request the `availability` of each collection. This returns the time range for which data points
are available in the collection. This is useful for determining which collections contain data points for a specific
time range. You can request the availability by passing `availability=True` to the `collections` method (which is set by default).

Additionally you can also request the number of data points in each collection by passing `count=True` to the `collections`
method.

<CodeGroup>

```python Python
dataset = datasets.open_data.copernicus.landsat8_oli_tirs
collections = dataset.collections()
print(collections)
```

</CodeGroup>

```plaintext Output
@@ -83,39 +46,33 @@ method.
'L2SP': Collection L2SP: [2015-01-01T07:53:35.391 UTC, 2024-08-12T12:52:03.243 UTC] (191110 data points)}
```

[dataset.collections](/api-reference/tilebox.datasets/Dataset.collections) returns a dictionary mapping collection names to their corresponding collection objects. Each collection has a unique name within its dataset.
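Because the result is a mapping from names to collection objects, ordinary dictionary operations apply. The sketch below uses a stand-in dictionary with placeholder string values (an assumption made so that it runs without an API key); with the SDK, `collections` would instead be the result of `dataset.collections()`.

```python
# Stand-in for the result of dataset.collections(); the values are
# placeholder strings (an assumption), not real collection objects.
collections = {
    "L1GT": "Collection L1GT",
    "L1T": "Collection L1T",
    "L1TP": "Collection L1TP",
    "L2SP": "Collection L2SP",
}

# Collection names are the dictionary keys.
names = sorted(collections)
print(names)

# A membership test before indexing avoids a KeyError for unknown names.
has_l2sp = "L2SP" in collections

# dict.get returns None instead of raising when the key is missing.
missing = collections.get("Sat-X")
```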

## Accessing individual collections

If you have already listed the collections for a dataset using `dataset.collections()`, you can access a
specific collection by accessing the resulting dictionary of `collections()` with the name of an individual collection.
You can then use the `info()` method on the collection object to get information
(name, availability, and count) about the collection.
Once you have listed the collections for a dataset using [dataset.collections()](/api-reference/tilebox.datasets/Dataset.collections), you can access a specific collection by retrieving it from the resulting dictionary with its name. Use [collection.info()](/api-reference/tilebox.datasets/Collection.info) to get details (name, availability, and count) about it.

<CodeGroup>

```python Python
collections = dataset.collections()
terrain_correction = collections["L1GT"]
collection_info = terrain_correction.info()
print(collection_info)
```

<CodeGroup>
```python Python
collections = dataset.collections()
terrain_correction = collections["L1GT"]
collection_info = terrain_correction.info()
print(collection_info)
```
</CodeGroup>

```plaintext Output
L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC] (154288 data points)
```

You can also access a specific collection by using the `collection` method on the dataset object as well.
This has the advantage that you can directly access the collection without having to list all collections first.
You can also access a specific collection directly using the [dataset.collection](/api-reference/tilebox.datasets/Dataset.collection) method on the dataset object. This method allows you to get the collection without having to list all collections first.

<CodeGroup>

```python Python
terrain_correction = dataset.collection("L1GT")
collection_info = terrain_correction.info()
print(collection_info)
```

```python Python
terrain_correction = dataset.collection("L1GT")
collection_info = terrain_correction.info()
print(collection_info)
```
</CodeGroup>

```plaintext Output
@@ -126,20 +83,18 @@ L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC] (154288 data po

### NotFoundError

If you try to access a collection with a name that does not exist, a `NotFoundError` error is raised. For example:
If you attempt to access a collection with a non-existent name, a `NotFoundError` is raised. For example:

<CodeGroup>

```python Python
dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection Sat-X'
```

</CodeGroup>
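One way to handle a missing collection is to catch the error and fall back to `None`. This is a minimal sketch with stand-ins: `NotFoundError`, `FakeCollection`, and `FakeDataset` below are hypothetical classes defined only so the example runs on its own, and `collection_info_or_none` is an illustrative helper, not part of the SDK. The text does not say whether the error is raised by `collection` or by `info`, so the helper wraps both calls.

```python
class NotFoundError(Exception):
    """Stand-in for the SDK error of the same name (assumption)."""


class FakeCollection:
    """Hypothetical collection object exposing only info()."""

    def __init__(self, name):
        self.name = name

    def info(self):
        return f"{self.name}: (stand-in info)"


class FakeDataset:
    """Hypothetical in-memory dataset mimicking dataset.collection(name)."""

    def __init__(self, names):
        self._names = set(names)

    def collection(self, name):
        if name not in self._names:
            raise NotFoundError(f"No such collection {name}")
        return FakeCollection(name)


def collection_info_or_none(dataset, name):
    """Return info for the named collection, or None if it does not exist."""
    try:
        return dataset.collection(name).info()
    except NotFoundError:
        return None


dataset = FakeDataset(["L1GT", "L1T", "L1TP", "L2SP"])
found = collection_info_or_none(dataset, "L1GT")
absent = collection_info_or_none(dataset, "Sat-X")
```

In real code the except clause would name the SDK's own `NotFoundError` rather than the stand-in.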

## Next steps

<CardGroup cols={2}>
<Card title="Loading Data" icon="download" href="/datasets/loading-data" horizontal>
How to load data points from a collection.
Learn how to load data points from a collection.
</Card>
</CardGroup>
63 changes: 25 additions & 38 deletions datasets/introduction.mdx
@@ -4,59 +4,53 @@ description: Learn about Tilebox Datasets
icon: house
---

As the name suggests, time series datasets refer to a certain kind of datasets where each data point is associated with a timestamp.
This is a common format for datasets that are collected over time, such as satellite data.
Time series datasets refer to datasets where each data point is linked to a timestamp. This format is common for data collected over time, such as satellite data.

This section covers:

<CardGroup cols={2}>
<Card title="Available Datasets" icon="list" href="/datasets/timeseries#listing-datasets" horizontal>
Which time series datasets are available and how to list them.
Discover available time series datasets and learn how to list them.
</Card>
<Card title="Common Fields" icon="file-code" href="/datasets/timeseries#common-fields" horizontal>
Which common fields all time series datasets share.
Understand the common fields shared by all time series datasets.
</Card>
<Card title="Collections" icon="layer-group" href="/datasets/collections" horizontal>
What collections are and how to access them.
Learn what collections are and how to access them.
</Card>
<Card title="Loading Data" icon="download" href="/datasets/loading-data" horizontal>
How to access data from a collection for a given time interval.
Find out how to access data from a collection for specific time intervals.
</Card>
</CardGroup>

<Note>
If you want to quickly look up the name of some API method or the meaning of a specific parameter [check out the
complete time series API Reference](/api-reference/datasets/).
For a quick reference to API methods or specific parameter meanings, [check out the complete time series API Reference](/api-reference/datasets/).
</Note>

## Terminology

Here are some terms used throughout this section.
Get familiar with some key terms when working with time series datasets.

<AccordionGroup>
<Accordion title="Data points">
Time series data points are the individual entities that make up a dataset. Each data point is associated with a
timestamp. Each data point consists of a set of fixed [metadata fields](/datasets/timeseries#common-fields) as well
as individual fields that are defined on a dataset level.
Time series data points are individual entities that form a dataset. Each data point has a timestamp and consists of a set of fixed [metadata fields](/datasets/timeseries#common-fields) along with dataset-specific fields.
</Accordion>
<Accordion title="Datasets">
Time series datasets are a container for individual data points. All data points in a time series dataset share the
same data type, so all data points in a dataset share the same set of fields.
Time series datasets act as containers for data points. All data points in a dataset share the same type and fields.
</Accordion>
<Accordion title="Collections">
Collections are a way of grouping data points within a dataset. They are useful for representing a logical grouping
of data points that are commonly queried together.
Collections group data points within a dataset. They help represent logical groupings of data points that are often queried together.
</Accordion>
</AccordionGroup>

## Creating a datasets Client
## Creating a datasets client

Prerequisites

- You've [installed](/sdks/python/install) the `tilebox-datasets` package
- You've [created](/authentication) a Tilebox API key
- You have [installed](/sdks/python/install) the `tilebox-datasets` package.
- You have [created](/authentication) a Tilebox API key.

With the prerequisites out of the way, you can now create a client instance to start interacting with your Tilebox Datasets.
After meeting these prerequisites, you can create a client instance to interact with Tilebox Datasets.

<CodeGroup>

@@ -68,8 +62,7 @@ With the prerequisites out of the way, you can now create a client instance to s

</CodeGroup>

As an alternative, you can also set the `TILEBOX_API_KEY` environment variable to your API key and instantiate the client
without passing the `token` argument. Python automatically pick up the environment variable and use it to authenticate with the API.
Alternatively, you can set the `TILEBOX_API_KEY` environment variable to your API key. You can then instantiate the client without passing the `token` argument. The client will automatically use this environment variable for authentication.

<CodeGroup>

@@ -82,42 +75,36 @@ without passing the `token` argument. Python automatically pick up the environme

</CodeGroup>
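The lookup order described above can be sketched as follows. `resolve_api_key` is a hypothetical helper, not part of the Tilebox SDK, and the rule that an explicit token takes precedence over the environment variable is an assumption; the source only states that `TILEBOX_API_KEY` is used when no `token` is passed.

```python
import os


def resolve_api_key(token=None, env=None):
    """Hypothetical helper: choose the API key to authenticate with.

    An explicit token wins (assumption); otherwise fall back to the
    TILEBOX_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    if token:
        return token
    key = env.get("TILEBOX_API_KEY")
    if not key:
        raise RuntimeError("Set TILEBOX_API_KEY or pass a token explicitly")
    return key


# An explicit mapping is passed here so the example does not depend on
# the real process environment.
key = resolve_api_key(env={"TILEBOX_API_KEY": "my-secret-key"})
```

Failing early with a clear message when neither source is set can be easier to debug than an authentication failure on the first request.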

Tilebox datasets offers a standard synchronous API by default, but also give you the option of an async client if you need it.

The synchronous client is great for data exploration in interactive environments like Jupyter notebooks.
The asynchronous client is great for building production ready applications that need to scale. To find out more
about the differences between the two clients, check out the [Async support](/sdks/python/async) page.
<Tip>
  Tilebox Datasets provides a standard synchronous API by default, but also offers an [asynchronous client](/sdks/python/async) if needed.
</Tip>

### Exploring datasets

Now that you have a client instance, you can start exploring the datasets that are available. An easy way to do this
is to [list all datasets](/api-reference/datasets/listing-datasets) and then using the autocomplete capability
of your IDE or inside your Jupyter notebook.
After creating a client instance, you can start exploring available datasets. A straightforward way to do this in an interactive environment is to [list all datasets](/api-reference/tilebox.datasets/Client.datasets) and use the autocomplete feature in your Jupyter notebook.

<CodeGroup>

```python Python
datasets = client.datasets()
datasets. # trigger autocomplete here to get an overview of the available datasets
datasets. # trigger autocomplete here to view available datasets
```

</CodeGroup>

<Tip>
The Console also provides an [overview](https://console.tilebox.com/datasets/explorer) of all available datasets.
</Tip>

### Errors you might encounter

#### AuthenticationError

`AuthenticationError` is raised when the client is unable to authenticate with the Tilebox API. This can happen when
the provided API key is invalid or expired. Instantiating a client with an invalid API key does not raise an error
directly, but only when you try to make a request to the API.
`AuthenticationError` occurs when the client fails to authenticate with the Tilebox API. This may happen if the provided API key is invalid or expired. A client instantiated with an invalid API key won't raise an error immediately, but an error will occur when making a request to the API.

<CodeGroup>

```python Python
client = Client(token="invalid-key") # runs without error
datasets = client.datasets() # raises AuthenticationError
```

</CodeGroup>
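Since an invalid key only surfaces on the first request, an application may want to make one inexpensive request at startup to fail fast. The sketch below illustrates that idea with stand-ins: `AuthenticationError` and `FakeClient` are hypothetical classes defined here so the example is self-contained, and `credentials_are_valid` is an illustrative helper, not an SDK function.

```python
class AuthenticationError(Exception):
    """Stand-in for the SDK error of the same name (assumption)."""


class FakeClient:
    """Hypothetical client that, like the real one, defers key validation."""

    def __init__(self, token):
        self._token = token  # no validation happens at construction time

    def datasets(self):
        # Validation only happens once a request is made.
        if self._token != "valid-key":
            raise AuthenticationError("invalid or expired API key")
        return ["open_data"]


def credentials_are_valid(client):
    """Issue one request so that a bad key is detected at startup."""
    try:
        client.datasets()
    except AuthenticationError:
        return False
    return True


ok = credentials_are_valid(FakeClient("valid-key"))
bad = credentials_are_valid(FakeClient("invalid-key"))
```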

## Next steps