diff --git a/api-reference/datasets/accessing-dataset.mdx b/api-reference/datasets/accessing-dataset.mdx index 7a36046..a36ba37 100644 --- a/api-reference/datasets/accessing-dataset.mdx +++ b/api-reference/datasets/accessing-dataset.mdx @@ -9,12 +9,12 @@ Once you have listed all available datasets, you can access a specific dataset b ```python Python (Sync) -dataset = datasets.open_data.asf.sentinel1_sar +dataset = datasets.open_data.copernicus.sentinel1_sar # or any other dataset available to you ``` ```python Python (Async) -dataset = datasets.open_data.asf.sentinel1_sar +dataset = datasets.open_data.copernicus.sentinel1_sar # or any other dataset available to you ``` diff --git a/assets/data/example_satellite_data.nc b/assets/data/example_satellite_data.nc new file mode 100644 index 0000000..302b4a0 Binary files /dev/null and b/assets/data/example_satellite_data.nc differ diff --git a/datasets/collections.mdx b/datasets/collections.mdx index cd034b9..acaf8fd 100644 --- a/datasets/collections.mdx +++ b/datasets/collections.mdx @@ -49,13 +49,13 @@ Each dataset has a list of collections associated with it. You can list the coll ```python Python (Sync) - dataset = datasets.open_data.asf.sentinel1_sar + dataset = datasets.open_data.copernicus.landsat8_oli_tirs collections = dataset.collections() print(collections) ``` ```python Python (Async) - dataset = datasets.open_data.asf.sentinel1_sar + dataset = datasets.open_data.copernicus.landsat8_oli_tirs collections = await dataset.collections() print(collections) ``` @@ -63,8 +63,10 @@ Each dataset has a list of collections associated with it. You can list the coll ```txt Output -{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points), - 'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)} +{'L1GT': Collection L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC], + 'L1T': Collection L1T: [2013-03-26T09:33:19.763 UTC, 2020-08-24T03:21:50.000 UTC], + 'L1TP': Collection L1TP: [2013-03-24T00:25:55.457 UTC, 2024-08-19T12:58:20.229 UTC], + 'L2SP': Collection L2SP: [2015-01-01T07:53:35.391 UTC, 2024-08-12T12:52:03.243 UTC]} ``` The `collections` variable is a dictionary, where the keys are the names of the collections and the values are @@ -79,13 +81,13 @@ method. ```python Python (Sync) - dataset = datasets.open_data.asf.sentinel1_sar + dataset = datasets.open_data.copernicus.landsat8_oli_tirs collections = dataset.collections(availability=True, count=True) print(collections) ``` ```python Python (Async) - dataset = datasets.open_data.asf.sentinel1_sar + dataset = datasets.open_data.copernicus.landsat8_oli_tirs collections = await dataset.collections(availability=True, count=True) print(collections) ``` @@ -93,8 +95,10 @@ method. 
```txt Output -{'Sentinel-1A': Collection Sentinel-1A: [2014-06-15T03:44:43.000 UTC, 2022-12-31T23:57:59.000 UTC] (1209636 data points), - 'Sentinel-1B': Collection Sentinel-1B: [2016-09-26T00:02:34.000 UTC, 2021-12-23T06:53:08.000 UTC] (657674 data points)} +{'L1GT': Collection L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC] (154288 data points), + 'L1T': Collection L1T: [2013-03-26T09:33:19.763 UTC, 2020-08-24T03:21:50.000 UTC] (87958 data points), + 'L1TP': Collection L1TP: [2013-03-24T00:25:55.457 UTC, 2024-08-19T12:58:20.229 UTC] (322041 data points), + 'L2SP': Collection L2SP: [2015-01-01T07:53:35.391 UTC, 2024-08-12T12:52:03.243 UTC] (191110 data points)} ``` ## Accessing individual collections @@ -108,22 +112,22 @@ You can then use the `info()` method on the collection object to get information ```python Python (Sync) collections = dataset.collections() - sat1 = collections["Sat-1"] - collection_info = sat1.info(availability=True, count=True) + terrain_correction = collections["L1GT"] + collection_info = terrain_correction.info(availability=True, count=True) print(collection_info) ``` ```python Python (Async) collections = await dataset.collections() - sat1 = collections["Sat-1"] - collection_info = await sat1.info(availability=True, count=True) + terrain_correction = collections["L1GT"] + collection_info = await terrain_correction.info(availability=True, count=True) print(collection_info) ``` ```txt Output -Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points) +L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC] (154288 data points) ``` You can also access a specific collection by using the `collection` method on the dataset object as well. @@ -132,21 +136,21 @@ This has the advantage that you can directly access the collection without havin ```python Python (Sync) - sat1 = dataset.collection("Sat-1") - collection_info = sat1.info(availability=True, count=True) + terrain_correction = dataset.collection("L1GT") + collection_info = terrain_correction.info(availability=True, count=True) print(collection_info) ``` ```python Python (Async) - sat1 = dataset.collection("Sat-1") - collection_info = await sat1.info(availability=True, count=True) + terrain_correction = dataset.collection("L1GT") + collection_info = await terrain_correction.info(availability=True, count=True) print(collection_info) ``` ```txt Output -Collection Sat-1: [2019-03-07T16:09:17.773000 UTC, 2021-05-23T19:17:23.472000 UTC] (910245 data points) +L1GT: [2013-03-25T12:08:43.699 UTC, 2024-08-19T12:57:32.456 UTC] (154288 data points) ``` ## Errors you may encounter diff --git a/datasets/loading-data.mdx b/datasets/loading-data.mdx index c4aae12..276b03e 100644 --- a/datasets/loading-data.mdx +++ b/datasets/loading-data.mdx @@ -25,16 +25,16 @@ assume that you have already [created a client](/datasets/introduction#creating- client = Client() datasets = client.datasets() - collections = datasets.open_data.asf.sentinel1_sar.collections() - collection = collections["Sentinel-1A"] + collections = datasets.open_data.copernicus.sentinel1_sar.collections() + collection = collections["S1A_IW_RAW__0S"] ``` ```python Python (Async) from tilebox.datasets.aio import Client client = Client() datasets = await client.datasets() - collections = await datasets.open_data.asf.sentinel1_sar.collections() - collection = collections["Sentinel-1A"] + collections = await datasets.open_data.copernicus.sentinel1_sar.collections() + collection = 
collections["S1A_IW_RAW__0S"] ``` @@ -57,42 +57,39 @@ Check out the example below to see how to load a data point at a specific time f ```python Python (Sync) - data = collection.load("2022-05-31 23:59:55.000") + data = collection.load("2024-08-01 00:00:01.362") print(data) ``` ```python Python (Async) - data = await collection.load("2022-05-31 23:59:55.000") + data = await collection.load("2024-08-01 00:00:01.362") print(data) ``` ```txt Output - Size: 549B -Dimensions: (time: 1, latlon: 2, n_footprint: 5) + Size: 721B +Dimensions: (time: 1, latlon: 2) Coordinates: - ingestion_time (time) datetime64[ns] 8B 2023-10-20T10:04:23 - id (time) @@ -113,12 +110,12 @@ when calling `load`. Check out the example below to see this in action. ```python Python (Sync) -data = collection.load("2022-05-31 23:59:55.000", skip_data=True) +data = collection.load("2024-08-01 00:00:01.362", skip_data=True) print(data) ``` ```python Python (Async) -data = await collection.load("2022-05-31 23:59:55.000", skip_data=True) +data = await collection.load("2024-08-01 00:00:01.362", skip_data=True) print(data) ``` @@ -128,13 +125,11 @@ print(data) Size: 160B Dimensions: (time: 1) Coordinates: - ingestion_time (time) datetime64[ns] 8B 2023-10-20T10:04:23 - id (time) ```txt Output - + Size: 0B Dimensions: () Data variables: *empty* @@ -177,8 +172,8 @@ timestamps, which would need to be manually converted again to different timezon from datetime import datetime import pytz - # Tokyo has a UTC+9 hours offset, so this is the same as 2017-01-01 02:45:35 UTC - tokyo_time = pytz.timezone('Asia/Tokyo').localize(datetime(2017, 1, 1, 11, 45, 35)) + # Tokyo has a UTC+9 hours offset, so this is the same as 2017-01-01 02:45:25.679 UTC + tokyo_time = pytz.timezone('Asia/Tokyo').localize(datetime(2017, 1, 1, 11, 45, 25, 679000)) print(tokyo_time) data = collection.load(tokyo_time) print(data) # time is in UTC since the API always returns UTC timestamps @@ -187,8 +182,8 @@ timestamps, which would need to be manually converted again to different timezon from datetime import datetime import pytz - # Tokyo has a UTC+9 hours offset, so this is the same as 2017-01-01 02:45:35 UTC - tokyo_time = pytz.timezone('Asia/Tokyo').localize(datetime(2017, 1, 1, 11, 45, 35)) + # Tokyo has a UTC+9 hours offset, so this is the same as 2017-01-01 02:45:25.679 UTC + tokyo_time = pytz.timezone('Asia/Tokyo').localize(datetime(2017, 1, 1, 11, 45, 25, 679000)) print(tokyo_time) data = await collection.load(tokyo_time) print(data) # time is in UTC since the API always returns UTC timestamps @@ -197,13 +192,14 @@ timestamps, which would need to be manually converted again to different timezon ```txt Output -2017-05-01 11:45:35+09:00 - -Dimensions: (time: 1) +2017-01-01 11:45:25.679000+09:00 + Size: 725B +Dimensions: (time: 1, latlon: 2) Coordinates: - ingestion_time (time) datetime64[ns] 2017-01-01T15:26:32 - id (time) ```txt Output - Size: 456MB -Dimensions: (time: 955942, latlon: 2, n_footprint: 5) + Size: 725MB +Dimensions: (time: 1109597, latlon: 2) Coordinates: - ingestion_time (time) datetime64[ns] 8MB 2023-10-20T09:52:37 ... 20... - id (time) @@ -390,30 +383,27 @@ Another way of specifying a time interval when loading data is to use an iterabl ```txt Output - Size: 24kB -Dimensions: (time: 50, latlon: 2, n_footprint: 5) + Size: 33kB +Dimensions: (time: 50, latlon: 2) Coordinates: - ingestion_time (time) datetime64[ns] 400B 2023-10-20T09:52:37 ... 2... 
 ```python Python (Sync)
-datapoint_id = "01856a9e-2c08-0990-6cc7-9a860b1115a1"
+datapoint_id = "01916d89-ba23-64c9-e383-3152644bcbde"
 datapoint = collection.find(datapoint_id)
 print(datapoint)
 ```

 ```python Python (Async)
-datapoint_id = "01856a9e-2c08-0990-6cc7-9a860b1115a1"
+datapoint_id = "01916d89-ba23-64c9-e383-3152644bcbde"
 datapoint = await collection.find(datapoint_id)
 print(datapoint)
 ```

@@ -444,30 +434,27 @@ print(datapoint)

 ```txt Output
-<xarray.Dataset> Size: 549B
-Dimensions:     (latlon: 2, n_footprint: 5)
+<xarray.Dataset> Size: 725B
+Dimensions:     (latlon: 2)
 Coordinates:
-    ingestion_time datetime64[ns] 8B 2023-10-20T10:05:57
-    id             <U36 ...
[... diff lines lost in extraction ...]
diff --git a/sdks/python/async.mdx b/sdks/python/async.mdx
index 06962cb..5019231 100644
--- a/sdks/python/async.mdx
+++ b/sdks/python/async.mdx
@@ -43,11 +43,11 @@ Check out the examples below to see how that works for a few examples.
 datasets = client.datasets()

 # Listing collections
-dataset = datasets.open_data.asf.sentinel1_sar
+dataset = datasets.open_data.copernicus.sentinel1_sar
 collections = dataset.collections()

 # Collection information
-collection = collections["Sentinel-1A"]
+collection = collections["S1A_IW_RAW__0S"]
 info = collection.info()
 print(f"Data for My-collection is available for {info.availability}")

@@ -55,7 +55,7 @@ print(f"Data for My-collection is available for {info.availability}")
 data = collection.load(("2022-05-01", "2022-06-01"), show_progress=True)

 # Finding a specific datapoint
-datapoint_uuid = "01811c8f-0928-e6f5-df34-364cfa8a86e8"
+datapoint_uuid = "01910b3c-8552-7671-3345-b902cc0813f3"
 datapoint = collection.find(datapoint_uuid)
 ```

@@ -64,11 +64,11 @@ datapoint = collection.find(datapoint_uuid)
 datasets = await client.datasets()

 # Listing collections
-dataset = datasets.open_data.asf.sentinel1_sar
+dataset = datasets.open_data.copernicus.sentinel1_sar
 collections = await dataset.collections()

 # Collection information
-collection = collections["Sentinel-1A"]
+collection = collections["S1A_IW_RAW__0S"]
 info = await collection.info()
 print(f"Data for My-collection is available for {info.availability}")

@@ -76,7 +76,7 @@ print(f"Data for My-collection is available for {info.availability}")
 data = await collection.load(("2022-05-01", "2022-06-01"), show_progress=True)

 # Finding a specific datapoint
-datapoint_uuid = "01811c8f-0928-e6f5-df34-364cfa8a86e8"
+datapoint_uuid = "01910b3c-8552-7671-3345-b902cc0813f3"
 datapoint = await collection.find(datapoint_uuid)
 ```

@@ -110,13 +110,13 @@ from tilebox.datasets.timeseries import RemoteTimeseriesDatasetCollection # for
 client = Client()
 datasets = client.datasets()

-collections = datasets.open_data.asf.sentinel1_sar.collections()
+collections = datasets.open_data.copernicus.landsat8_oli_tirs.collections()

 def stats_for_2020(collection: RemoteTimeseriesDatasetCollection) -> None:
     """Fetch data for 2020 and print the number of data points that were loaded."""
     data = collection.load(("2020-01-01", "2021-01-01"), show_progress=True)
     n = data.sizes['time'] if 'time' in data else 0
-    print(f"There are {data.sizes['time']} datapoints in {collection.name} for 2020.")
+    print(f"There are {n} datapoints in {collection.name} for 2020.")

 start = time.time()
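The async variant below only pays off once the per-collection fetches actually run concurrently. The driving code sits outside the hunk context; one way it might look, as a sketch assuming the `collections` dict and the async `stats_for_2020` from the next hunk:

```python
import asyncio

async def main() -> None:
    # Schedule one fetch per collection; gather runs them concurrently,
    # which is where the async speedup in the outputs below comes from.
    await asyncio.gather(*(stats_for_2020(collection) for collection in collections.values()))

# From a script; in a notebook an event loop is already running,
# so a plain `await main()` is the way to go instead.
asyncio.run(main())
```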
@@ -139,13 +139,13 @@ from tilebox.datasets.timeseries import RemoteTimeseriesDatasetCollection # for
 client = Client()
 datasets = await client.datasets()

-collections = await datasets.open_data.asf.sentinel1_sar.collections()
+collections = await datasets.open_data.copernicus.landsat8_oli_tirs.collections()

 async def stats_for_2020(collection: RemoteTimeseriesDatasetCollection) -> None:
     """Fetch data for 2020 and print the number of data points that were loaded."""
     data = await collection.load(("2020-01-01", "2021-01-01"), show_progress=True)
     n = data.sizes['time'] if 'time' in data else 0
-    print(f"There are {data.sizes['time']} datapoints in {collection.name} for 2020.")
+    print(f"There are {n} datapoints in {collection.name} for 2020.")

 start = time.time()
@@ -167,19 +167,19 @@ so it finishes first.

 ```txt Python (Sync)
-Fetching data: 100% |██████████████████████████████ [00:13<00:00, 207858 datapoints, 3.91 MB/s]
-There are 207858 datapoints in Sentinel-1A for 2020.
-Fetching data: 100% |██████████████████████████████ [00:11<00:00, 179665 datapoints, 4.39 MB/s]
-There are 179665 datapoints in Sentinel-1B for 2020.
-Fetching data took 25.34 seconds
+There are 19624 datapoints in L1GT for 2020.
+There are 1281 datapoints in L1T for 2020.
+There are 65313 datapoints in L1TP for 2020.
+There are 25375 datapoints in L2SP for 2020.
+Fetching data took 10.92 seconds
 ```

 ```txt Python (Async)
-Fetching data: 100% |██████████████████████████████ [00:19<00:00, 207858 datapoints, 2.21 MB/s]
-Fetching data: 100% |██████████████████████████████ [00:17<00:00, 179665 datapoints, 2.94 MB/s]
-There are 179665 datapoints in Sentinel-1B for 2020.
-There are 207858 datapoints in Sentinel-1A for 2020.
-Fetching data took 20.12 seconds
+There are 1281 datapoints in L1T for 2020.
+There are 19624 datapoints in L1GT for 2020.
+There are 25375 datapoints in L2SP for 2020.
+There are 65313 datapoints in L1TP for 2020.
+Fetching data took 7.45 seconds
 ```
diff --git a/sdks/python/xarray.mdx b/sdks/python/xarray.mdx
index d487c44..c71b526 100644
--- a/sdks/python/xarray.mdx
+++ b/sdks/python/xarray.mdx
@@ -47,7 +47,7 @@ number of benefits compared to custom Tilebox specific data structures such as:

 ## An example dataset

-To get an understanding of how Xarray works, a simple example dataset is used, as it could be returned by a
+To get an understanding of how Xarray works, a sample dataset is used, such as could be returned by a
 [Tilebox timeseries dataset](/datasets/timeseries).

@@ -57,7 +57,7 @@ from tilebox.datasets import Client

 client = Client()
 datasets = client.datasets()
-collection = datasets.open_data.asf.sentinel1_sar.collection("Sentinel-1A")
+collection = datasets.open_data.copernicus.landsat8_oli_tirs.collection("L1GT")
 satellite_data = collection.load(("2022-05-01", "2022-06-01"), show_progress=True)
 print(satellite_data)
 ```

@@ -67,7 +67,7 @@ from tilebox.datasets.aio import Client

 client = Client()
 datasets = await client.datasets()
-collection = datasets.open_data.asf.sentinel1_sar.collection("Sentinel-1A")
+collection = datasets.open_data.copernicus.landsat8_oli_tirs.collection("L1GT")
 satellite_data = await collection.load(("2022-05-01", "2022-06-01"), show_progress=True)
 print(satellite_data)
 ```

@@ -75,52 +75,49 @@ print(satellite_data)

 ```txt Output
-<xarray.Dataset> Size: 8MB
-Dimensions:         (time: 16507, latlon: 2, n_footprint: 5)
+<xarray.Dataset> Size: 305kB
+Dimensions:         (time: 514, latlon: 2)
 Coordinates:
-    ingestion_time  (time) datetime64[ns] 132kB 2023-10-20T10:04:07 ... ...
-    id              (time) <U36 ...
[... diff lines lost in extraction ...]
 <Note>
   This is a simple dataset that was generated to showcase some common Xarray use-cases. If you want to follow along, you
-  can download the dataset as a NetCDF file. The [Reading and writing
-  files section](/sdks/python/xarray#reading-and-writing-files) explains how to save and load Xarray datasets to and
-  from NetCDF files.
+  can [download the dataset as a NetCDF file](/assets/data/example_satellite_data.nc). The [Reading and writing files
+  section](/sdks/python/xarray#reading-and-writing-files) explains how to save and load Xarray datasets to and from
+  NetCDF files.
 </Note>
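If you download the file linked in this note, opening it locally is a one-liner. A sketch, assuming `xarray` and a NetCDF engine such as `netcdf4` are installed and the file sits in the working directory:

```python
import xarray as xr

# Open the example dataset downloaded from the docs.
satellite_data = xr.open_dataset("example_satellite_data.nc")

# The breakdown below expects 514 data points along the time dimension.
print(satellite_data.sizes["time"])
```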

 Here is a breakdown of the preceding output:

 - `satellite_data` **dataset** contains different **dimensions**, **coordinates** and **variables**
-- `time` **dimension** consists of 570396 elements. This means there are 570396 data points in the dataset
+- `time` **dimension** consists of 514 elements. This means there are 514 data points in the dataset
 - `time` **dimension coordinate** contains datetime values. This is the time when the data was measured. The `*` mark
   shows that it's a dimension coordinate. Dimension coordinates are used for label based indexing and alignment, it
   means you can use the time to access individual data points in the dataset
 - `ingestion_time` **non-dimension coordinate** contains datetime values. This is the time when the data was ingested
   into the Tilebox database. Non-dimension coordinates are variables that contain coordinate data, but are not
-  used for label based indexing and alignment. They can [even be multidimensional](https://docs.xarray.dev/en/stable/examples/multidimensional-coords.html).
-- `sensor` **variable** contains integers. This variable tells you which sensor produced a given measurement.
-  A sensor in this case is identified by a number, `1` or `2` in the example dataset
-- `measurement` **variable** contains floating point values. This variable contains the actual measurement values.
+  used for label based indexing and alignment. They can [even be multidimensional](https://docs.xarray.dev/en/stable/examples/multidimensional-coords.html)
+- The dataset contains 28 **variables**
+- `bands` **variable** contains integers. This variable tells you how many bands a given data point contains
+- `sun_elevation` **variable** contains floating point values. This variable contains the sun elevation at the time the data was measured

 Check out the [xarray terminology overview](https://docs.xarray.dev/en/stable/user-guide/terminology.html) to deepen
@@ -137,30 +134,33 @@ no more API requests are required, there is no difference between the sync and a

 There a couple of different ways that you can access data in a dataset. The Xarray documentation provides a
 [great overview](https://docs.xarray.dev/en/stable/user-guide/indexing.html) of all those methods.

-You can access the `measurement` variable:
+You can access the `sun_elevation` variable:

 ```python Accessing values
-# Let's print the first measurement value
-print(satellite_data.measurement[0])
+# Let's print the first sun elevation value
+print(satellite_data.sun_elevation[0])
 ```

 ```txt Output
-<xarray.DataArray 'measurement' ()> array(3.07027067) Coordinates:
-ingestion_time datetime64[ns] 2017-01-01T15:26:32 time datetime64[ns]
-2017-01-01T02:45:35
+<xarray.DataArray 'sun_elevation' ()> Size: 8B
+array(44.19904463)
+Coordinates:
+    ingestion_time  datetime64[ns] 8B 2024-07-22T09:06:43.558629
+    id              <U36 ...
[... diff lines lost in extraction ...]
-<xarray.Dataset> Dimensions: () Coordinates: ingestion_time datetime64[ns] 2017-01-01T15:26:32
-time datetime64[ns] 2017-01-01T02:45:35 Data variables: sensor int64 2
-measurement float64 3.07
-
+<xarray.Dataset> Size: 665B
+Dimensions:         (latlon: 2)
+Coordinates:
+    ingestion_time  datetime64[ns] 8B 2024-07-22T09:06:43.558629
+    id              <U36 ...
[... diff lines lost in extraction ...]
-<xarray.Dataset> Dimensions: (time: 3) Coordinates: ingestion_time (time) datetime64[ns]
-2022-12-31T20:56:40 ... 2022-12-31T... * time (time) datetime64[ns]
-2022-12-31T15:47:54 ... 2022-12-31T... Data variables: sensor (time) int64 1 2
-1 measurement (time) float64 1.491 2.045 2.798
-
+First 3 sun_elevations [44.19904463 57.77561083 58.76316786]
+Last 3 sun_elevations [55.60690523 56.72453179 57.81917624]
+Sub dataset of the last 3 datapoints
+<xarray.Dataset> Size: 2kB
+Dimensions:         (time: 3, latlon: 2)
+Coordinates:
+    ingestion_time  (time) datetime64[ns] 24B 2024-07-22T09:08:24.7395...
+    id              (time) <U36 ...
[... diff lines lost in extraction ...]
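Two further indexing patterns complement the subsetting shown above: positional `isel` and label-based slicing on the `time` dimension coordinate. A sketch, with `satellite_data` as loaded earlier:

```python
# Positional indexing: the first ten data points along the time dimension.
first_ten = satellite_data.isel(time=slice(0, 10))

# Label-based slicing on the time dimension coordinate; for label slices
# both endpoints are inclusive.
one_day = satellite_data.sel(time=slice("2022-05-01", "2022-05-02"))

print(first_ten.sizes["time"], one_day.sizes["time"])
```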
Data variables: sensor (time) int64 1 2 - 1 measurement (time) float64 1.491 2.045 2.798 - +First 3 sun_elevations [44.19904463 57.77561083 58.76316786] +Last 3 sun_elevations [55.60690523 56.72453179 57.81917624] +Sub dataset of the last 3 datapoints + Size: 2kB +Dimensions: (time: 3, latlon: 2) +Coordinates: + ingestion_time (time) datetime64[ns] 24B 2024-07-22T09:08:24.7395... + id (time) array([3.58839564e+00, -2.70314237e+00, 3.27767130e-03, ..., 2.83278085e+00, 1.49074120e+00, -2.79836407e+00]) Coordinates: ingestion_time (time) datetime64[ns] -2017-01-01T15:26:32 ... 2022-12-31T... * time (time) datetime64[ns] -2017-01-01T02:54:03 ... 2022-12-31T... + Size: 216B +array([63.89629314, 63.35038654, 64.10330149, 64.11904038, 64.32007459, + 65.00696561, 60.81739662, 65.72788105, 65.90881403, 65.90881403, + 66.51835574, 66.51835574, 61.24068875, 66.34420723, 66.34420723, + 65.07319907, 65.07319907, 67.19808628, 67.19808628, 67.69088228, + 61.54950615, 67.76723723, 67.76723723, 68.23219829, 68.23219829, + 64.37400345, 64.37400345]) +Coordinates: + ingestion_time (time) datetime64[ns] 216B 2024-07-22T09:06:43.558629 ...... + id (time) 1.5) & - (satellite_data.measurement < 1.6) + (satellite_data.cloud_cover == 0) & + (satellite_data.sun_elevation > 45) & + (satellite_data.sun_elevation < 90) ) -filtered_measurements = satellite_data.measurement[data_filter] -print(filtered_measurements) +filtered_sun_elevations = satellite_data.sun_elevation[data_filter] +print(filtered_sun_elevations) ``` ```txt Output - array([1.54675131, 1.58851704, -1.52978976, ..., 1.54684979, 1.58256101, 1.5325089 ]) Coordinates: -ingestion_time (time) datetime64[ns] 2017-01-01T05:21:17 ... 2022-12-31T... * -time (time) datetime64[ns] 2017-01-01T18:17:47 ... 2022-12-31T... + Size: 216B +array([63.89629314, 63.35038654, 64.10330149, 64.11904038, 64.32007459, + 65.00696561, 60.81739662, 65.72788105, 65.90881403, 65.90881403, + 66.51835574, 66.51835574, 61.24068875, 66.34420723, 66.34420723, + 65.07319907, 65.07319907, 67.19808628, 67.19808628, 67.69088228, + 61.54950615, 67.76723723, 67.76723723, 68.23219829, 68.23219829, + 64.37400345, 64.37400345]) +Coordinates: + ingestion_time (time) datetime64[ns] 216B 2024-07-22T09:06:43.558629 ...... + id (time) - Dimensions: () Coordinates: ingestion_time datetime64[ns] 2020-12-27T18:30:47 - time datetime64[ns] 2021-01-14T07:21:04 Data variables: sensor int64 1 - measurement float64 3.873 - + Size: 665B +Dimensions: (latlon: 2) +Coordinates: + ingestion_time datetime64[ns] 8B 2024-07-22T09:06:43.558629 + id >> raises KeyError: "2021-01-14T07:21:05" +nearest_datapoint = satellite_data.sel(time="2022-05-01T11:28:28.000000") +>>> raises KeyError: "2022-05-01T11:28:28.000000" ``` The `method` parameter can be used to return the closest value instead of raising an error. 
 The `method` parameter can be used to return the closest value instead of raising an error.

-```python Finding the closest measurement
-nearest_measurement = satellite_data.sel(time="2021-01-14T07:21:05", method="nearest")
-assert nearest_measurement.equals(specific_measurement) # passes
+```python Finding the closest data point
+nearest_datapoint = satellite_data.sel(time="2022-05-01T11:28:28.000000", method="nearest")
+assert nearest_datapoint.equals(specific_datapoint) # passes
 ```

@@ -306,29 +366,29 @@ Xarray and NumPy offer a wide range of statistical functions that can be applied
 a few examples:

 ```python Computing dataset statistics
-measurements = satellite_data.measurement
-min_meas = measurements.min().item()
-max_meas = measurements.max().item()
-mean_meas = measurements.mean().item()
-std_meas = measurements.std().item()
-print(f"Measurements from {min_meas:.2f} to {max_meas:.2f} with mean {mean_meas:.2f} and a std of {std_meas:.2f}")
+cloud_cover = satellite_data.cloud_cover
+min_cover = cloud_cover.min().item()
+max_cover = cloud_cover.max().item()
+mean_cover = cloud_cover.mean().item()
+std_cover = cloud_cover.std().item()
+print(f"Cloud cover from {min_cover:.2f} to {max_cover:.2f} with mean {mean_cover:.2f} and a std of {std_cover:.2f}")
 ```

 ```txt Output
-Measurements from 0.00 to 4.00 with mean 1.91 and a std of 1.44
+Cloud cover from 0.00 to 100.00 with mean 76.48 and a std of 34.17
 ```

-You can also use many NumPy functions directly on a dataset or DataArray. For example, to find out which sensors
-you are dealing with, you can use [np.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) to
-get all the unique values in the `sensor` data array.
+You can also use many NumPy functions directly on a dataset or DataArray. For example, to find out how many bands
+the data contains, you can use [np.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) to
+get all the unique values in the `bands` data array.

 ```python Finding unique values
 import numpy as np
-print("Sensors:", np.unique(satellite_data.sensor))
+print("Bands:", np.unique(satellite_data.bands))
 ```

 ```txt Output
-Sensors: [1 2]
+Bands: [12]
 ```

 ## Reading and writing files
@@ -338,6 +398,8 @@ to share your data with others or if you want to persist your data for later use
 formats, including NetCDF, Zarr, GRIB, and many more. For a full list of supported formats, please refer to the
 [official documentation page](https://docs.xarray.dev/en/stable/user-guide/io.html).

+You might need to install the `netcdf4` package first, for example with `pip install netcdf4`.
+
 Here is how you can save the example dataset to a NetCDF file:

 ```python Saving a dataset to a file
@@ -354,7 +416,7 @@ satellite_data = xr.open_dataset("example_satellite_data.nc")
 ```

 In case you want to follow along with the examples in this section, you can download the example dataset as a NetCDF
-file here.
+file [here](/assets/data/example_satellite_data.nc).

 ## Further reading