Refactor and Simplify Documentation #395

Merged
merged 10 commits into from
May 23, 2024
Changes from 2 commits
1 change: 1 addition & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Bonito = "824d6782-a2ef-11e9-3a09-e5662e0c26f8"
CFTime = "179af706-886a-5703-950a-314cd64e0468"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
22 changes: 13 additions & 9 deletions docs/src/.vitepress/config.mts
@@ -35,13 +35,19 @@ export default defineConfig({
},
nav: [
{ text: 'Home', link: '/' },
{ text: 'Getting Started', link: '/getting_started' },
{ text: 'Get Started', link: '/get_started' },
{
text: 'User Guide',
items: [
{ text: 'Read and Write', link: '/UserGuide/read_and_write' },
{ text: 'Read', link: '/UserGuide/read' },
{ text: 'Create', link: '/UserGuide/create' },
{ text: 'Write', link: '/UserGuide/write' },
{ text: 'Subset', link: '/UserGuide/subset' },
{ text: 'Compute', link: '/UserGuide/compute' },
{ text: 'FAQ', link: '/UserGuide/faq' },
{ text: 'Group', link: '/UserGuide/group' },
{ text: 'Combine', link: '/UserGuide/combine' },
{ text: 'Chunk', link: '/UserGuide/chunk' },
{ text: 'FAQ', link: '/UserGuide/faq' }
]
},
{
@@ -72,16 +78,19 @@ export default defineConfig({
],

sidebar: [
{ text: 'Getting Started', link: '/getting_started' },
{ text: 'Get Started', link: '/get_started' },
{ text: 'API Reference', link: 'api' },
{
text: 'User Guide',
items: [
{ text: 'Types', link: '/UserGuide/types' },
{ text: 'Create', link: '/UserGuide/create' },
{ text: 'Read', link: '/UserGuide/read' },
{ text: 'Write', link: '/UserGuide/write' },
{ text: 'Subset', link: '/UserGuide/subset' },
{ text: 'Compute', link: '/UserGuide/compute' },
{ text: 'Group', link: '/UserGuide/group' },
{ text: 'Combine', link: '/UserGuide/combine' },
{ text: 'Chunk', link: '/UserGuide/chunk' },
{ text: 'FAQ', link: '/UserGuide/faq' }
]
@@ -100,11 +109,6 @@ export default defineConfig({
{ text: 'Contribute', link: 'development/contribute' },
{ text: 'Contributors', link: 'development/contributors' }
]
}, {
text: 'API',
items: [
{ text: 'API Reference', link: 'api' },
]
},
],
editLink: {
91 changes: 90 additions & 1 deletion docs/src/UserGuide/chunk.md
Original file line number Diff line number Diff line change
@@ -1 +1,90 @@
# Chunk YAXArrays
# Chunk YAXArrays

> [!IMPORTANT]
> Chunking matters when analyzing your data: in most situations the data will not fit into memory, so fast read access is crucial for your workflows. For geospatial data, for example, consider whether you need fast access along time or along space.

To determine the chunk size of the array representation on disk,
call the `setchunks` function prior to saving.

## Chunking YAXArrays

````@example chunks
using YAXArrays, Zarr
a = YAXArray(rand(10,20))
a_chunked = setchunks(a, (5,10))
a_chunked.chunks
````
The saved file is also split into chunks.

````@example chunks
f = tempname()
savecube(a_chunked, f, backend=:zarr)
Cube(f).chunks
````

Alternatively, chunk sizes can be given by dimension name, so the following results in the same chunks:

````@example chunks
a_chunked = setchunks(a, (Dim_2=10, Dim_1=5))
a_chunked.chunks
````
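
As a back-of-the-envelope check on the examples above, the number of chunks along each dimension is the array size divided by the chunk size, rounded up. The following plain-Julia sketch does not call YAXArrays; `chunkgrid` is a hypothetical helper introduced only for illustration.

```julia
# Hypothetical helper (not part of YAXArrays): number of chunks per dimension
# for an array of size `sz` split into chunks of size `chunksize`.
chunkgrid(sz, chunksize) = cld.(sz, chunksize)

sz = (10, 20)          # shape of `a` in the example above
chunksize = (5, 10)    # chunk shape passed to `setchunks`

grid = chunkgrid(sz, chunksize)   # chunks along each dimension
nchunks = prod(grid)              # total number of chunks
println("chunk grid: ", grid, ", total chunks: ", nchunks)
```

Rounding up with `cld` also covers sizes that are not an exact multiple of the chunk size, in which case the last chunk along that dimension is smaller.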

## Chunking Datasets
`setchunks` can also be applied to a `Dataset`.

### Set Chunks by Axis

Set the chunk size for each axis occurring in a `Dataset`. This will be applied to all variables in the dataset:

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5)))
dschunked = setchunks(ds, Dict("Dim_1"=>5, "Dim_2"=>10, "Dim_3"=>2))
Cube(dschunked).chunks
````

Saving...

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````

### Set chunking by Variable

The following sets the chunk size for each variable separately
and results in exactly the same chunking as the example above:

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5)))
dschunked = setchunks(ds,(x = (5,10), y = Dict("Dim_1"=>5), z = (Dim_1 = 5, Dim_2 = 10, Dim_3 = 2)))
Cube(dschunked).chunks
````

Saving...

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````

### Set chunking for all variables

The following code snippet sets the output chunks for all arrays at once. It only works when all member variables of the dataset have the same shape.

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10,20)), z = YAXArray(rand(10,20)))
dschunked = setchunks(ds,(5,10))
Cube(dschunked).chunks
````

Saving...

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````

Suggestions on how to improve or add to these examples are welcome.
33 changes: 33 additions & 0 deletions docs/src/UserGuide/combine.md
@@ -0,0 +1,33 @@
# Combine YAXArrays

Data is often scattered across multiple files and corresponding arrays, e.g. one file per time step.
This section describes how to combine them into a single YAXArray.

## Concatenate YAXArrays along an existing dimension

Here we use `cat` to combine two arrays, holding data from the first and the second half of a year, into a single array covering the whole year.
The resulting array `whole_year` still has one dimension, i.e. time, but with 12 instead of 6 elements.
We glue the arrays along the first dimension using `dims = 1`:

````@example cat
using YAXArrays

first_half = YAXArray((Dim{:time}(1:6),), rand(6))
second_half = YAXArray((Dim{:time}(7:12),), rand(6))
whole_year = cat(first_half, second_half, dims = 1)
````

## Combine YAXArrays along a new dimension

Here we use `concatenatecubes` to combine two arrays of different variables that share the same time dimension.
The resulting array `combined` has an additional dimension `variable` indicating which array each element value originates from.

````@example concatenatecubes
using YAXArrays

temperature = YAXArray((Dim{:time}(1:6),), rand(6))
precipitation = YAXArray((Dim{:time}(1:6),), rand(6))
cubes = [temperature,precipitation]
var_axis = Dim{:variable}(["temp", "prep"])
combined = concatenatecubes(cubes, var_axis)
````
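
To make the difference between the two approaches concrete, the following sketch compares the resulting shapes using plain Julia arrays instead of YAXArrays. It is only an analogy: Base `cat` with `dims = 2` stands in for adding a new dimension, and the variable names are illustrative.

```julia
temperature = rand(6)
precipitation = rand(6)

# Along the existing dimension: still one dimension, but twice as long.
along_existing = cat(temperature, precipitation, dims = 1)

# Along a new dimension: one column per input array.
along_new = cat(temperature, precipitation, dims = 2)

println(size(along_existing), " ", size(along_new))
```

In the second case each column corresponds to one input array, which is the role the `variable` axis plays in `combined`.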
88 changes: 87 additions & 1 deletion docs/src/UserGuide/compute.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,102 @@
# Compute YAXArrays

This section describes how to create new YAXArrays by performing arithmetic operations.
This section describes how to create new YAXArrays by performing operations on them.

- Use [arithmetics](#Arithmetics) to add a number to, or multiply, each element of an array
- Use [map](#map) to apply more complex functions to every element of an array
- Use [mapslices](#mapslices) to reduce a dimension, e.g. to get the mean over all time steps
- Use [mapCube](#mapCube) to apply complex functions that may change any dimension of an array


Let's start by creating an example dataset:

````@example compute
using YAXArrays
using Dates

axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
````

## Modify elements of a YAXArray

````@example compute
a[1,2,3]
````

````@example compute
a[1,2,3] = 42
````

````@example compute
a[1,2,3]
````

::: warning

Some arrays, e.g. those saved in cloud object storage, are immutable, making any modification of the data impossible.

:::


## Arithmetics

Add a value to all elements of an array and save it as a new array:

````@example compute
a2 = a .+ 5
````

````@example compute
a2[1,2,3] == a[1,2,3] + 5
````

## map

Apply a function on every element of an array individually:

````@example compute
offset = 5
map(a) do x
(x + offset) / 2 * 3
end
````

This keeps all dimensions unchanged.
Each element of the array is processed individually, so neighboring elements cannot be accessed here.
If that is needed, use `mapslices` or `mapCube` instead.

The code runs very fast, because `map` applies the function lazily.
Actual computation is performed only on demand, e.g. when elements are explicitly requested or further computations are performed.
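
The idea of on-demand evaluation can be illustrated with a plain-Julia analogy. The sketch below uses `Iterators.map` from Base and a call counter; it does not show how YAXArrays implements laziness internally, and all names are illustrative.

```julia
calls = Ref(0)   # counts how often the function actually runs
f(x) = (calls[] += 1; (x + 5) / 2 * 3)

data = rand(1000)
lazy = Iterators.map(f, data)   # builds a lazy iterator, nothing is computed yet
calls_before = calls[]

first_value = first(lazy)       # requesting one element computes only that element
calls_after = calls[]

println("calls before: ", calls_before, ", calls after: ", calls_after)
```

Creating the lazy object is cheap regardless of the array size; the cost is paid only for the elements that are actually requested.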


## mapslices

Reduce the time dimension by calculating the average value of all time points:

````@example compute
import Statistics: mean
mapslices(mean, a, dims="Time")
````
There is no time dimension left, because there is only one value left after averaging all time steps.
We can also calculate spatial means resulting in one value per time step:

````@example compute
import Statistics: mean
mapslices(mean, a, dims=("lat", "lon"))
````
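
The effect of both reductions on the array shape can be checked with plain Julia arrays. This is a sketch using only `Statistics.mean` from the standard library, with the dimension order (time, lon, lat) assumed from the example dataset; unlike the YAXArrays calls above, Base keeps reduced dimensions as singletons, so they are dropped explicitly.

```julia
using Statistics: mean

data = rand(30, 10, 15)   # (time, lon, lat), same shape as the example dataset

# Average over time: drop the remaining singleton dimension explicitly.
time_mean = dropdims(mean(data, dims = 1), dims = 1)

# Average over both spatial dimensions: one value per time step.
spatial_mean = vec(mean(data, dims = (2, 3)))

println(size(time_mean), " ", size(spatial_mean))
```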

## mapCube



## Distributed Computation

parallel
35 changes: 35 additions & 0 deletions docs/src/UserGuide/create.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Create YAXArrays and Datasets

## Create a YAXArray

We can create a new YAXArray by filling the values directly:

````@example create
using YAXArrays
a1 = YAXArray(rand(10, 20, 5))
````

We can also specify the dimensions with custom names, which makes access easier:

````@example create
using Dates

axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data2 = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a2 = YAXArray(axlist, data2, properties)
````

## Create a Dataset

````@example create
data3 = rand(30, 10, 15)
a3 = YAXArray(axlist, data3, properties)

arrays = Dict(:a2 => a2, :a3 => a3)
ds = Dataset(; properties, arrays...)
````