Update benchmark scripts and add zarr-python
LDeakin committed Jun 15, 2024
1 parent 2e255d5 commit ab8bf29
Showing 5 changed files with 176 additions and 110 deletions.
97 changes: 50 additions & 47 deletions docs/benchmarks.md
@@ -1,12 +1,6 @@

# Benchmarks

> [!CAUTION]
> Take these benchmarks with a grain of salt; they need to be reviewed.
> * The `zarrs_benchmark_read` and `zarrs_benchmark_read_async` binaries have been optimised to be as efficient as possible with the `zarrs` API.
> * The `tensorstore` benchmark script may not use the optimal tensorstore API, may not do async properly, and may not be equivalent to the zarrs benchmark.
> * Tensorstore benchmarks use the Python rather than the C++ API and are subject to the overheads of Python.
## Benchmark Data
Benchmark data is generated with `scripts/generate_benchmark_array.py` as follows
```bash
@@ -31,58 +25,67 @@ Benchmark data is generated with `scripts/generate_benchmark_array.py` as follows
- AMD Ryzen 5900X
- 64GB DDR4 3600MHz (16-19-19-39)
- 2TB Samsung 990 Pro
- Ubuntu 22.04 (in Windows 11 WSL2, swap disabled, 24GB available memory)
- Rust 1.76.0 (07dca489a 2024-02-04)
- Ubuntu 22.04 (in Windows 11 WSL2, swap disabled, 32GB available memory)

## Implementations Benchmarked
- [`LDeakin/zarrs`](https://github.com/LDeakin/zarrs) v0.14 (Rust 1.79.0) via [`LDeakin/zarrs_tools`](https://github.com/LDeakin/zarrs_tools) 0.4.2
- Benchmark executable: [zarrs_benchmark_read_sync](https://github.com/LDeakin/zarrs_tools/blob/main/src/bin/zarrs_benchmark_read_sync.rs)
- ~~Benchmark executable: [zarrs_benchmark_read_async](https://github.com/LDeakin/zarrs_tools/blob/main/src/bin/zarrs_benchmark_read_async.rs)~~
- [`google/tensorstore`](https://github.com/google/tensorstore) v0.1.61 (Python 3.12.3)
- Benchmark script: <https://github.com/LDeakin/zarrs_tools/blob/main/scripts/tensorstore_python_benchmark_read_async.py>
- [`zarr-developers/zarr-python`](https://github.com/zarr-developers/zarr-python) 3.0.0a0 (Python 3.12.3)
- Benchmark script: <https://github.com/LDeakin/zarrs_tools/blob/main/scripts/zarr_python_benchmark_read_async.py>

## Implementation Versions Benchmarked
- zarrs_tools v0.3.0 (prerelease) installed with `RUSTFLAGS="-C target-cpu=native" cargo install --all-features --path .`
- tensorstore v0.1.53 installed with `pip install tensorstore`
> [!CAUTION]
> Python benchmarks are subject to the overheads of Python and may not be using an optimal API for each zarr implementation.
## Read Benchmarks

## Comparative Benchmarks
### Entire Array
This benchmark measures the time and maximum memory used to read an entire dataset into memory.
- These are best of 3 measurements
- The disk cache is cleared between each measurement

### Read Entire Array
```bash
python3 ./scripts/run_benchmark_read_all.py
```
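Each run wraps the benchmark executable in `/usr/bin/time -v` and parses its report for wall time and peak resident memory. A minimal sketch of that parsing, mirroring the regexes in `scripts/run_benchmark_read_all.py` (the output fragment below uses illustrative values):

```python
import re

# Illustrative fragment of `/usr/bin/time -v` output.
time_v_output = """
\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.03
\tMaximum resident set size (kbytes): 8420000
"""

# Wall time is reported as "m:ss.ss"; convert to seconds.
wall_time = re.search(
    r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): (\d+):([\d\.]+)",
    time_v_output,
)
memory = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_v_output)

wall_time_s = int(wall_time.group(1)) * 60 + float(wall_time.group(2))
memory_gb = int(memory.group(1)) / 1.0e6

print(wall_time_s, memory_gb)  # 3.03 8.42
```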

> [!NOTE]
> Rather than simply calling a single retrieve method like `async_retrieve_array_subset`, the zarrs async benchmark uses a ***complicated*** alternative routine.
>
> This is necessary to achieve decent performance with many chunks because the zarrs async API is unable to parallelise across chunks.
> See <https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#async-api>.
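The effect the note describes can be sketched in plain `asyncio` (with a hypothetical `read_chunk` coroutine, not the zarrs API): awaiting chunks one at a time serialises their latencies, while gathering them under a semaphore overlaps the reads.

```python
import asyncio

async def read_chunk(index):
    # Stand-in for a real chunk read; simulates I/O latency.
    await asyncio.sleep(0.01)
    return index

async def read_sequential(n):
    # One chunk at a time: per-chunk latencies add up.
    return [await read_chunk(i) for i in range(n)]

async def read_concurrent(n, limit=32):
    # Issue all reads at once, bounded by a semaphore.
    sem = asyncio.Semaphore(limit)

    async def bounded(i):
        async with sem:
            return await read_chunk(i)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(i) for i in range(n)))

chunks = asyncio.run(read_concurrent(64))
print(len(chunks))  # 64
```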
| Image | Wall time (s)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async | Memory usage (GB)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async |
|:-----------------------------------|---------------------------------:|----------------:|---------------------------:|-------------------------------------:|----------------:|---------------------------:|
| data/benchmark.zarr | 3.03 | 9.27 | 3.23 | 8.42 | 8.41 | 8.58 |
| data/benchmark_compress.zarr | 2.84 | 8.45 | 2.68 | 8.44 | 8.43 | 8.53 |
| data/benchmark_compress_shard.zarr | 1.62 | 1.83 | 2.58 | 8.63 | 8.73 | 8.57 |
| Image | Time (s)<br>zarrs<br>rust | <br>tensorstore<br>python | <br>zarr<br>python | Memory (GB)<br>zarrs<br>rust | <br>tensorstore<br>python | <br>zarr<br>python |
|:-----------------------------------|----------------------------:|----------------------------:|---------------------:|-------------------------------:|----------------------------:|---------------------:|
| data/benchmark.zarr | 2.95 | 3.17 | 51.53 | 8.42 | 8.59 | 15.28 |
| data/benchmark_compress.zarr | 3 | 2.83 | 74.82 | 8.44 | 8.53 | 19.14 |
| data/benchmark_compress_shard.zarr | 1.47 | 2.18 | 36.37 | 8.63 | 8.94 | 27.42 |

These are best of 3 measurements.
### Chunk-By-Chunk
This benchmark measures the time to read a dataset chunk-by-chunk into memory.
- These are best of 1 measurements
- The disk cache is cleared between each measurement
- TODO: Review the tensorstore/zarr-python scripts; performance does not improve much with concurrency

### Read Chunk-By-Chunk
```bash
python3 ./scripts/run_benchmark_read_chunks.py
```
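Conceptually, each concurrency level in the table below bounds a pool of workers pulling chunks into memory; a minimal sketch with a hypothetical `read_chunk` helper (not any of the benchmarked APIs):

```python
from concurrent.futures import ThreadPoolExecutor

def read_chunk(index):
    # Hypothetical chunk read; a real implementation would fetch and
    # decode the chunk at `index` from storage.
    return bytes(16)

chunk_indices = range(64)
concurrency = 8  # corresponds to the "Concurrency" column

# A bounded worker pool reads chunks in parallel.
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    chunks = list(pool.map(read_chunk, chunk_indices))

print(len(chunks))  # 64
```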

| Image | Concurrency | Wall time (s)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async | Memory usage (GB)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async |
|:-----------------------------------|--------------:|---------------------------------:|----------------:|---------------------------:|-------------------------------------:|----------------:|---------------------------:|
| data/benchmark.zarr | 1 | 25.23 | 55.17 | 52.57 | 0.03 | 0.01 | 0.51 |
| data/benchmark.zarr | 2 | 14.45 | 32.84 | 30.98 | 0.03 | 0.01 | 0.52 |
| data/benchmark.zarr | 4 | 7.87 | 18.28 | 23.71 | 0.03 | 0.01 | 0.51 |
| data/benchmark.zarr | 8 | 4.32 | 10.67 | 20.98 | 0.03 | 0.02 | 0.52 |
| data/benchmark.zarr | 16 | 2.71 | 8.03 | 19.39 | 0.03 | 0.02 | 0.52 |
| data/benchmark.zarr | 32 | 2.52 | 8.22 | 18.58 | 0.03 | 0.03 | 0.53 |
| data/benchmark_compress.zarr | 1 | 20.78 | 36.4 | 46.78 | 0.03 | 0.02 | 0.51 |
| data/benchmark_compress.zarr | 2 | 12.47 | 19.71 | 27.16 | 0.03 | 0.02 | 0.52 |
| data/benchmark_compress.zarr | 4 | 7.11 | 11.06 | 22.32 | 0.03 | 0.02 | 0.51 |
| data/benchmark_compress.zarr | 8 | 3.82 | 7.29 | 20.01 | 0.03 | 0.03 | 0.52 |
| data/benchmark_compress.zarr | 16 | 2.22 | 7.09 | 18.72 | 0.04 | 0.04 | 0.54 |
| data/benchmark_compress.zarr | 32 | 2.18 | 6.82 | 17.72 | 0.04 | 0.07 | 0.54 |
| data/benchmark_compress_shard.zarr | 1 | 2.59 | 2.63 | 2.71 | 0.37 | 0.4 | 0.42 |
| data/benchmark_compress_shard.zarr | 2 | 1.76 | 1.77 | 2.31 | 0.7 | 0.76 | 0.56 |
| data/benchmark_compress_shard.zarr | 4 | 1.48 | 1.46 | 2.31 | 1.29 | 1.24 | 1.05 |
| data/benchmark_compress_shard.zarr | 8 | 1.41 | 1.47 | 2.57 | 2.37 | 2.29 | 1.41 |
| data/benchmark_compress_shard.zarr | 16 | 1.57 | 1.56 | 2.85 | 4.34 | 3.99 | 2.13 |
| data/benchmark_compress_shard.zarr | 32 | 1.54 | 1.76 | 3.15 | 6.54 | 6.9 | 3.46 |
| Image | Concurrency | Time (s)<br>zarrs<br>rust | Memory (GB)<br>zarrs<br>rust |
|:-----------------------------------|--------------:|----------------------------:|-------------------------------:|
| data/benchmark.zarr | 1 | 27.12 | 0.03 |
| data/benchmark.zarr | 2 | 15.15 | 0.03 |
| data/benchmark.zarr | 4 | 8.58 | 0.02 |
| data/benchmark.zarr | 8 | 4.74 | 0.03 |
| data/benchmark.zarr | 16 | 2.84 | 0.02 |
| data/benchmark.zarr | 32 | 2.8 | 0.02 |
| data/benchmark_compress.zarr | 1 | 22.15 | 0.02 |
| data/benchmark_compress.zarr | 2 | 13.47 | 0.03 |
| data/benchmark_compress.zarr | 4 | 7.68 | 0.03 |
| data/benchmark_compress.zarr | 8 | 4.16 | 0.03 |
| data/benchmark_compress.zarr | 16 | 2.44 | 0.03 |
| data/benchmark_compress.zarr | 32 | 2.42 | 0.04 |
| data/benchmark_compress_shard.zarr | 1 | 2.53 | 0.36 |
| data/benchmark_compress_shard.zarr | 2 | 1.58 | 0.7 |
| data/benchmark_compress_shard.zarr | 4 | 1.42 | 1.29 |
| data/benchmark_compress_shard.zarr | 8 | 1.5 | 2.21 |
| data/benchmark_compress_shard.zarr | 16 | 1.38 | 4.46 |
| data/benchmark_compress_shard.zarr | 32 | 1.5 | 6.69 |

These are best of 1 measurements.
## Round Trip Benchmarks
TODO
49 changes: 24 additions & 25 deletions scripts/run_benchmark_read_all.py
@@ -3,31 +3,36 @@
import subprocess
import re
import pandas as pd
import numpy as np
import math
import numpy as np

def clear_cache():
subprocess.call(['sudo', 'sh', '-c', "sync; echo 3 > /proc/sys/vm/drop_caches"])

implementation_to_args = {
"zarrs_sync": ["/usr/bin/time", "-v", "zarrs_benchmark_read_sync", "--read-all"],
"zarrs_async": ["/usr/bin/time", "-v", "zarrs_benchmark_read_async", "--read-all"],
"tensorstore": ["/usr/bin/time", "-v", "./scripts/tensorstore_benchmark_read_async.py", "--read_all"],
"zarrs_rust": ["/usr/bin/time", "-v", "zarrs_benchmark_read_sync", "--read-all"],
# "zarrs_rust_async": ["/usr/bin/time", "-v", "zarrs_benchmark_read_async", "--read-all"],
"tensorstore_python": ["/usr/bin/time", "-v", "./scripts/tensorstore_python_benchmark_read_async.py", "--read_all"],
"zarr_python": ["/usr/bin/time", "-v", "./scripts/zarr_python_benchmark_read_async.py", "--read_all"],
}

def clear_cache():
subprocess.call(['sudo', 'sh', '-c', "sync; echo 3 > /proc/sys/vm/drop_caches"])
implementations = ["zarrs_rust", "tensorstore_python", "zarr_python"]

images = [
"data/benchmark.zarr",
"data/benchmark_compress.zarr",
"data/benchmark_compress_shard.zarr",
]

best_of = 3

index = []
rows = []
for image in [
"data/benchmark.zarr",
"data/benchmark_compress.zarr",
"data/benchmark_compress_shard.zarr",
]:
for image in images:
index.append(image)
wall_times = []
memory_usages = []
for implementation in ["zarrs_sync", "zarrs_async", "tensorstore"]:
for implementation in implementations:
wall_time_measurements = []
memory_usage_measurements = []
for i in range(best_of):
@@ -49,10 +54,9 @@ def clear_cache():
m = int(wall_time.group(1))
s = float(wall_time.group(2))
wall_time_s = m * 60 + s
# print(wall_time_s)
memory_usage_kb = int(memory_usage.group(1))
memory_usage_gb = float(memory_usage_kb) / 1.0e6
# print(memory_usage_gb)
print(wall_time_s, memory_usage_gb)
wall_time_measurements.append(wall_time_s)
memory_usage_measurements.append(memory_usage_gb)
else:
@@ -67,13 +71,12 @@ def clear_cache():
row = wall_times + memory_usages
rows.append(row)


columns_pandas = []
columns_markdown = []
for metric in ["Wall time (s)", "Memory usage (GB)"]:
for metric in ["Time (s)", "Memory (GB)"]:
include_metric = True
last_implementation = ""
for implementation, execution in [("zarrs", "sync"), ("zarrs", "async"), ("tensorstore", "async")]:
for implementation in implementations:
column_markdown = ""

# Metric
@@ -82,24 +85,20 @@ def clear_cache():
column_markdown += "<br>"
include_metric = False

# Implemnentation
# Implementation
if implementation != last_implementation:
last_implementation = implementation
column_markdown += implementation
column_markdown += "<br>"

# Execution
column_markdown += execution
column_markdown += implementation.replace("_", "<br>")

columns_markdown.append(column_markdown)
columns_pandas.append((metric, implementation, execution))
columns_pandas.append((metric, implementation))

data = {
"index": index,
"columns": columns_pandas,
"data": rows,
"index_names": ["Image"],
"column_names": ["Metric", "Implementation", "Execution"],
"column_names": ["Metric", "Implementation"],
}

# print(data)
