Update benchmark scripts and add zarr-python
LDeakin committed Jun 15, 2024
1 parent 2e255d5 commit ab8bf29
Showing 5 changed files with 176 additions and 110 deletions.
97 changes: 50 additions & 47 deletions docs/benchmarks.md
@@ -1,12 +1,6 @@

# Benchmarks

> [!CAUTION]
> Take these benchmarks with a grain of salt; they need to be reviewed.
> * The `zarrs_benchmark_read` and `zarrs_benchmark_read_async` binaries have been optimised to be as efficient as possible with the `zarrs` API.
> * The `tensorstore` benchmark script may not use the optimal tensorstore API, may not do async properly, and may not be equivalent to the zarrs benchmark.
> * Tensorstore benchmarks use the Python rather than the C++ API and are subject to the overheads of Python.
## Benchmark Data
Benchmark data is generated with `scripts/generate_benchmark_array.py` as follows
```bash
@@ -31,58 +25,67 @@ Benchmark data is generated with `scripts/generate_benchmark_array.py` as follows
- AMD Ryzen 5900X
- 64GB DDR4 3600MHz (16-19-19-39)
- 2TB Samsung 990 Pro
- Ubuntu 22.04 (in Windows 11 WSL2, swap disabled, 24GB available memory)
- Rust 1.76.0 (07dca489a 2024-02-04)
- Ubuntu 22.04 (in Windows 11 WSL2, swap disabled, 32GB available memory)

## Implementations Benchmarked
- [`LDeakin/zarrs`](https://github.com/LDeakin/zarrs) v0.14 (Rust 1.79.0) via [`LDeakin/zarrs_tools`](https://github.com/LDeakin/zarrs_tools) 0.4.2
- Benchmark executable: [zarrs_benchmark_read_sync](https://github.com/LDeakin/zarrs_tools/blob/main/src/bin/zarrs_benchmark_read_sync.rs)
- ~~Benchmark executable: [zarrs_benchmark_read_async](https://github.com/LDeakin/zarrs_tools/blob/main/src/bin/zarrs_benchmark_read_async.rs)~~
- [`google/tensorstore`](https://github.com/google/tensorstore) v0.1.61 (Python 3.12.3)
- Benchmark script: <https://github.com/LDeakin/zarrs_tools/blob/main/scripts/tensorstore_python_benchmark_read_async.py>
- [`zarr-developers/zarr-python`](https://github.com/zarr-developers/zarr-python) 3.0.0a0 (Python 3.12.3)
- Benchmark script: <https://github.com/LDeakin/zarrs_tools/blob/main/scripts/zarr_python_benchmark_read_async.py>

## Implementation Versions Benchmarked
- zarrs_tools v0.3.0 (prerelease) installed with `RUSTFLAGS="-C target-cpu=native" cargo install --all-features --path .`
- tensorstore v0.1.53 installed with `pip install tensorstore`
> [!CAUTION]
> Python benchmarks are subject to the overheads of Python and may not be using an optimal API for each zarr implementation.
## Read Benchmarks

## Comparative Benchmarks
### Entire Array
This benchmark measures the time and maximum memory used to read an entire dataset into memory.
- These are best of 3 measurements
- The disk cache is cleared between each measurement

### Read Entire Array
```bash
python3 ./scripts/run_benchmark_read_all.py
```
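Each run wraps the benchmark executable in `/usr/bin/time -v` and parses its report for wall time and peak resident memory. A minimal sketch of that parsing, mirroring the regexes in `scripts/run_benchmark_read_all.py` (the output fragment below uses illustrative values):

```python
import re

# Illustrative fragment of `/usr/bin/time -v` output.
time_v_output = """
\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.03
\tMaximum resident set size (kbytes): 8420000
"""

# Wall time is reported as "m:ss.ss"; convert to seconds.
wall_time = re.search(
    r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): (\d+):([\d\.]+)",
    time_v_output,
)
memory = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_v_output)

wall_time_s = int(wall_time.group(1)) * 60 + float(wall_time.group(2))
memory_gb = int(memory.group(1)) / 1.0e6

print(wall_time_s, memory_gb)  # 3.03 8.42
```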

> [!NOTE]
> Rather than simply calling a single retrieve method like `async_retrieve_array_subset`, the zarrs async benchmark uses a ***complicated*** alternative routine.
>
> This is necessary to achieve decent performance with many chunks because the zarrs async API is unable to parallelise across chunks.
> See <https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#async-api>.
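The effect the note describes can be sketched in plain `asyncio` (with a hypothetical `read_chunk` coroutine, not the zarrs API): awaiting chunks one at a time serialises their latencies, while gathering them under a semaphore overlaps the reads.

```python
import asyncio

async def read_chunk(index):
    # Stand-in for a real chunk read; simulates I/O latency.
    await asyncio.sleep(0.01)
    return index

async def read_sequential(n):
    # One chunk at a time: per-chunk latencies add up.
    return [await read_chunk(i) for i in range(n)]

async def read_concurrent(n, limit=32):
    # Issue all reads at once, bounded by a semaphore.
    sem = asyncio.Semaphore(limit)

    async def bounded(i):
        async with sem:
            return await read_chunk(i)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(i) for i in range(n)))

chunks = asyncio.run(read_concurrent(64))
print(len(chunks))  # 64
```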
| Image | Wall time (s)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async | Memory usage (GB)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async |
|:-----------------------------------|---------------------------------:|----------------:|---------------------------:|-------------------------------------:|----------------:|---------------------------:|
| data/benchmark.zarr | 3.03 | 9.27 | 3.23 | 8.42 | 8.41 | 8.58 |
| data/benchmark_compress.zarr | 2.84 | 8.45 | 2.68 | 8.44 | 8.43 | 8.53 |
| data/benchmark_compress_shard.zarr | 1.62 | 1.83 | 2.58 | 8.63 | 8.73 | 8.57 |
| Image | Time (s)<br>zarrs<br>rust | <br>tensorstore<br>python | <br>zarr<br>python | Memory (GB)<br>zarrs<br>rust | <br>tensorstore<br>python | <br>zarr<br>python |
|:-----------------------------------|----------------------------:|----------------------------:|---------------------:|-------------------------------:|----------------------------:|---------------------:|
| data/benchmark.zarr | 2.95 | 3.17 | 51.53 | 8.42 | 8.59 | 15.28 |
| data/benchmark_compress.zarr | 3 | 2.83 | 74.82 | 8.44 | 8.53 | 19.14 |
| data/benchmark_compress_shard.zarr | 1.47 | 2.18 | 36.37 | 8.63 | 8.94 | 27.42 |

These are best of 3 measurements.
### Chunk-By-Chunk
This benchmark measures the time to read a dataset chunk-by-chunk into memory.
- These are best of 1 measurements
- The disk cache is cleared between each measurement
- TODO: Review the tensorstore/zarr-python scripts; performance does not improve much with concurrency

### Read Chunk-By-Chunk
```bash
python3 ./scripts/run_benchmark_read_chunks.py
```
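Conceptually, each concurrency level in the table below bounds a pool of workers pulling chunks into memory; a minimal sketch with a hypothetical `read_chunk` helper (not any of the benchmarked APIs):

```python
from concurrent.futures import ThreadPoolExecutor

def read_chunk(index):
    # Hypothetical chunk read; a real implementation would fetch and
    # decode the chunk at `index` from storage.
    return bytes(16)

chunk_indices = range(64)
concurrency = 8  # corresponds to the "Concurrency" column

# A bounded worker pool reads chunks in parallel.
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    chunks = list(pool.map(read_chunk, chunk_indices))

print(len(chunks))  # 64
```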

| Image | Concurrency | Wall time (s)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async | Memory usage (GB)<br>zarrs<br>sync | <br><br>async | <br>tensorstore<br>async |
|:-----------------------------------|--------------:|---------------------------------:|----------------:|---------------------------:|-------------------------------------:|----------------:|---------------------------:|
| data/benchmark.zarr | 1 | 25.23 | 55.17 | 52.57 | 0.03 | 0.01 | 0.51 |
| data/benchmark.zarr | 2 | 14.45 | 32.84 | 30.98 | 0.03 | 0.01 | 0.52 |
| data/benchmark.zarr | 4 | 7.87 | 18.28 | 23.71 | 0.03 | 0.01 | 0.51 |
| data/benchmark.zarr | 8 | 4.32 | 10.67 | 20.98 | 0.03 | 0.02 | 0.52 |
| data/benchmark.zarr | 16 | 2.71 | 8.03 | 19.39 | 0.03 | 0.02 | 0.52 |
| data/benchmark.zarr | 32 | 2.52 | 8.22 | 18.58 | 0.03 | 0.03 | 0.53 |
| data/benchmark_compress.zarr | 1 | 20.78 | 36.4 | 46.78 | 0.03 | 0.02 | 0.51 |
| data/benchmark_compress.zarr | 2 | 12.47 | 19.71 | 27.16 | 0.03 | 0.02 | 0.52 |
| data/benchmark_compress.zarr | 4 | 7.11 | 11.06 | 22.32 | 0.03 | 0.02 | 0.51 |
| data/benchmark_compress.zarr | 8 | 3.82 | 7.29 | 20.01 | 0.03 | 0.03 | 0.52 |
| data/benchmark_compress.zarr | 16 | 2.22 | 7.09 | 18.72 | 0.04 | 0.04 | 0.54 |
| data/benchmark_compress.zarr | 32 | 2.18 | 6.82 | 17.72 | 0.04 | 0.07 | 0.54 |
| data/benchmark_compress_shard.zarr | 1 | 2.59 | 2.63 | 2.71 | 0.37 | 0.4 | 0.42 |
| data/benchmark_compress_shard.zarr | 2 | 1.76 | 1.77 | 2.31 | 0.7 | 0.76 | 0.56 |
| data/benchmark_compress_shard.zarr | 4 | 1.48 | 1.46 | 2.31 | 1.29 | 1.24 | 1.05 |
| data/benchmark_compress_shard.zarr | 8 | 1.41 | 1.47 | 2.57 | 2.37 | 2.29 | 1.41 |
| data/benchmark_compress_shard.zarr | 16 | 1.57 | 1.56 | 2.85 | 4.34 | 3.99 | 2.13 |
| data/benchmark_compress_shard.zarr | 32 | 1.54 | 1.76 | 3.15 | 6.54 | 6.9 | 3.46 |
| Image | Concurrency | Time (s)<br>zarrs<br>rust | Memory (GB)<br>zarrs<br>rust |
|:-----------------------------------|--------------:|----------------------------:|-------------------------------:|
| data/benchmark.zarr | 1 | 27.12 | 0.03 |
| data/benchmark.zarr | 2 | 15.15 | 0.03 |
| data/benchmark.zarr | 4 | 8.58 | 0.02 |
| data/benchmark.zarr | 8 | 4.74 | 0.03 |
| data/benchmark.zarr | 16 | 2.84 | 0.02 |
| data/benchmark.zarr | 32 | 2.8 | 0.02 |
| data/benchmark_compress.zarr | 1 | 22.15 | 0.02 |
| data/benchmark_compress.zarr | 2 | 13.47 | 0.03 |
| data/benchmark_compress.zarr | 4 | 7.68 | 0.03 |
| data/benchmark_compress.zarr | 8 | 4.16 | 0.03 |
| data/benchmark_compress.zarr | 16 | 2.44 | 0.03 |
| data/benchmark_compress.zarr | 32 | 2.42 | 0.04 |
| data/benchmark_compress_shard.zarr | 1 | 2.53 | 0.36 |
| data/benchmark_compress_shard.zarr | 2 | 1.58 | 0.7 |
| data/benchmark_compress_shard.zarr | 4 | 1.42 | 1.29 |
| data/benchmark_compress_shard.zarr | 8 | 1.5 | 2.21 |
| data/benchmark_compress_shard.zarr | 16 | 1.38 | 4.46 |
| data/benchmark_compress_shard.zarr | 32 | 1.5 | 6.69 |

These are best of 1 measurements.
## Round Trip Benchmarks
TODO
49 changes: 24 additions & 25 deletions scripts/run_benchmark_read_all.py
@@ -3,31 +3,36 @@
import subprocess
import re
import pandas as pd
import numpy as np
import math
import numpy as np

def clear_cache():
subprocess.call(['sudo', 'sh', '-c', "sync; echo 3 > /proc/sys/vm/drop_caches"])

implementation_to_args = {
"zarrs_sync": ["/usr/bin/time", "-v", "zarrs_benchmark_read_sync", "--read-all"],
"zarrs_async": ["/usr/bin/time", "-v", "zarrs_benchmark_read_async", "--read-all"],
"tensorstore": ["/usr/bin/time", "-v", "./scripts/tensorstore_benchmark_read_async.py", "--read_all"],
"zarrs_rust": ["/usr/bin/time", "-v", "zarrs_benchmark_read_sync", "--read-all"],
# "zarrs_rust_async": ["/usr/bin/time", "-v", "zarrs_benchmark_read_async", "--read-all"],
"tensorstore_python": ["/usr/bin/time", "-v", "./scripts/tensorstore_python_benchmark_read_async.py", "--read_all"],
"zarr_python": ["/usr/bin/time", "-v", "./scripts/zarr_python_benchmark_read_async.py", "--read_all"],
}

def clear_cache():
subprocess.call(['sudo', 'sh', '-c', "sync; echo 3 > /proc/sys/vm/drop_caches"])
implementations = ["zarrs_rust", "tensorstore_python", "zarr_python"]

images = [
"data/benchmark.zarr",
"data/benchmark_compress.zarr",
"data/benchmark_compress_shard.zarr",
]

best_of = 3

index = []
rows = []
for image in [
"data/benchmark.zarr",
"data/benchmark_compress.zarr",
"data/benchmark_compress_shard.zarr",
]:
for image in images:
index.append(image)
wall_times = []
memory_usages = []
for implementation in ["zarrs_sync", "zarrs_async", "tensorstore"]:
for implementation in implementations:
wall_time_measurements = []
memory_usage_measurements = []
for i in range(best_of):
@@ -49,10 +54,9 @@ def clear_cache():
m = int(wall_time.group(1))
s = float(wall_time.group(2))
wall_time_s = m * 60 + s
# print(wall_time_s)
memory_usage_kb = int(memory_usage.group(1))
memory_usage_gb = float(memory_usage_kb) / 1.0e6
# print(memory_usage_gb)
print(wall_time_s, memory_usage_gb)
wall_time_measurements.append(wall_time_s)
memory_usage_measurements.append(memory_usage_gb)
else:
@@ -67,13 +71,12 @@ def clear_cache():
row = wall_times + memory_usages
rows.append(row)


columns_pandas = []
columns_markdown = []
for metric in ["Wall time (s)", "Memory usage (GB)"]:
for metric in ["Time (s)", "Memory (GB)"]:
include_metric = True
last_implementation = ""
for implementation, execution in [("zarrs", "sync"), ("zarrs", "async"), ("tensorstore", "async")]:
for implementation in implementations:
column_markdown = ""

# Metric
@@ -82,24 +85,20 @@ def clear_cache():
column_markdown += "<br>"
include_metric = False

# Implemnentation
# Implementation
if implementation != last_implementation:
last_implementation = implementation
column_markdown += implementation
column_markdown += "<br>"

# Execution
column_markdown += execution
column_markdown += implementation.replace("_", "<br>")

columns_markdown.append(column_markdown)
columns_pandas.append((metric, implementation, execution))
columns_pandas.append((metric, implementation))

data = {
"index": index,
"columns": columns_pandas,
"data": rows,
"index_names": ["Image"],
"column_names": ["Metric", "Implementation", "Execution"],
"column_names": ["Metric", "Implementation"],
}

# print(data)
