Documentation corrections following optimization improvements (#423)
* Documentation corrections following optimization improvements

* Update reduction documentation to reflect new implementation
tomwhite authored Mar 12, 2024
1 parent 012628c commit ff6daf5
Showing 5 changed files with 17 additions and 19 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -56,7 +56,7 @@

# General information about the project.
project = "Cubed"
copyright = "2022-2023, Tom White"
copyright = "2022-2024, Tom White"
author = "Tom White"

# The version info for the project you're documenting, acts as replacement for
1 change: 1 addition & 0 deletions docs/images/reduction_new.svg
7 changes: 4 additions & 3 deletions docs/operations.md
@@ -82,10 +82,11 @@ The `reduction` operation reduces an array along one or more axes.
* No input array attributes are preserved in general
* __Single__ input, __single__ output

It is a core operation that is implemented using repeated calls to `blockwise` and `rechunk`.
It is a core operation that is implemented using a `blockwise` operation called `partial_reduce` that reads multiple blocks and performs the reduction operation on them.
The `partial_reduce` operations are arranged in a tree (`tree_reduce`) with multiple rounds until there's a single block in each reduction axis. Finally, an aggregate `blockwise` operation is applied to the results.

Here is an example of reducing over the first axis, with a single round of `rechunk` and `combine` - in general there would be multiple rounds until there's a single block in each reduction axis.
Here is an example of reducing over the first axis, with two rounds of `partial_reduce` operations:

![The reduction core operation](images/reduction.svg)
![The reduction core operation](images/reduction_new.svg)
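
For instance, a minimal sketch of code that would trigger such a tree reduction (the array shape, chunking, and `allowed_mem` are illustrative values, not taken from this commit):

```python
import cubed
import cubed.array_api as xp

# Illustrative values only: chunks small relative to the array and a modest
# allowed_mem, so the sum over axis 0 needs more than one round of partial_reduce.
spec = cubed.Spec(work_dir="tmp", allowed_mem="200MB")

a = xp.ones((20000, 20000), chunks=(1000, 1000), spec=spec)
b = xp.sum(a, axis=0)   # reduction over the first axis, as in the figure
result = b.compute()
```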

The `arg_reduction` works similarly, but uses different functions to return indexes rather than values.
17 changes: 8 additions & 9 deletions docs/user-guide/executors.md
@@ -8,17 +8,16 @@ Cubed provides a variety of executors for running the tasks in a computation, wh

If you don't specify an executor then the local in-process Python executor is used. This is a very simple, single-threaded executor (called {py:class}`PythonDagExecutor <cubed.runtime.executors.python.PythonDagExecutor>`) that is intended for testing on small amounts of data before running larger computations using a cloud service.
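
For example, a small computation like the following (shapes and `allowed_mem` are illustrative) runs entirely in the local Python process because no executor is given:

```python
import cubed
import cubed.array_api as xp

spec = cubed.Spec(work_dir="tmp", allowed_mem="100kB")
a = xp.asarray([[1, 2], [3, 4]], chunks=(2, 2), spec=spec)
b = xp.negative(a)

# No executor argument: the single-threaded PythonDagExecutor runs the tasks.
print(b.compute())
```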

## Which cloud service should I use?
## Which cloud service executor should I use?

[**Modal**](https://modal.com/) is the easiest to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account).
It has been tested with ~300 workers and works with AWS and GCP.
[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the setup process.

[**Lithops**](https://lithops-cloud.github.io/) requires slightly more work to get started since you have to build a runtime environment first.
Lithops has support for many serverless services on various cloud providers, but has so far been tested on two:
- **AWS Lambda** requires building a Docker container first, and has been tested with ~1000 workers.
- **Google Cloud Functions** only requires building a Lithops runtime, which can be created from a pip-style `requirements.txt` without Docker. It has been tested with ~1000 workers.
[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.**

[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is the most mature service and therefore should be reliable for much larger computations.
[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster.

[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and therefore should be reliable for much larger computations.

## Specifying an executor

@@ -35,4 +34,4 @@ spec = cubed.Spec(
)
```

Alternatively an executor may be specified when {py:func}`compute() <cubed.compute>` is called. The [examples](https://github.com/tomwhite/cubed/tree/main/examples/README.md) show this in more detail for all of the cloud services described above.
A default spec may also be configured using a YAML file. The [examples](https://github.com/tomwhite/cubed/tree/main/examples/README.md) show this in more detail for all of the cloud services described above.
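
As a minimal sketch of the programmatic route, using the local `PythonDagExecutor` mentioned above as a stand-in (a cloud executor object would be passed in the same place; the other `Spec` arguments are illustrative):

```python
import cubed
import cubed.array_api as xp
from cubed.runtime.executors.python import PythonDagExecutor

# The executor is attached to the spec; a Lithops, Modal, or Coiled executor
# object would be passed here instead for cloud runs.
spec = cubed.Spec(
    work_dir="tmp",
    allowed_mem="100kB",
    executor=PythonDagExecutor(),
)

a = xp.ones((3, 3), chunks=(2, 2), spec=spec)
xp.negative(a).compute()
```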
9 changes: 3 additions & 6 deletions docs/user-guide/scaling.md
@@ -35,7 +35,7 @@ Weak scaling requires more workers than output chunks, so for large problems it
With fewer workers than chunks we would expect linear strong scaling, as every new worker added has nothing to wait for.

Stragglers are tasks that take much longer than average and disproportionately hold up the next step of the computation.
Stargglers are handled by running backup tasks for any tasks that are running very slowly. This feature is enabled by default, but
Stragglers are handled by running backup tasks for any tasks that are running very slowly. This feature is enabled by default, but
if you need to turn it off you can do so with ``use_backups=False``.
Worker start-up time is another practical speed consideration, though it would delay computations of all scales equally.
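
A hedged sketch of turning backup tasks off, assuming ``use_backups`` is accepted as a keyword argument to `compute()` and forwarded to the executor (check the current Cubed docs for the exact call site):

```python
import cubed
import cubed.array_api as xp

spec = cubed.Spec(work_dir="tmp", allowed_mem="100kB")
a = xp.ones((3, 3), chunks=(2, 2), spec=spec)
b = xp.negative(a)

# Assumption: use_backups is forwarded to the executor via compute() kwargs.
b.compute(use_backups=False)
```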

@@ -49,9 +49,6 @@ Hence, reducing the number of steps in the plan can lead to significant performa
Reductions can be carried out in fewer iterative steps if ``allowed_mem`` is larger.
Cubed automatically fuses some steps to enhance performance, but others (especially rechunk) cannot be fused without requiring a shuffle, which can potentially violate memory constraints.

```{note} In theory multiple blockwise operations can be fused together, enhancing the performance further. However this has not yet been implemented in Cubed.
```

In practical scenarios, stragglers can hold up the completion of each step separately, thereby cumulatively affecting the overall performance of the calculation.

### Multi-pipeline Calculation
@@ -60,8 +57,8 @@ A "pipeline" refers to an independent branch of the calculation.
For example, if you have two separate arrays to compute simultaneously, full parallelism requires sufficient workers for both tasks.
The same logic applies if you have two arrays feeding into a single array or vice versa.


```{note} Currently Cubed will not necessarily execute independent pipelines in parallel on all executors.
```{note}
Currently Cubed will only execute independent pipelines in parallel if `compute_arrays_in_parallel=True` is passed to the executor function.
```
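
A hedged sketch of what that might look like, assuming the flag is forwarded through `cubed.compute` to the executor (the note above only says it is "passed to the executor function", so check the current docs for the exact signature):

```python
import cubed
import cubed.array_api as xp

spec = cubed.Spec(work_dir="tmp", allowed_mem="100kB")

# Two separate arrays, i.e. two independent pipelines.
a = xp.negative(xp.ones((3, 3), chunks=(2, 2), spec=spec))
b = xp.negative(xp.ones((3, 3), chunks=(2, 2), spec=spec))

# Assumption: compute_arrays_in_parallel is forwarded to the executor.
cubed.compute(a, b, compute_arrays_in_parallel=True)
```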


