Merge pull request #1847 from rapidsai/branch-21.10
ajschmidt8 committed Oct 6, 2021
2 parents e03e008 + 54b8573 commit fed9ee4
Showing 551 changed files with 20,627 additions and 12,246 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -76,8 +76,12 @@ datasets/*
cpp/doxygen/html

# Raft symlink
python/cugraph/raft
python/cugraph/cugraph/raft
python/pylibcugraph/pylibcugraph/raft
python/_external_repositories/

# created by Dask tests
python/dask-worker-space

# Sphinx docs & build artifacts
docs/cugraph/source/api_docs/api/*
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# cuGraph 21.10.00 (Date TBD)

Please see https://github.com/rapidsai/cugraph/releases/tag/v21.10.00a for the latest changes to this development branch.

# cuGraph 21.08.00 (4 Aug 2021)

## 🚨 Breaking Changes
1 change: 0 additions & 1 deletion README.md
@@ -83,7 +83,6 @@ As of Release 21.08 - including 21.08 nightly
| Traversal | | | |
| | Breadth First Search (BFS) | Multi-GPU | with cutoff support <br/> [C++ README](cpp/src/traversal/README.md#BFS) |
| | Single Source Shortest Path (SSSP) | Multi-GPU | [C++ README](cpp/src/traversal/README.md#SSSP) |
| | Traveling Salesperson Problem (TSP) | Single-GPU | |
| Tree | | | |
| | Minimum Spanning Tree | Single-GPU | |
| | Maximum Spanning Tree | Single-GPU | |
4 changes: 4 additions & 0 deletions SOURCEBUILD.md
@@ -49,6 +49,10 @@ conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11
# for CUDA 11.2
conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.2.yml

# for CUDA 11.4
conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.4.yml


# activate the environment
conda activate cugraph_dev

126 changes: 126 additions & 0 deletions benchmarks/python_e2e/README.md
@@ -0,0 +1,126 @@
# cuGraph benchmarks

## Overview

The sources are currently intended to benchmark `cuGraph` via the python API,
but future updates may include benchmarks written in C++ or other languages.

The benchmarks here use datasets generated by the RMAT graph generator but also
support csv files as input.
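
For csv inputs (edgelists), a minimal illustrative sketch of loading one with cuDF
is shown below; the column names, dtypes, and delimiter are assumptions and depend
on the actual file:
```
# Illustrative sketch only: load a csv edgelist into a cudf DataFrame.
# The "src"/"dst" column names, int32 dtypes, and space delimiter are
# assumptions -- adjust them to match the file being benchmarked.
import cudf

df = cudf.read_csv("karate.csv",
                   delimiter=" ",
                   names=["src", "dst"],
                   dtype=["int32", "int32"])
print(df.head())
```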

## Prerequisites
* cugraph built and installed (or `cugraph` sources and built C++ extensions
available on `PYTHONPATH`)

* Multi-node multi-GPU (MNMG) runs require a cluster environment to be set up
  and a dask scheduler .json file made accessible to all workers on every node.

* While not a strict prerequisite, the RAPIDS
[multi-gpu-tools](https://github.com/rapidsai/multi-gpu-tools) package can be
used to automate setting up a cluster for MNMG runs and will be referenced in
the examples below.

## Single-node multi-GPU runs (SNMG)
* Set the `CUDA_VISIBLE_DEVICES` environment variable to the comma-separated
  list of GPU IDs to use for the benchmarks (e.g. `0,1` for a 2-GPU run). If
  the env var is unset, a single-GPU run is assumed. _Note: this is a slightly
  different usage of `CUDA_VISIBLE_DEVICES` compared to its standard documented
  use, where leaving it unset means "no restricted GPUs, use all of them"._

* Run `python main.py --help` for a list of all available benchmark options

* Run `python main.py` to run individual algo benchmarks with specific
  options. For example, to benchmark a 2-GPU run of BFS and SSSP on a
  generated graph of scale 23:
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> export CUDA_VISIBLE_DEVICES=0,1
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --algo=bfs --algo=sssp
calling setup...distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
done.
running generate_edgelist (RMAT)...done.
running from_dask_cudf_edgelist...done.
running compute_renumber_edge_list...done.
running compute_renumber_edge_list...done.
running bfs (warmup)...done.
running bfs...done.
running sssp (warmup)...done.
running sssp...done.
from_dask_cudf_edgelist() 0.0133009
------------------------------------------------------------
bfs(start:73496080) 0.569328
------------------------------------------------------------
sssp(start:73496080) 1.48114
calling teardown...done.
```

* See [run_all_nightly_benches.sh](run_all_nightly_benches.sh) for an example
  of multiple SNMG runs over different scales, GPU configurations, and edge
  factors.

## Multi-node multi-GPU runs (MNMG)

* MNMG runs require a cluster of multi-GPU machines (nodes) that have access to
the same files. Each node must use the same cugraph environment (using a
shared conda environment is recommended).

* A dask scheduler must be running on one node in the cluster. The dask
scheduler generates a JSON file (typically named `dask-scheduler.json`) which
is intended to be read by the individual worker processes (see below) and the
dask client, and is therefore recommended to be hosted on a shared file system
that all other nodes in the cluster can access.

* dask workers must be running on each node in the cluster and passed the JSON
file (likely named `dask-scheduler.json`, see above) generated by the
scheduler.

* The dask client - the object created in the test/application process itself -
  must be configured to match the configuration of the scheduler and workers.
  This includes matching settings for UCX, NVLink, InfiniBand, etc. The client
  also reads the scheduler JSON file for information on how to communicate with
  the scheduler (a minimal client-side sketch is shown at the end of this
  section).

* The scheduler JSON file must be passed to the `main.py` script used to run the
individual benchmarks by passing the `--dask-scheduler-file` option. See
`python main.py --help` for more information.

* The [multi-gpu-tools](https://github.com/rapidsai/multi-gpu-tools) package can
make setting up the cluster easier by providing the following scripts:

  * `run-dask-process.sh` - starts a dask scheduler, a dask worker, or both on
    the machine it is run on. This script is typically launched using cluster
    job management software (e.g. Slurm, LSF) to start the required dask
    processes on the nodes in the cluster. See the `--help` output for more
    information.

  * `wait_for_workers.py` - this script is run on the node running the dask
    client, prior to the start of the test or benchmark script. It does not
    return until the requested number of workers are up and running. See the
    `--help` output for more information.

  * `default-config.sh` - not a runnable script, but a file that contains the
    default values for various dask-related configuration options (among other
    settings which may not be relevant and should be safe to ignore).
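
* For reference, a minimal illustrative sketch of the client-side setup is
  shown below; the scheduler file path is a placeholder, and any UCX/NVLink
  settings used by the scheduler and workers must also be applied to the
  client:
```
# Illustrative client-side sketch (the scheduler file path is a placeholder).
from dask.distributed import Client
from cugraph.dask.comms import comms as Comms

# Connect to the running scheduler via the JSON file it wrote.
client = Client(scheduler_file="/path/to/shared/dask-scheduler.json")

# Initialize cuGraph's communicator on the workers before running MG algos.
Comms.initialize(p2p=True)

# ... construct the graph and run benchmarks here ...

Comms.destroy()
client.close()
```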

## Other Example Runs:
_**NOTE: Some algos require the graph to be symmetrized (Louvain, WCC) or unweighted.**_
* Run all the benchmarks with a generated dataset of scale=23
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23
```

* Run all the benchmarks with a generated unweighted dataset of scale=23
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --unweighted
```

* Symmetrize the generated dataset of scale=23 and run all the benchmarks
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --symmetric-graph
```

* Create a graph from a csv file and run all the benchmarks
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --csv='karate.csv'
```
165 changes: 165 additions & 0 deletions benchmarks/python_e2e/benchmark.py
@@ -0,0 +1,165 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
import time
from functools import wraps


class BenchmarkedResult:
    """
    Class to hold results (the return value of the callable being benchmarked
    and meta-data about the benchmarked function) of a benchmarked function run.
    """
    def __init__(self, name, retval, runtime, params=None):
        self.name = name
        self.retval = retval
        self.runtime = runtime
        self.params = params or {}
        self.validator_result = True


def benchmark(func):
    """
    Returns a callable/closure that wraps func with code to time the func call
    and return a BenchmarkedResult. The resulting callable takes the same
    args/kwargs as func.

    The BenchmarkedResult will have its params value assigned from the kwargs
    dictionary, but the func positional args are not captured. If a user needs
    the params captured for reporting purposes, they must use kwargs. This is
    useful since positional args can be used for args that would not be
    meaningful in a benchmark result as a param to the benchmark.

    This can be used as a function decorator or a standalone function to wrap
    functions to benchmark.
    """
    benchmark_name = getattr(func, "benchmark_name", func.__name__)

    @wraps(func)
    def benchmark_wrapper(*func_args, **func_kwargs):
        t1 = time.perf_counter()
        retval = func(*func_args, **func_kwargs)
        t2 = time.perf_counter()
        return BenchmarkedResult(name=benchmark_name,
                                 retval=retval,
                                 runtime=(t2-t1),
                                 params=func_kwargs,
                                 )

    # Assign the name to the returned callable as well for use in debug
    # prints, etc.
    benchmark_wrapper.name = benchmark_name
    return benchmark_wrapper
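

# Example usage (illustrative sketch; "my_algo" and "G" are hypothetical):
#
#   @benchmark
#   def my_algo(G, start=0):
#       ...        # the operation being timed
#
#   result = my_algo(G, start=0)   # returns a BenchmarkedResult
#   print(result.name, result.runtime, result.params)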


class BenchmarkRun:
    """
    Represents a benchmark "run", which can be executed by calling the run()
    method. Results are saved as BenchmarkedResult instances in the results
    list member.
    """
    def __init__(self,
                 input_dataframe,
                 construct_graph_func,
                 algo_func_param_list,
                 algo_validator_list=None
                 ):
        self.input_dataframe = input_dataframe

        if type(construct_graph_func) is tuple:
            (construct_graph_func,
             self.construct_graph_func_args) = construct_graph_func
        else:
            # Use an empty tuple so the args can always be unpacked in run()
            self.construct_graph_func_args = ()

        # Create benchmark instances for each algo/func to be timed.
        # FIXME: need to accept and save individual algo args
        self.construct_graph = benchmark(construct_graph_func)

        # Add a starting vertex to the algos that require one: BFS and SSSP
        for i, algo in enumerate(algo_func_param_list):
            if benchmark(algo).name in ["bfs", "sssp"]:
                param = {}
                param["start"] = self.input_dataframe["src"].head()[0]
                algo_func_param_list[i] = (algo,) + (param,)

        self.algos = []
        for item in algo_func_param_list:
            if type(item) is tuple:
                (algo, params) = item
            else:
                (algo, params) = (item, {})
            self.algos.append((benchmark(algo), params))

        self.validators = algo_validator_list or [None] * len(self.algos)
        self.results = []


    @staticmethod
    def __log(s, end="\n"):
        print(s, end=end)
        sys.stdout.flush()


    def run(self):
        """
        Run and time the graph construction step, then run and time each algo.
        """
        self.results = []

        self.__log(f"running {self.construct_graph.name}...", end="")
        result = self.construct_graph(self.input_dataframe,
                                      *self.construct_graph_func_args)
        self.__log("done.")
        G = result.retval
        self.results.append(result)

        # Algos that need the transposed edge list (transposed=True): PageRank, Katz
        # Algos that need the non-transposed edge list (transposed=False): BFS, SSSP, Louvain
        for i in range(len(self.algos)):
            if self.algos[i][0].name in ["pagerank", "katz"]:
                # Set transposed=True when renumbering
                if self.algos[i][0].name == "katz" and \
                   self.construct_graph.name == "from_dask_cudf_edgelist":
                    # Compute the out-degree before renumbering, since
                    # out_degree() uses the non-transposed (transposed=False)
                    # edge list.
                    largest_out_degree = G.out_degree().compute().\
                        nlargest(n=1, columns="degree")
                    largest_out_degree = largest_out_degree["degree"].iloc[0]
                    # 1/(max out-degree + 1) is a conservative choice for
                    # Katz's damping factor (alpha).
                    katz_alpha = 1 / (largest_out_degree + 1)
                    self.algos[i][1]["alpha"] = katz_alpha
                elif self.algos[i][0].name == "katz" and \
                     self.construct_graph.name == "from_cudf_edgelist":
                    largest_out_degree = G.out_degree().nlargest(n=1, columns="degree")
                    largest_out_degree = largest_out_degree["degree"].iloc[0]
                    katz_alpha = 1 / (largest_out_degree + 1)
                    self.algos[i][1]["alpha"] = katz_alpha
                if hasattr(G, "compute_renumber_edge_list"):
                    G.compute_renumber_edge_list(transposed=True)
            else:
                # Set transposed=False when renumbering
                self.__log("running compute_renumber_edge_list...", end="")
                if hasattr(G, "compute_renumber_edge_list"):
                    G.compute_renumber_edge_list(transposed=False)
                self.__log("done.")

        # FIXME: need to handle individual algo args
        for ((algo, params), validator) in zip(self.algos, self.validators):
            self.__log(f"running {algo.name} (warmup)...", end="")
            algo(G, **params)
            self.__log("done.")
            self.__log(f"running {algo.name}...", end="")
            result = algo(G, **params)
            self.__log("done.")

            if validator:
                result.validator_result = validator(result.retval, G)

            self.results.append(result)
            # Reclaim memory since the computed algo result is no longer needed
            result.retval = None

        return False not in [r.validator_result for r in self.results]
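

# Illustrative sketch of driving a BenchmarkRun directly (the "df" edgelist and
# the bfs wrapper are hypothetical; main.py performs the real wiring):
#
#   import cugraph
#
#   def bfs(G, start):
#       return cugraph.bfs(G, start=start)
#
#   run = BenchmarkRun(df,   # cudf edgelist with "src"/"dst" columns
#                      (cugraph.from_cudf_edgelist, ("src", "dst")),
#                      [bfs])
#   all_validators_passed = run.run()
#   for r in run.results:
#       print(r.name, r.runtime)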