Merge pull request #1847 from rapidsai/branch-21.10
ajschmidt8 committed Oct 6, 2021
2 parents e03e008 + 54b8573 commit fed9ee4
Showing 551 changed files with 20,627 additions and 12,246 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -76,8 +76,12 @@ datasets/*
cpp/doxygen/html

# Raft symlink
python/cugraph/raft
python/cugraph/cugraph/raft
python/pylibcugraph/pylibcugraph/raft
python/_external_repositories/

# created by Dask tests
python/dask-worker-space

# Sphinx docs & build artifacts
docs/cugraph/source/api_docs/api/*
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# cuGraph 21.10.00 (Date TBD)

Please see https://github.com/rapidsai/cugraph/releases/tag/v21.10.00a for the latest changes to this development branch.

# cuGraph 21.08.00 (4 Aug 2021)

## 🚨 Breaking Changes
1 change: 0 additions & 1 deletion README.md
@@ -83,7 +83,6 @@ As of Release 21.08 - including 21.08 nightly
| Traversal | | | |
| | Breadth First Search (BFS) | Multi-GPU | with cutoff support <br/> [C++ README](cpp/src/traversal/README.md#BFS) |
| | Single Source Shortest Path (SSSP) | Multi-GPU | [C++ README](cpp/src/traversal/README.md#SSSP) |
| | Traveling Salesperson Problem (TSP) | Single-GPU | |
| Tree | | | |
| | Minimum Spanning Tree | Single-GPU | |
| | Maximum Spanning Tree | Single-GPU | |
4 changes: 4 additions & 0 deletions SOURCEBUILD.md
@@ -49,6 +49,10 @@ conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11
# for CUDA 11.2
conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.2.yml

# for CUDA 11.4
conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.4.yml


# activate the environment
conda activate cugraph_dev

126 changes: 126 additions & 0 deletions benchmarks/python_e2e/README.md
@@ -0,0 +1,126 @@
# cuGraph benchmarks

## Overview

The sources are currently intended to benchmark `cuGraph` via the python API,
but future updates may include benchmarks written in C++ or other languages.

The benchmarks here use datasets generated by the RMAT graph generator but also
support csv files as input.
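
For csv inputs (edgelists), a minimal illustrative sketch of loading one with cuDF
is shown below; the column names, dtypes, and delimiter are assumptions and depend
on the actual file:
```
# Illustrative sketch only: load a csv edgelist into a cudf DataFrame.
# The "src"/"dst" column names, int32 dtypes, and space delimiter are
# assumptions -- adjust them to match the file being benchmarked.
import cudf

df = cudf.read_csv("karate.csv",
                   delimiter=" ",
                   names=["src", "dst"],
                   dtype=["int32", "int32"])
print(df.head())
```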

## Prerequisites
* cugraph built and installed (or `cugraph` sources and built C++ extensions
available on `PYTHONPATH`)

* Multi-node multi-GPU (MNMG) runs require a cluster environment to be set up
  and a dask scheduler .json file made accessible to all workers on every node.

* While not a strict prerequisite, the RAPIDS
[multi-gpu-tools](https://github.com/rapidsai/multi-gpu-tools) package can be
used to automate setting up a cluster for MNMG runs and will be referenced in
the examples below.

## Single-node multi-GPU runs (SNMG)
* Set the `CUDA_VISIBLE_DEVICES` environment variable to the comma-separated
  list of GPU IDs to use for the benchmarks (e.g. `0,1` for a 2-GPU run). If
  the env var is unset, a single-GPU run is assumed. _Note: this is a slightly
  different usage of `CUDA_VISIBLE_DEVICES` compared to its standard documented
  use, where leaving it unset means "no restricted GPUs, use all of them"._

* Run `python main.py --help` for a list of all available benchmark options

* Run `python main.py` to run individual algo benchmarks with specific
  options. For example, to benchmark a 2-GPU run of BFS and SSSP on a
  generated graph of scale 23:
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> export CUDA_VISIBLE_DEVICES=0,1
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --algo=bfs --algo=sssp
calling setup...distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
done.
running generate_edgelist (RMAT)...done.
running from_dask_cudf_edgelist...done.
running compute_renumber_edge_list...done.
running compute_renumber_edge_list...done.
running bfs (warmup)...done.
running bfs...done.
running sssp (warmup)...done.
running sssp...done.
from_dask_cudf_edgelist() 0.0133009
------------------------------------------------------------
bfs(start:73496080) 0.569328
------------------------------------------------------------
sssp(start:73496080) 1.48114
calling teardown...done.
```

* See [run_all_nightly_benches.sh](run_all_nightly_benches.sh) for an example
  of multiple SNMG runs over different scales, GPU configurations, and edge
  factors.

## Multi-node multi-GPU runs (MNMG)

* MNMG runs require a cluster of multi-GPU machines (nodes) that have access to
the same files. Each node must use the same cugraph environment (using a
shared conda environment is recommended).

* A dask scheduler must be running on one node in the cluster. The dask
scheduler generates a JSON file (typically named `dask-scheduler.json`) which
is intended to be read by the individual worker processes (see below) and the
dask client, and is therefore recommended to be hosted on a shared file system
that all other nodes in the cluster can access.

* dask workers must be running on each node in the cluster and passed the JSON
file (likely named `dask-scheduler.json`, see above) generated by the
scheduler.

* The dask client - the object created in the test/application process itself -
  must be configured to match the configuration of the scheduler and workers.
  This includes matching settings for UCX, NVLink, InfiniBand, etc. The client
  also reads the scheduler JSON file for information on how to communicate with
  the scheduler (a minimal client-side sketch is shown at the end of this
  section).

* The scheduler JSON file must be passed to the `main.py` script used to run the
individual benchmarks by passing the `--dask-scheduler-file` option. See
`python main.py --help` for more information.

* The [multi-gpu-tools](https://github.com/rapidsai/multi-gpu-tools) package can
make setting up the cluster easier by providing the following scripts:

  * `run-dask-process.sh` - starts a dask scheduler, a dask worker, or both on
    the machine it is run on. This script is typically launched using cluster
    job management software (e.g. Slurm, LSF) to start the required dask
    processes on the nodes in the cluster. See the `--help` output for more
    information.

  * `wait_for_workers.py` - this script is run on the node running the dask
    client, prior to the start of the test or benchmark script. It does not
    return until the requested number of workers are up and running. See the
    `--help` output for more information.

  * `default-config.sh` - not a runnable script, but a file that contains the
    default values for various dask-related configuration options (among other
    settings which may not be relevant and should be safe to ignore).
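
* For reference, a minimal illustrative sketch of the client-side setup is
  shown below; the scheduler file path is a placeholder, and any UCX/NVLink
  settings used by the scheduler and workers must also be applied to the
  client:
```
# Illustrative client-side sketch (the scheduler file path is a placeholder).
from dask.distributed import Client
from cugraph.dask.comms import comms as Comms

# Connect to the running scheduler via the JSON file it wrote.
client = Client(scheduler_file="/path/to/shared/dask-scheduler.json")

# Initialize cuGraph's communicator on the workers before running MG algos.
Comms.initialize(p2p=True)

# ... construct the graph and run benchmarks here ...

Comms.destroy()
client.close()
```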

## Other Example Runs:
_**NOTE: Some algos require the graph to be symmetrized (Louvain, WCC) or unweighted.**_
* Run all the benchmarks with a generated dataset of scale=23
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23
```

* Run all the benchmarks with a generated unweighted dataset of scale=23
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --unweighted
```

* Symmetrize the generated dataset of scale=23 and run all the benchmarks
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --scale=23 --symmetric-graph
```

* Create a graph from a csv file and run all the benchmarks
```
(rapids) user@machine:/cugraph/benchmarks/python_e2e> python main.py --csv='karate.csv'
```
165 changes: 165 additions & 0 deletions benchmarks/python_e2e/benchmark.py
@@ -0,0 +1,165 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
import time
from functools import wraps


class BenchmarkedResult:
    """
    Class to hold results (the return value of the callable being benchmarked
    and meta-data about the benchmarked function) of a benchmarked function run.
    """
    def __init__(self, name, retval, runtime, params=None):
        self.name = name
        self.retval = retval
        self.runtime = runtime
        self.params = params or {}
        self.validator_result = True


def benchmark(func):
    """
    Returns a callable/closure that wraps func with code to time the func call
    and return a BenchmarkedResult. The resulting callable takes the same
    args/kwargs as func.

    The BenchmarkedResult will have its params value assigned from the kwargs
    dictionary, but the func positional args are not captured. If a user needs
    the params captured for reporting purposes, they must use kwargs. This is
    useful since positional args can be used for args that would not be
    meaningful in a benchmark result as a param to the benchmark.

    This can be used as a function decorator or a standalone function to wrap
    functions to benchmark.
    """
    benchmark_name = getattr(func, "benchmark_name", func.__name__)

    @wraps(func)
    def benchmark_wrapper(*func_args, **func_kwargs):
        t1 = time.perf_counter()
        retval = func(*func_args, **func_kwargs)
        t2 = time.perf_counter()
        return BenchmarkedResult(name=benchmark_name,
                                 retval=retval,
                                 runtime=(t2-t1),
                                 params=func_kwargs,
                                 )

    # Assign the name to the returned callable as well for use in debug
    # prints, etc.
    benchmark_wrapper.name = benchmark_name
    return benchmark_wrapper
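

# Example usage (illustrative sketch; "my_algo" and "G" are hypothetical):
#
#   @benchmark
#   def my_algo(G, start=0):
#       ...        # the operation being timed
#
#   result = my_algo(G, start=0)   # returns a BenchmarkedResult
#   print(result.name, result.runtime, result.params)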


class BenchmarkRun:
    """
    Represents a benchmark "run", which can be executed by calling the run()
    method. Results are saved as BenchmarkedResult instances in the results
    list member.
    """
    def __init__(self,
                 input_dataframe,
                 construct_graph_func,
                 algo_func_param_list,
                 algo_validator_list=None
                 ):
        self.input_dataframe = input_dataframe

        if type(construct_graph_func) is tuple:
            (construct_graph_func,
             self.construct_graph_func_args) = construct_graph_func
        else:
            # Use an empty tuple so the args can always be unpacked in run()
            self.construct_graph_func_args = ()

        # Create benchmark instances for each algo/func to be timed.
        # FIXME: need to accept and save individual algo args
        self.construct_graph = benchmark(construct_graph_func)

        # Add a starting vertex to the algos that require one: BFS and SSSP
        for i, algo in enumerate(algo_func_param_list):
            if benchmark(algo).name in ["bfs", "sssp"]:
                param = {}
                param["start"] = self.input_dataframe["src"].head()[0]
                algo_func_param_list[i] = (algo,) + (param,)

        self.algos = []
        for item in algo_func_param_list:
            if type(item) is tuple:
                (algo, params) = item
            else:
                (algo, params) = (item, {})
            self.algos.append((benchmark(algo), params))

        self.validators = algo_validator_list or [None] * len(self.algos)
        self.results = []


    @staticmethod
    def __log(s, end="\n"):
        print(s, end=end)
        sys.stdout.flush()


    def run(self):
        """
        Run and time the graph construction step, then run and time each algo.
        """
        self.results = []

        self.__log(f"running {self.construct_graph.name}...", end="")
        result = self.construct_graph(self.input_dataframe,
                                      *self.construct_graph_func_args)
        self.__log("done.")
        G = result.retval
        self.results.append(result)

        # Algos that need the transposed edge list (transposed=True): PageRank, Katz
        # Algos that need the non-transposed edge list (transposed=False): BFS, SSSP, Louvain
        for i in range(len(self.algos)):
            if self.algos[i][0].name in ["pagerank", "katz"]:
                # Set transposed=True when renumbering
                if self.algos[i][0].name == "katz" and \
                   self.construct_graph.name == "from_dask_cudf_edgelist":
                    # Compute the out-degree before renumbering, since
                    # out_degree() uses the non-transposed (transposed=False)
                    # edge list.
                    largest_out_degree = G.out_degree().compute().\
                        nlargest(n=1, columns="degree")
                    largest_out_degree = largest_out_degree["degree"].iloc[0]
                    # 1/(max out-degree + 1) is a conservative choice for
                    # Katz's damping factor (alpha).
                    katz_alpha = 1 / (largest_out_degree + 1)
                    self.algos[i][1]["alpha"] = katz_alpha
                elif self.algos[i][0].name == "katz" and \
                     self.construct_graph.name == "from_cudf_edgelist":
                    largest_out_degree = G.out_degree().nlargest(n=1, columns="degree")
                    largest_out_degree = largest_out_degree["degree"].iloc[0]
                    katz_alpha = 1 / (largest_out_degree + 1)
                    self.algos[i][1]["alpha"] = katz_alpha
                if hasattr(G, "compute_renumber_edge_list"):
                    G.compute_renumber_edge_list(transposed=True)
            else:
                # Set transposed=False when renumbering
                self.__log("running compute_renumber_edge_list...", end="")
                if hasattr(G, "compute_renumber_edge_list"):
                    G.compute_renumber_edge_list(transposed=False)
                self.__log("done.")

        # FIXME: need to handle individual algo args
        for ((algo, params), validator) in zip(self.algos, self.validators):
            self.__log(f"running {algo.name} (warmup)...", end="")
            algo(G, **params)
            self.__log("done.")
            self.__log(f"running {algo.name}...", end="")
            result = algo(G, **params)
            self.__log("done.")

            if validator:
                result.validator_result = validator(result.retval, G)

            self.results.append(result)
            # Reclaim memory since the computed algo result is no longer needed
            result.retval = None

        return False not in [r.validator_result for r in self.results]
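

# Illustrative sketch of driving a BenchmarkRun directly (the "df" edgelist and
# the bfs wrapper are hypothetical; main.py performs the real wiring):
#
#   import cugraph
#
#   def bfs(G, start):
#       return cugraph.bfs(G, start=start)
#
#   run = BenchmarkRun(df,   # cudf edgelist with "src"/"dst" columns
#                      (cugraph.from_cudf_edgelist, ("src", "dst")),
#                      [bfs])
#   all_validators_passed = run.run()
#   for r in run.results:
#       print(r.name, r.runtime)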