Add oss support to ncu rep (#2387)
Summary:
In the OSS environment, `sys.executable` must be prepended to the command line when running NCU. This is not needed in the Meta-internal environment.

Pull Request resolved: #2387

Test Plan:
```
TORCH_CUDA_ARCH_LIST=9.0a CUDA_VISIBLE_DEVICES=5 python run_benchmark.py triton --op flash_attention --only flash_v3 --num-inputs 1 --dump-csv --metrics ncu_rep --batch 8 --n-heads 16 --d-head 128

  SeqLen                                                           flash_v3-ncu_rep
--------  -------------------------------------------------------------------------
     128  /tmp/tritonbench/flash_attention/ncu_traces/flash_v3_0/ncu_output.ncu-rep
```

Reviewed By: manman-ren

Differential Revision: D59920615

Pulled By: xuzhao9

fbshipit-source-id: c27b9aef7048dbefcd93a7233df632a8886c71c9
xuzhao9 authored and facebook-github-bot committed Jul 18, 2024
1 parent 366d6a1 commit 5efd1dc
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion torchbenchmark/util/triton_op.py
@@ -878,7 +878,8 @@ def ncu_trace(self, input_id: int, fn_name: str, replay: bool=False, profile_ir=
         import sys
         import subprocess
 
-        op_task_args = copy.deepcopy(sys.argv)
+        op_task_args = [] if IS_FBCODE else [sys.executable]
+        op_task_args.extend(copy.deepcopy(sys.argv))
         for override_option in ["--only", "--input-id", "--num-inputs", "--metrics"]:
             op_task_args = _remove_params(
                 op_task_args, _find_param_loc(op_task_args, override_option)
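
The effect of the two added lines can be sketched in isolation. `sys.argv[0]` is the script path rather than the interpreter, so a child command built from `sys.argv` alone only runs if the script itself is directly executable. The helper below is a minimal illustration, not code from the repository: `build_child_argv` and its parameters are hypothetical names, and `is_fbcode` stands in for the `IS_FBCODE` flag seen in the diff.

```python
import copy
import sys
from typing import List, Optional


def build_child_argv(
    argv: List[str],
    is_fbcode: bool,
    executable: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper mirroring the two added lines of the diff."""
    if executable is None:
        executable = sys.executable
    # Outside fbcode, argv[0] is a script path, so lead with the interpreter.
    child_argv = [] if is_fbcode else [executable]
    # Copy so that later option stripping does not mutate the caller's list.
    child_argv.extend(copy.deepcopy(argv))
    return child_argv


if __name__ == "__main__":
    example = ["run_benchmark.py", "triton", "--op", "flash_attention"]
    print(build_child_argv(example, is_fbcode=False))
```

With `is_fbcode=True` the list is returned unchanged, which matches the commit's statement that the internal environment does not need the interpreter prefix.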