Commit
Summary:
We need to append `sys.executable` to the command when running NCU in the OSS environment. This is not needed in the Meta-internal environment.

Pull Request resolved: #2387

Test Plan:
```
TORCH_CUDA_ARCH_LIST=9.0a CUDA_VISIBLE_DEVICES=5 python run_benchmark.py triton --op flash_attention --only flash_v3 --num-inputs 1 --dump-csv --metrics ncu_rep --batch 8 --n-heads 16 --d-head 128

  SeqLen  flash_v3-ncu_rep
--------  -------------------------------------------------------------------------
     128  /tmp/tritonbench/flash_attention/ncu_traces/flash_v3_0/ncu_output.ncu-rep
```

Reviewed By: manman-ren

Differential Revision: D59920615

Pulled By: xuzhao9

fbshipit-source-id: c27b9aef7048dbefcd93a7233df632a8886c71c9
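For illustration only, a minimal sketch of the idea in the summary: when a Python script is profiled under `ncu`, the interpreter path is placed in front of the script arguments so that `ncu` launches a real executable. The helper name `build_ncu_command` and its parameters are hypothetical and are not taken from this commit; the sketch only assumes the standard `ncu` options `--export` and `--force-overwrite`.

```python
import sys
from typing import List, Optional


def build_ncu_command(
    script_args: List[str],
    output_path: str,
    python_executable: Optional[str] = None,
) -> List[str]:
    """Build an argv list that runs a Python script under Nsight Compute.

    Hypothetical helper, not the code changed by this commit.
    """
    # Fall back to the interpreter running this process (sys.executable),
    # so the profiled run uses the same Python environment.
    interpreter = python_executable or sys.executable
    return [
        "ncu",
        "--export", output_path,   # write the report to this path
        "--force-overwrite",       # replace an existing report file
        interpreter,               # the application that ncu launches
        *script_args,              # script name followed by its arguments
    ]


if __name__ == "__main__":
    cmd = build_ncu_command(
        ["run_benchmark.py", "triton", "--op", "flash_attention"],
        "/tmp/ncu_output",
    )
    print(" ".join(cmd))
```

The list could then be passed to `subprocess.run(cmd, check=True)` on a machine where `ncu` is installed.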