How to use Nsight Compute to measure kernel performance? #46

Open
ihaterecursion opened this issue Mar 7, 2025 · 1 comment

Comments

@ihaterecursion

I ran the command ncu --set full -o deepgemm python tests/test_core.py
but it reported the following error:
==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM
Traceback (most recent call last):
  File "/mnt/workspace/xxx/DeepGEMM-main/tests/test_core.py", line 167, in <module>
    test_gemm()
  File "/mnt/workspace/xxx/DeepGEMM-main/tests/test_core.py", line 80, in test_gemm
    deep_gemm.gemm_fp8_fp8_bf16_nt(x_fp8, y_fp8, out)
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/jit_kernels/gemm.py", line 162, in gemm_fp8_fp8_bf16_nt
    runtime = jit_tuner.compile_and_tune(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/jit_kernels/tuner.py", line 40, in compile_and_tune
    kernels.append((build(name, arg_defs, code), tuned_keys))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/jit/compiler.py", line 110, in build
    enable_sass_opt = get_nvcc_compiler()[1] <= '12.8' and int(os.getenv('DG_DISABLE_FFMA_INTERLEAVE', 0)) == 0
                      ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/jit/compiler.py", line 57, in get_nvcc_compiler
    version = match.group(1)
              ^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
==PROF== Disconnected from process 25948
==ERROR== The application returned an error code (1).

@soundOfDestiny
Collaborator

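DeepGEMM cannot determine the version of your NVCC: the traceback shows that the version match in get_nvcc_compiler() is None, so match.group(1) fails.
To debug, print os.popen(f'{path} --version').read() in deep_gemm/jit/compiler.py just before version = match.group(1), or simply run 'nvcc --version' in the same environment and check the output.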
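
If you would rather not edit the installed package, a standalone check along these lines may help. It is only a rough sketch: the regex, the CUDA_HOME-then-PATH lookup order, and the helper names (parse_nvcc_version, find_nvcc) are assumptions made for illustration and are not taken from DeepGEMM's compiler.py, which may search and parse differently. Run it from the same shell you launch ncu from, so it sees the same environment.

```python
import os
import re
import shutil
import subprocess
from typing import Optional

# Assumed pattern: matches the "release 12.4" part of `nvcc --version` output.
# DeepGEMM's own regex may differ.
VERSION_PATTERN = re.compile(r"release (\d+\.\d+)")


def parse_nvcc_version(output: str) -> Optional[str]:
    """Return the 'major.minor' release string, or None if nothing matches."""
    match = VERSION_PATTERN.search(output)
    return match.group(1) if match else None


def find_nvcc() -> Optional[str]:
    """Look under CUDA_HOME first, then on PATH (assumed search order)."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


def main() -> None:
    path = find_nvcc()
    if path is None:
        print("nvcc not found via CUDA_HOME or PATH")
        return
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True
        )
    except OSError as exc:
        print(f"could not execute {path}: {exc}")
        return
    print(f"nvcc path: {path}")
    print(f"raw output:\n{result.stdout}{result.stderr}")
    # None here corresponds to the failing match in the traceback above.
    print(f"parsed version: {parse_nvcc_version(result.stdout)}")


if __name__ == "__main__":
    main()
```

If the raw output is empty or is an error message, the parsed version is None, which is the same situation that makes match.group(1) raise AttributeError.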
