
Profile rocFFT kernels

tingxingdong edited this page Jan 17, 2018 · 11 revisions

On AMD GPU

In bash, run "export HIP_TRACE_API=1" to enable API tracing (set it back to 0 to disable).

Then launch your application. Every HIP API call is profiled, including rocFFT kernel launches, memory copies, and allocation/deallocation.
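As an alternative to exporting the variable for the whole session, it can be scoped to a single run. The following is a minimal sketch, not an official recipe: APP is a hypothetical placeholder (it defaults to `env` so the snippet runs on any machine), and trace.log is an assumed file name. Both stdout and stderr are captured, since where the trace is written may depend on the HIP version.

```shell
#!/usr/bin/env bash
# APP is a hypothetical placeholder; point it at your own binary,
# e.g. APP=./your_executable. It defaults to `env` so the sketch runs anywhere.
APP=${APP:-env}

# Prefixing the assignment sets HIP_TRACE_API for this one command only.
HIP_TRACE_API=1 "$APP" > trace.log 2>&1

# The calling shell is left untouched, so no reset to 0 is needed afterwards.
echo "HIP_TRACE_API in this shell: ${HIP_TRACE_API:-unset}"
```

With a real application, trace.log would then hold the program's normal output together with whatever the HIP runtime logs.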

For more profiling tools, see Profiling and Debugging HIP Code

The IR and ISA can be dumped by setting the following environment variables before building and running the app.

export KMDUMPISA=1

export KMDUMPLLVM=1

export KMDUMPDIR=/path/to/dump
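The three exports above can be collected into one setup snippet. This is a sketch under assumptions: DUMP_DIR and its default location are hypothetical, and the directory is created up front in case the compiler does not create it.

```shell
#!/usr/bin/env bash
# DUMP_DIR is a hypothetical location; substitute your own /path/to/dump.
DUMP_DIR=${DUMP_DIR:-"$PWD/kmdump"}

# Create the directory up front (assumption: the compiler may not create it).
mkdir -p "$DUMP_DIR"

export KMDUMPISA=1            # dump the ISA
export KMDUMPLLVM=1           # dump the LLVM IR
export KMDUMPDIR="$DUMP_DIR"  # where the dumps are written

# Build and run the app from this same shell so both steps see the variables.
echo "Dumps will be written to $KMDUMPDIR"
```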

By rcprof

rcprof is a command-line tool for profiling HIP kernels, very similar to nvprof. It is located in /opt/rocm/profiler/bin.

Example usage:

/opt/rocm/profiler/bin/rcprof -A ./your_executable

The dumped output, apitrace.atp, will be in your home directory.

Download CodeXL and open the *.atp file with it. Note: switch to profile mode before opening the *.atp file.

The profiler also dumps several profile.HSA*.html files, which you can view in any web browser.

Run /opt/rocm/profiler/bin/rcprof --help for more options.

On NVIDIA GPU

Run "nvprof ./your_executable" to profile every CUDA runtime invocation, including kernel launches and memory copies.
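When the same application is built for both platforms, the choice of profiler can be scripted. The sketch below is illustrative only: pick_profiler is a hypothetical helper (not part of ROCm or CUDA), and it prints the command rather than running it, so it is safe to try on a machine with neither tool installed.

```shell
#!/usr/bin/env bash
# pick_profiler is a hypothetical helper, not part of ROCm or CUDA.
# It prints the profiler command prefix available on the current machine.
pick_profiler() {
    if [ -x /opt/rocm/profiler/bin/rcprof ]; then
        echo "/opt/rocm/profiler/bin/rcprof -A"
    elif command -v nvprof >/dev/null 2>&1; then
        echo "nvprof"
    else
        echo "none"
    fi
}

PROFILER=$(pick_profiler)
if [ "$PROFILER" = "none" ]; then
    echo "No profiler found; run ./your_executable directly."
else
    # Print rather than execute, so the sketch is safe to run as-is.
    echo "Would run: $PROFILER ./your_executable"
fi
```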
