Add NVTX ranges to categorize execution #1447
58fa25a to 9e5abb5
Please review @timmoon10
LGTM. I have some very minor renaming suggestions. We can merge once we fix merge conflicts and the TE `main` branch has been reconciled with `release_v2.0`.
For future reference, my comments on an earlier version of this PR: The overall design is delicate, but I can't think of a better approach. The `torch.cuda.nvtx.range` context is nicer than `range_push`/`range_pop`, but its CPU overhead is too high (1.7 us compared to 0.5 us).
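To make the trade-off in the comment above concrete, here is a minimal sketch of the two ways to mark a region with NVTX from PyTorch. The helper names `nvtx_range_push` and `nvtx_range_pop`, the range label `"scale.forward"`, and the toy `scale_*` functions are hypothetical and chosen only for illustration; they are not claimed to match what this PR adds. The sketch falls back to no-ops when PyTorch or CUDA is unavailable, and the timing loop is only a rough way to compare per-call CPU overhead on your own machine.

```python
import timeit

try:
    import torch
    # Assumption: only emit NVTX ranges when a CUDA-capable build is present.
    _NVTX_ENABLED = torch.cuda.is_available()
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None
    _NVTX_ENABLED = False


def nvtx_range_push(msg: str) -> None:
    """Open an NVTX range (hypothetical helper; no-op when NVTX is unavailable)."""
    if _NVTX_ENABLED:
        torch.cuda.nvtx.range_push(msg)


def nvtx_range_pop() -> None:
    """Close the most recently opened NVTX range (hypothetical helper)."""
    if _NVTX_ENABLED:
        torch.cuda.nvtx.range_pop()


def scale_with_push_pop(values, factor):
    # Explicit push/pop: the caller must keep every push paired with a pop.
    nvtx_range_push("scale.forward")
    out = [v * factor for v in values]
    nvtx_range_pop()
    return out


def scale_with_context(values, factor):
    # Context-manager form: the range is closed automatically on exit.
    if _NVTX_ENABLED:
        with torch.cuda.nvtx.range("scale.forward"):
            return [v * factor for v in values]
    return [v * factor for v in values]


if __name__ == "__main__":
    # Rough per-call timing; absolute numbers depend on hardware and build.
    n = 100_000
    for fn in (scale_with_push_pop, scale_with_context):
        per_call = timeit.timeit(lambda: fn([1.0], 2.0), number=n) / n
        print(f"{fn.__name__}: {per_call * 1e6:.2f} us per call")
```

The push/pop form has no built-in protection against an unbalanced range if the body raises, which is one reason the context manager reads as the nicer API; whether the extra cost matters depends on how often the annotated region runs.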
@timmoon10 Sounds good, thanks!
9e21a84 to 2914da2
I've added the option in
2914da2 to 54db46e
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
54db46e to 908461a
/te-ci pytorch
Description
Adds NVTX ranges to categorize different parts of the execution.