NVRTC kernels for cast-transpose #258

Merged: 22 commits, Apr 19, 2024

timmoon10
Collaborator

This is a follow-up to #138 that adds NVRTC kernels for cast-transpose.
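For readers unfamiliar with the operation, here is a minimal CPU sketch of what a cast-transpose computes. It is not the NVRTC kernel from this PR: the function name `cast_transpose_reference`, the `scale` parameter, and the choice to emit both a cast copy and a cast transposed copy are assumptions made for illustration, and `std::int8_t` stands in for an FP8 type that standard C++ lacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not the PR's kernel).
// input:          row-major, num_rows x row_length
// cast_out:       row-major, num_rows x row_length (cast only)
// transposed_out: row-major, row_length x num_rows (cast and transposed)
template <typename IType, typename OType>
void cast_transpose_reference(const std::vector<IType>& input,
                              std::size_t num_rows, std::size_t row_length,
                              float scale,  // assumed scaling factor applied before the cast
                              std::vector<OType>& cast_out,
                              std::vector<OType>& transposed_out) {
  cast_out.resize(num_rows * row_length);
  transposed_out.resize(num_rows * row_length);
  for (std::size_t i = 0; i < num_rows; ++i) {
    for (std::size_t j = 0; j < row_length; ++j) {
      // Scale in float, then cast once so both outputs hold the same value.
      const OType value = static_cast<OType>(
          static_cast<float>(input[i * row_length + j]) * scale);
      cast_out[i * row_length + j] = value;
      transposed_out[j * num_rows + i] = value;
    }
  }
}
```

A reference like this is mainly useful for checking a GPU kernel's output on small inputs; it says nothing about the tiling or vectorization a real kernel would use.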

I've also tweaked the kernel selection heuristics to be a bit more general. There are some performance differences at intermediate scale, but nothing too drastic:

[Figure: Transpose kernel times]

Timings are for FP8 transposes on an H100 PCIe, using square matrices with nice dimensions (a power of two, or halfway between two consecutive powers of two).
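As an illustration of the "nice" dimensions described above, the sketch below enumerates powers of two and the midpoints between consecutive powers of two within a range. The helper name `nice_dims` and the range bounds are hypothetical; the comment does not say which sizes were actually benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: list every dimension in [min_dim, max_dim] that is
// either a power of two (2^k) or halfway between 2^k and 2^(k+1), i.e. 3 * 2^(k-1).
std::vector<std::size_t> nice_dims(std::size_t min_dim, std::size_t max_dim) {
  std::vector<std::size_t> dims;
  // The pow2 != 0 check stops the loop if doubling ever wraps around.
  for (std::size_t pow2 = 1; pow2 != 0 && pow2 <= max_dim; pow2 *= 2) {
    if (pow2 >= min_dim) dims.push_back(pow2);
    const std::size_t halfway = pow2 + pow2 / 2;
    // Skip pow2 == 1, where the midpoint 1.5 is not an integer.
    if (pow2 >= 2 && halfway >= min_dim && halfway <= max_dim) {
      dims.push_back(halfway);
    }
  }
  return dims;
}
```

For example, `nice_dims(256, 2048)` yields 256, 384, 512, 768, 1024, 1536, 2048, each of which would be used as both the row and column count of a square matrix.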

@timmoon10 added the "enhancement" (New feature or request) label on Jun 1, 2023
@timmoon10 requested a review from ptrendx on Jun 1, 2023 at 21:22
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 force-pushed the nvrtc-cast-transpose branch from dfc121b to aed892a on Jun 1, 2023 at 21:24
@timmoon10
Collaborator Author

/te-ci

@timmoon10
Collaborator Author

If there are no objections, I will merge this after the 1.6 release.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Collaborator Author

/te-ci

const size_t num_tiles) {
__launch_bounds__(block_size)
cast_transpose_general_kernel(const IType * __restrict__ const input,
const CType * const noop,
Collaborator

Was the __restrict__ qualifier intentionally omitted here?
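For context on the question, `__restrict__` is a compiler extension (GCC, Clang, NVCC; MSVC spells it `__restrict`) by which the programmer promises that a pointer does not alias any other pointer used in the function, which can let the compiler reorder or cache loads. The host-side sketch below shows the qualifier applied to every pointer argument, including a flag pointer. The function `scaled_copy` is hypothetical, and the convention that a flag value of 1 means "skip the work" is an assumption for illustration, not something this thread confirms.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side analogue of a kernel signature with a noop flag.
// Every pointer carries __restrict__, so the compiler may assume that
// writes to `output` never change what `input` or `noop` point to.
void scaled_copy(const float* __restrict__ const input,
                 const float* __restrict__ const noop,
                 float* __restrict__ const output,
                 const float scale, const std::size_t n) {
  // Assumed convention: a flag value of 1 means "do nothing".
  if (noop != nullptr && noop[0] == 1.0f) return;
  for (std::size_t i = 0; i < n; ++i) {
    output[i] = input[i] * scale;
  }
}
```

On a pointer that is only read once, as a flag typically is, the qualifier is unlikely to change the generated code much, so including it is largely a matter of consistency with the other arguments.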

Collaborator

@Oleg-Goncharov Oleg-Goncharov left a comment

LGTM, besides the early return from the kernel configuration constructors

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
@timmoon10
Collaborator Author

/te-ci

@timmoon10 merged commit 14c1ecd into NVIDIA:main on Apr 19, 2024
22 of 26 checks passed
pggPL pushed a commit to pggPL/TransformerEngine that referenced this pull request May 15, 2024
* Add NVRTC kernels for cast-transpose

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update copyright year

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add noop flag to NVRTC cast-transpose kernel

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Apply suggestions from code review

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
pggPL pushed a commit to pggPL/TransformerEngine that referenced this pull request May 16, 2024
pggPL pushed a commit to pggPL/TransformerEngine that referenced this pull request May 23, 2024