Commit
C++ code and TE/PyTorch general_gemm updated to support TP overlap with cppqtensor

Signed-off-by: Alp Dener <adener@nvidia.com>

- CommOverlap objects can now return overlap buffers to PyTorch as QuantizedTensors
- Updated comm+GEMM overlap test for pure GEMM; both BF16 and FP8 working with QuantizedTensor
- te.Linear and te.LayerNormMLP updated for TP overlap with QuantizedTensor; all overlaps work in BF16, and all overlaps except bulk WGRAD work in FP8
- Completed TP overlap QuantizedTensor updates for LayerNormLinear, but issues remain with quantized normalization
- All overlaps working with BF16; all but bulk WGRAD working with FP8
- All overlaps work with Float8Tensor, except bulk WGRAD in LayerNormMLP (works in other modules)
- All overlaps working with QuantizedTensor in both BF16 and FP8
- Cleaned up pytest formatting
Showing 22 changed files with 1,556 additions and 1,441 deletions.