[PyTorch][duplicate][CI] FP8 cuda graphs #766

ksivaman · 2024-04-10T07:19:57Z

This PR adds the following features (high-level):

- `make_graphed_callables` API similar to the PyTorch API, with some additional arguments for FP8 usage. Support for FP8 weight caching via the existing `is_first_microbatch` argument is retained.
- Restructured amax reduction logic with a simpler design that handles the various parallelisms with minimal bookkeeping compared to the previous approach.
- Forward and backward amaxes are reduced within the scope of the current iteration, solving numerous bugs w.r.t. checkpointing and removing the need to save global buffers.
- Support for nested/multiple FP8 autocast contexts with different recipes and distributed groups.
- Amax reductions are module independent and happen at the autocast level. This also resolves numerous bugs and allows support for MoE/LoRA-like models.
- Redesign of transposes for `Float8Tensor` that makes the transposes persistent for graph capture. This also fixes use cases for the vanilla optimizers (non fp8-distopt).
- The scaling inverses for weight tensors are no longer frozen when caching weights across microbatches.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Charlene Yang <charleney@nvidia.com>

ksivaman · 2024-04-10T07:21:23Z

/te-ci pytorch

FP8 cuda graphs

31dc133

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Charlene Yang <charleney@nvidia.com>

ksivaman added the 1.6.0 label Apr 10, 2024

ksivaman requested a review from timmoon10 April 10, 2024 07:19

Constant values for AMAX_PARAMS_LIMIT

190ccd9

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman closed this Apr 10, 2024
