
[PyTorch][duplicate][CI] FP8 cuda graphs #766

Closed (wants to merge 2 commits)

Conversation

ksivaman
Member

This PR adds the following features (high-level):

  • make_graphed_callables API, similar to the PyTorch API, with some additional arguments for FP8 usage (see the first sketch after this list). Support for FP8 weight caching via the existing is_first_microbatch argument is also retained.
  • Restructured amax reduction logic with a simpler design and handling of various parallelisms with minimal bookkeeping compared to the previous approach.
  • Forward and backward amaxes are reduced within the scope of the current iteration, solving numerous bugs w.r.t. checkpointing and removing the need to save global buffers.
  • Support for nested/multiple FP8 autocast contexts with different recipes and distributed groups (see the second sketch after this list).
  • Amax reductions are module-independent and happen at the autocast level. This also resolves numerous bugs and enables support for MoE/LoRA-like models.
  • Redesigned transposes for Float8Tensor, making them persistent for graph capture. Also fixes use cases for vanilla optimizers (non-FP8 distopt).
  • The scaling inverses for weight tensors are no longer frozen when caching weights across microbatches.
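A minimal usage sketch for the make_graphed_callables API described in the first bullet. The FP8-specific keyword arguments (fp8_enabled, fp8_recipe, fp8_weight_caching) and the way is_first_microbatch is forwarded to the graphed callable are assumptions for illustration, not confirmed by this PR:

```python
# Sketch only: the FP8-related keyword names below are assumed, not confirmed.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

model = te.Linear(1024, 1024).cuda()
sample_args = (torch.randn(32, 1024, device="cuda"),)
fp8_recipe = DelayedScaling(margin=0, amax_history_len=16)

# Capture the module into a CUDA graph with FP8 enabled; amax reduction and
# scale updates happen within the scope of each iteration (see bullets above).
graphed_model = te.make_graphed_callables(
    model,
    sample_args,
    fp8_enabled=True,         # assumed keyword
    fp8_recipe=fp8_recipe,    # assumed keyword
    fp8_weight_caching=True,  # assumed keyword: reuse FP8 weights across microbatches
)

for microbatch in range(4):
    x = torch.randn(32, 1024, device="cuda")
    # is_first_microbatch controls when the cached FP8 weights are refreshed.
    out = graphed_model(x, is_first_microbatch=(microbatch == 0))
```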
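A second sketch for nested FP8 autocast contexts with different recipes (different distributed groups could likewise be passed via fp8_group). The recipes, shapes, and layer choices are illustrative only:

```python
# Sketch only: recipes and shapes are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

outer_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=1024)
inner_recipe = DelayedScaling(fp8_format=Format.E4M3, amax_history_len=16)

layer_a = te.Linear(256, 256).cuda()
layer_b = te.Linear(256, 256).cuda()
x = torch.randn(8, 256, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=outer_recipe):
    y = layer_a(x)
    # Nested autocast with a different recipe: amax reduction is handled per
    # autocast context, independent of the modules it wraps.
    with te.fp8_autocast(enabled=True, fp8_recipe=inner_recipe):
        z = layer_b(y)
```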

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Charlene Yang <charleney@nvidia.com>
@ksivaman ksivaman added the 1.6.0 label Apr 10, 2024
@ksivaman ksivaman requested a review from timmoon10 April 10, 2024 07:19
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman
Member Author

/te-ci pytorch

@ksivaman ksivaman closed this Apr 10, 2024