
Activation offloading to CPUs for the Linear, LayerNormLinear, and LayerNormMLP modules #571

Merged: 31 commits merged into NVIDIA:main on Jan 21, 2024

Conversation

sanandaraj5597 (Contributor) commented:

This PR adds support for offloading to the CPU all tensors saved for the backward pass by the Linear, LayerNormLinear, and LayerNormMLP modules, except the weight tensors.

The cpu_offloading switch is passed in from Megatron-LM at module construction and enables calling the PyTorch hooks during tensor saving and retrieval. When these PyTorch hooks are called, weight.main_grad is not saved, so we save it separately on an as-needed basis when we need to fuse gradients. All of these layer executions happen under a context manager that lives in the Megatron-LM repo.
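For reference, here is a minimal sketch of the mechanism described above, built on PyTorch's public saved-tensors hooks. This is an illustration under assumptions, not the actual Transformer Engine implementation: unlike this PR, it offloads everything saved for backward (including weights), and the helper names are made up.

```python
import torch

def pack_to_cpu(tensor):
    # Called when autograd saves a tensor for the backward pass:
    # copy it into pinned host memory and remember its device.
    packed = torch.empty(tensor.size(), dtype=tensor.dtype, pin_memory=True)
    packed.copy_(tensor)
    return (tensor.device, packed)

def unpack_from_cpu(state):
    # Called when the backward pass needs the tensor again:
    # copy it back to the original device.
    device, packed = state
    return packed.to(device, non_blocking=True)

model = torch.nn.Linear(1024, 1024, device="cuda")
x = torch.randn(16, 1024, device="cuda", requires_grad=True)

# Every tensor saved for backward inside this context goes through the hooks.
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = model(x)
y.sum().backward()
```

PyTorch also ships a ready-made context manager with this pack/unpack behavior, torch.autograd.graph.save_on_cpu(pin_memory=True); the context manager described in this PR additionally excludes weights and handles weight.main_grad separately, as noted above.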

Please review and let me know if you have any questions.

Selvaraj Anandaraj and others added 3 commits December 17, 2023 20:50
ptrendx requested a review from ksivaman on January 4, 2024
ksivaman self-assigned this on January 10, 2024
ptrendx (Member) commented on Jan 11, 2024:

/te-ci pytorch

ptrendx (Member) commented on Jan 11, 2024:

Could you add some unit tests for this functionality?

Selvaraj Anandaraj and others added 6 commits January 11, 2024 16:35
ptrendx (Member) commented on Jan 12, 2024:

/te-ci pytorch

ptrendx (Member) left a review comment:

Waiting for the unit test, then LGTM.

ptrendx added the 1.3.0 label on Jan 16, 2024
Selvaraj Anandaraj and others added 5 commits January 18, 2024 14:56
ksivaman (Member) commented:

/te-ci pytorch

sanandaraj5597 and others added 6 commits January 19, 2024 09:36
ksivaman (Member) commented:

I've fixed most of the functional and lint errors and added some new tests, replacing the single test file that wasn't being used. Offloading for the TransformerLayer is failing with the following error in the LayerNormMLP block, specifically during the offloading of weights.

RuntimeError: Attempting to use FunctionalTensor on its own. Instead, please use it with a corresponding FunctionalTensorMode()

ksivaman (Member) commented:

@sanandaraj5597 The above bug is a result of an attempted offload of a [4, 4] tensor, which seems unusual.
Additionally, the Linear and LayerNormLinear modules are not setting the weight_offloading attribute for the weight itself.
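As a hedged illustration of the attribute-based filtering being discussed here: the attribute names activation_offloading and weight_offloading come from this thread, but the hook functions below are hypothetical, not TE's actual code. A pack hook can inspect a marker attribute to decide what to move, which is why a module that never sets the marker silently changes what gets offloaded.

```python
import torch

def selective_pack(tensor):
    # Offload only tensors explicitly tagged as offloadable activations;
    # untagged tensors (e.g. weights) are saved on the GPU as usual.
    if getattr(tensor, "activation_offloading", False):
        return (tensor.device, tensor.to("cpu", non_blocking=True))
    return (None, tensor)

def selective_unpack(state):
    device, tensor = state
    # A None device means the tensor was never moved.
    return tensor if device is None else tensor.to(device, non_blocking=True)
```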

sanandaraj5597 (Contributor, Author) commented:

Fixed the issue you were seeing, Kirthi. Please review. Thank you.

Selvaraj Anandaraj and others added 5 commits January 20, 2024 18:15
ksivaman (Member) left a review comment:

  1. FP8 cases were failing because the intermediate buffers used for offloading and copying to the CPU were not Float8Tensor compatible.
  2. The saved inputs could be null tensors, so assigning the activation_offloading attribute directly caused issues (see the sketch after this comment).
  3. LayerNorm and RMSNorm should both work now.

LGTM, CI pending.
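A minimal sketch of the guard from point 2, under the assumption that marking works by setting an attribute on the saved tensor (the helper name is made up):

```python
import torch
from typing import Optional

def mark_activation_for_offloading(tensor: Optional[torch.Tensor]) -> None:
    # Saved inputs can be None, so check before tagging the tensor.
    if isinstance(tensor, torch.Tensor):
        tensor.activation_offloading = True
```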

ksivaman (Member) commented:

/te-ci pytorch

ksivaman merged commit f196d14 into NVIDIA:main on Jan 21, 2024 (9 of 10 checks passed).
ptrendx added a commit that referenced this pull request on Jan 22, 2024:

Activation offloading to CPUs for the Linear, LayerNormLinear, and LayerNormMLP modules (#571)

* Added support for activation offloading to CPUs

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Moving CPU offloading library to TE

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Restructured code, added switch to choose between weight/activation offloading

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Removed arg during constructor

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Fix nit-pick errors

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Documentation fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix to the code block in docs

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Added offloading unit test

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Fixed formatting

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* wgrad fusion fix, minor errors and lint

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Errors, test, lint

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* RM test file

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fixed stray PyT tensors in LayernormMLP getting offloaded

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Fixed typo

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* Fix offloading for rmsnorm, rm test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix errors

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Float8Tensor compatible offloading

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Cleanup

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>