Activation offloading to CPUs for the Linear, LayerNorm Linear, and LayerNorm MLP modules #571
Conversation
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
…ffloading Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
/te-ci pytorch
Could you add some unit tests for this functionality?
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
/te-ci pytorch
Waiting for unit tests, then LGTM.
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
/te-ci pytorch
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
I've fixed most of the functional and lint errors, and added some new tests to replace the singular test file that wasn't being used. The offloading for the
@sanandaraj5597 The above bug is a result of an attempted offload of a
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Fixed the issue you were seeing, Kirthi. Please review. Thank you.
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- FP8 cases were failing since the intermediate buffers for offloading and copying to the CPU were not Float8Tensor compatible.
- The inputs saved could be null tensors, so assigning the activation_offloading attribute directly caused some issues.
- LayerNorm and RMSNorm should also both work now.
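As a minimal sketch of the null-tensor guard described above (the helper name and loop are illustrative assumptions, not this PR's actual code):

```python
import torch

def mark_for_offloading(saved_tensors):
    # Hypothetical helper: saved inputs can be None (e.g. when a bias is
    # absent), so setting the attribute unconditionally would raise
    # AttributeError on NoneType. Guard with an isinstance check instead.
    for t in saved_tensors:
        if isinstance(t, torch.Tensor):
            t.activation_offloading = True  # tag later consumed by the offload hooks
```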
LGTM, CI pending.
/te-ci pytorch
…he Layernorm MLP modules (#571)
* Added support activation offloading to CPU's
* Moving CPU offloading library to TE
* Restructured code, added switch to choose between weight/activation offloading
* Removed arg during constructor
* Fix nit-pick errors
* Documentation fixes
* Fix to the code block in docs
* Added offloading unit test
* Fixed formatting
* wgrad fusion fix, minor errors and lint
* Errors, test, lint
* RM test file
* Fixed stray PyT tensors in LayernormMLP getting offloaded
* Fixed typi
* Fix offloading for rmsnorm, rm test
* Fix errors
* Float8Tensor compatible offloading
* Cleanup
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
This PR adds support for offloading all the tensors saved for the backward pass by the Linear, LayerNorm Linear, and LayerNorm MLP modules, except for the weight tensors.
The cpu_offloading switch is passed in at M-LM module construction, which enables calling the PyTorch hooks during tensor saving and retrieval. When these PyTorch hooks are called, weight.main_grad isn't saved, so we save it separately on an as-needed basis when we need to fuse gradients. All of these layer executions happen under a context that lives in the M-LM repo.
Please review and let me know if you have any questions.
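For reference, here is a minimal sketch of how such a context can offload saved activations using PyTorch's saved-tensors hooks (torch.autograd.graph.saved_tensors_hooks); the hook and variable names below are illustrative assumptions, not this PR's implementation, and PyTorch's built-in torch.autograd.graph.save_on_cpu context provides similar behavior out of the box:

```python
import torch
from torch.autograd.graph import saved_tensors_hooks

def pack_to_cpu(tensor):
    # Called when autograd saves a tensor for backward: copy it to the
    # CPU and remember its original device for the return trip.
    return tensor.device, tensor.to("cpu", non_blocking=True)

def unpack_from_cpu(packed):
    # Called when backward needs the tensor: copy it back to its device.
    device, cpu_tensor = packed
    return cpu_tensor.to(device, non_blocking=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(1024, 1024, device=device)
x = torch.randn(8, 1024, device=device, requires_grad=True)

with saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = layer(x)      # activations saved for backward now live on CPU
y.sum().backward()    # the unpack hook restores them to the original device
```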