[Paddle] Optimize memory usage when training in pipeline parallel #580

Merged: timmoon10 merged 8 commits into NVIDIA:main on Jan 12, 2024


Tom-Zheng (Contributor)

Note: Merge #561 before this one.

This PR adds the following optimization:

  • Actively delete tensors to free memory in the FP8 linear backward pass
  • Support FP8 weight caching in pipeline-parallel training
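A minimal, framework-free sketch of the two ideas in the list above, assuming plain Python lists stand in for tensors. It is not Transformer Engine's Paddle implementation: `FP8WeightCache`, `fake_fp8_cast`, `linear_backward` and the `is_first_microbatch` flag are hypothetical names used only for illustration. Weight caching relies on weights staying fixed across the micro-batches of one pipeline step, so the FP8 cast can run once and be reused; active deletion drops references to saved tensors as soon as the backward pass has consumed them.

```python
from typing import Callable, Dict, List


class FP8WeightCache:
    """Hypothetical cache: cast each weight to FP8 once per optimizer step,
    then reuse the cast copy for the remaining micro-batches."""

    def __init__(self, cast_fn: Callable[[List[float]], List[int]]):
        self._cast_fn = cast_fn
        self._cache: Dict[str, List[int]] = {}
        self.num_casts = 0  # counts how many real casts were performed

    def get(self, name: str, weight: List[float], is_first_microbatch: bool) -> List[int]:
        # Weights only change at the optimizer step, so recast on the first
        # micro-batch of a step and reuse the cached copy afterwards.
        if is_first_microbatch or name not in self._cache:
            self._cache[name] = self._cast_fn(weight)
            self.num_casts += 1
        return self._cache[name]


def fake_fp8_cast(weight: List[float], scale: float = 16.0) -> List[int]:
    # Stand-in for real FP8 quantization: scale, then round to integers.
    return [round(w * scale) for w in weight]


def linear_backward(ctx: Dict[str, List[float]], grad_out: List[float]) -> List[float]:
    # Hypothetical elementwise "linear" y = w * x, so dL/dw = grad_out * x.
    # pop() removes the saved input from ctx, so ctx no longer keeps it alive.
    x = ctx.pop("saved_input")
    grad_w = [g * xi for g, xi in zip(grad_out, x)]
    del x  # drop the last local reference once it is no longer needed
    return grad_w


if __name__ == "__main__":
    cache = FP8WeightCache(fake_fp8_cast)
    for micro_batch in range(4):
        w_fp8 = cache.get("fc1.weight", [0.5, -0.25, 1.0],
                          is_first_microbatch=(micro_batch == 0))
    print(cache.num_casts)  # 1
    print(w_fp8)            # [8, -4, 16]
```

In a real framework the memory is only returned once no other object (for example an autograd graph node) still references the tensor, which is why dropping the reference held by the saved context matters.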

Tom-Zheng marked this pull request as draft on December 27, 2023 09:51
Wong4j mentioned this pull request on Jan 7, 2024
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- Allow skipping the weight update in the FP8 meta update
Tom-Zheng force-pushed the tizheng/optimize_memory branch from f071572 to 9e54901 on January 11, 2024 05:31
Tom-Zheng marked this pull request as ready for review on January 11, 2024 05:32
zlsh80826 (Collaborator)

/te-ci paddle

zlsh80826 (Collaborator)

/te-ci paddle

Tom-Zheng (Contributor, Author) commented Jan 12, 2024

@jeng1220 Ready for review.

jeng1220 (Contributor)

@timmoon10 and @ksivaman, all tests passed. Could you please merge this PR?
Thanks

timmoon10 (Collaborator) left a comment

LGTM, thanks!

timmoon10 merged commit daad219 into NVIDIA:main on Jan 12, 2024 (15 checks passed)
4 participants