Share tensors between comp replay and comms replay #194

shengfukevin · 2025-01-03T19:26:12Z

Summary:
Share tensors between comp replay and comms replay

When running a full replay in et_replay, compute replay and comms replay manage tensor allocation separately. As a result, some tensors are allocated twice, which causes the full replay of Llama4 70B to run out of memory.

This diff fixes the issue by allocating tensors in comp replay and passing them to comms replay.

Reviewed By: sanrise

Differential Revision: D67353163
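The sketch below illustrates the general idea of the fix under stated assumptions: one tensor pool, owned by compute replay, is handed to comms replay so that a tensor referenced by both is allocated only once. `SharedTensorPool`, `CompReplay`, `CommsReplay`, the `(tensor id, storage id)` key, and `bytearray` standing in for device tensors are all hypothetical and are not et_replay's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key: (tensor id, storage id) as recorded in an execution trace.
TensorKey = Tuple[int, int]


@dataclass
class SharedTensorPool:
    """Hypothetical pool: allocates a buffer once per key and reuses it afterwards."""
    allocate: Callable[[int], bytearray]
    tensors: Dict[TensorKey, bytearray] = field(default_factory=dict)
    allocated_bytes: int = 0

    def get_or_allocate(self, key: TensorKey, nbytes: int) -> bytearray:
        buf = self.tensors.get(key)
        if buf is None:
            # First request for this key: allocate and remember the buffer.
            buf = self.allocate(nbytes)
            self.tensors[key] = buf
            self.allocated_bytes += nbytes
        return buf


class CompReplay:
    """Hypothetical compute replay that owns the pool."""
    def __init__(self, pool: SharedTensorPool) -> None:
        self.pool = pool

    def prepare(self, key: TensorKey, nbytes: int) -> bytearray:
        return self.pool.get_or_allocate(key, nbytes)


class CommsReplay:
    """Hypothetical comms replay that receives a pool instead of allocating its own."""
    def __init__(self, pool: SharedTensorPool) -> None:
        self.pool = pool

    def prepare(self, key: TensorKey, nbytes: int) -> bytearray:
        return self.pool.get_or_allocate(key, nbytes)


def run(shared: bool) -> Tuple[int, bool]:
    """Return total bytes allocated and whether both replays got the same buffer."""
    comp_pool = SharedTensorPool(allocate=bytearray)
    # Before the fix: each replay has its own pool. After: comms reuses comp's pool.
    comms_pool = comp_pool if shared else SharedTensorPool(allocate=bytearray)
    comp, comms = CompReplay(comp_pool), CommsReplay(comms_pool)

    a = comp.prepare((1, 1), 1024)   # tensor produced by a compute op
    b = comms.prepare((1, 1), 1024)  # same tensor used as a collective's input
    total = comp_pool.allocated_bytes
    if not shared:
        total += comms_pool.allocated_bytes
    return total, a is b


print(run(shared=False))  # (2048, False)
print(run(shared=True))   # (1024, True)
```

With separate pools the same trace tensor is materialized twice; passing the compute-side pool to comms replay halves the footprint for every tensor the two replays have in common.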


facebook-github-bot · 2025-01-03T19:26:47Z

This pull request was exported from Phabricator. Differential Revision: D67353163

facebook-github-bot · 2025-01-03T19:47:45Z

This pull request has been merged in c5f8d06.

shengfukevin requested review from kingchc, louisfeng, sunghlin, shengbao-zheng and briancoutinho as code owners January 3, 2025 19:26

facebook-github-bot added the CLA Signed label Jan 3, 2025

facebook-github-bot added the fb-exported label Jan 3, 2025

facebook-github-bot closed this in c5f8d06 Jan 3, 2025

facebook-github-bot added the Merged label Jan 3, 2025
