fix found issues of comm replay #182
@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
fix replay to send/recv
fix blocking option
fix process group update of barrier
Comment out unused function incast and multicast
remove function extractCommsInfo() for deprecated basic trace
fix wait comm op for both collective and p2p
Test Plan:
comm_replay --trace-type et --trace-path /home/sanshang/lustre_storage/000_code/chakra/comm_replay_eval_06/training/et_iter_50_51 --num-replays 10
## additional notes
This is copied from #180.
Differential Revision: D64312970
Pulled By: shengfukevin
094a2ab to 115f337
@shengfukevin has updated the pull request. You must reimport the pull request before landing.
This pull request was exported from Phabricator. Differential Revision: D64312970
- self.waitObjIds = {}  # mapping of reqID to future of async collectives
+ self.waitObjIds = (
+     {}
+ )  # mapping of (pg_id, req_id, is_p2p) to future of async collectives
    curComm.pgId,
    curComm.req[0],
    curComm.req[1],
)
):
    self.collectiveArgs.waitObjIds[self.collectiveArgs.wait_obj_key] = (
        retObj
    )
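The hunks above suggest that pending async operations are now stored under a composite key of `(pg_id, req_id, is_p2p)` rather than the request ID alone. The following is only a minimal sketch of that pattern, not the repository's code: `WaitRegistry` and its methods are hypothetical names, and `concurrent.futures.Future` stands in for whatever wait object the replay tool actually stores.

```python
from concurrent.futures import Future
from typing import Dict, Optional, Tuple

# Key layout taken from the comment in the diff: (pg_id, req_id, is_p2p).
WaitKey = Tuple[int, int, bool]


class WaitRegistry:
    """Hypothetical helper: tracks pending async ops by composite key."""

    def __init__(self) -> None:
        self.wait_obj_ids: Dict[WaitKey, Future] = {}

    def register(self, pg_id: int, req_id: int, is_p2p: bool, fut: Future) -> None:
        # Store the wait object under the full composite key.
        self.wait_obj_ids[(pg_id, req_id, is_p2p)] = fut

    def wait(self, pg_id: int, req_id: int, is_p2p: bool) -> Optional[str]:
        # pop() so that a completed request cannot be waited on twice.
        fut = self.wait_obj_ids.pop((pg_id, req_id, is_p2p), None)
        return None if fut is None else fut.result()


registry = WaitRegistry()
coll, p2p = Future(), Future()

# Same process group and request ID, but one collective and one p2p op.
registry.register(0, 7, False, coll)
registry.register(0, 7, True, p2p)

# Futures are completed by hand here purely for illustration.
coll.set_result("collective done")
p2p.set_result("p2p done")
```

With a key of the request ID alone, the second `register` call would overwrite the first; including the process group and the p2p flag keeps the two entries distinct.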
@GSSBMW, the parentheses are added by the linter, simply to split one line across multiple lines. They do not change the code's behavior.
@shengfukevin OK. Then please check whether the changes I commented on are necessary, especially the ones where the added parentheses contain only one element. Thanks!
The code changes you commented on were all made by the linter, so they should be OK.
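For readers who want to check the claim about the linter-added parentheses, here is a small sketch using only the Python standard library. It compares the parsed form of an assignment with and without wrapping parentheses; the variable names are illustrative.

```python
import ast

# The one-line form and the linter-wrapped form of the same assignment.
one_line = "x = {}"
wrapped = "x = (\n    {}\n)"

# Grouping parentheses are not recorded in the AST, so both parse identically.
same_tree = ast.dump(ast.parse(one_line)) == ast.dump(ast.parse(wrapped))

# At runtime the wrapped value is still a plain dict.
value = (
    {}
)

# Only a trailing comma would change the meaning: this is a 1-tuple.
as_tuple = ({},)
```

A single element inside parentheses is therefore harmless; the case to watch for in review is a stray trailing comma, which turns the expression into a tuple.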
115f337 to ec84f27
@shengfukevin has updated the pull request. You must reimport the pull request before landing.
This pull request was exported from Phabricator. Differential Revision: D64312970
Then please merge. Thanks!
@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@shengfukevin merged this pull request in 310035d.
Summary
fix replay to send/recv
fix blocking option
fix process group update of barrier
Comment out unused function incast and multicast
remove function extractCommsInfo() for deprecated basic trace
fix wait comm op for both collective and p2p
Test Plan
comm_replay --trace-type et --trace-path /home/sanshang/lustre_storage/000_code/chakra/comm_replay_eval_06/training/et_iter_50_51 --num-replays 10
additional notes
This is copied from #180.