Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix found issues of comm replay #182

Closed
wants to merge 1 commit into from
Closed

Conversation

shengfukevin
Copy link
Contributor

Summary

fix replay to send/recv
fix blocking option
fix process group update of barrier
Comment out unused function incast and multicast
remove function extractCommsInfo() for deprecatd basic trace
fix wait comm op for both collective and p2p

Test Plan

comm_replay --trace-type et --trace-path /home/sanshang/lustre_storage/000_code/chakra/comm_replay_eval_06/training/et_iter_50_51 --num-replays 10

additional notes

This is copied from #180.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 14, 2024
@facebook-github-bot
Copy link
Contributor

@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Oct 14, 2024
Summary:
fix replay to send/recv
fix blocking option
fix process group update of barrier
Comment out unused function incast and multicast
remove function extractCommsInfo() for deprecatd basic trace
fix wait comm op for both collective and p2p


Test Plan:
comm_replay --trace-type et --trace-path /home/sanshang/lustre_storage/000_code/chakra/comm_replay_eval_06/training/et_iter_50_51 --num-replays 10

## additional notes
This is copied from #180.

Differential Revision: D64312970

Pulled By: shengfukevin
@facebook-github-bot
Copy link
Contributor

@shengfukevin has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64312970

self.waitObjIds = {} # mapping of reqID to future of async collectives
self.waitObjIds = (
{}
) # mapping of (pg_id, req_id, is_p2p) to future of async collectives
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No parenthesis here.
image

curComm.pgId,
curComm.req[0],
curComm.req[1],
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed by lint rules?
image

):
self.collectiveArgs.waitObjIds[self.collectiveArgs.wait_obj_key] = (
retObj
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is parenthesis at 843 and 848 necessary?
image

@shengfukevin
Copy link
Contributor Author

@GSSBMW, the parenthesis is added by lint, basically to split one line into multiple lines. It does not change the syntax.

@GSSBMW
Copy link

GSSBMW commented Oct 16, 2024

@shengfukevin Ok. Then please check several ones I comment, whether it's necessary. Especially, only one element in the added parentheses. Thanks!

@shengfukevin
Copy link
Contributor Author

The code change you commented on are all done by lint. So it should be ok.

Summary:
fix replay to send/recv
fix blocking option
fix process group update of barrier
Comment out unused function incast and multicast
remove function extractCommsInfo() for deprecatd basic trace
fix wait comm op for both collective and p2p


Test Plan:
comm_replay --trace-type et --trace-path /home/sanshang/lustre_storage/000_code/chakra/comm_replay_eval_06/training/et_iter_50_51 --num-replays 10

## additional notes
This is copied from #180.

Differential Revision: D64312970

Pulled By: shengfukevin
@facebook-github-bot
Copy link
Contributor

@shengfukevin has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64312970

@GSSBMW
Copy link

GSSBMW commented Oct 17, 2024

Then please merge. Thanks!

@facebook-github-bot
Copy link
Contributor

@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@shengfukevin merged this pull request in 310035d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants