[Question] Why Tensor parallel communication/GEMM overlap can happen only when sequence parallelism is enabled? #746

Open
hxdtest opened this issue Apr 3, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

hxdtest commented Apr 3, 2024

In Megatron, I found the following check, which ties tp_comm_overlap to sequence_parallel:

if args.tp_comm_overlap:
    assert args.sequence_parallel == True, 'Tensor parallel communication/GEMM overlap can happen only when sequence parallelism is enabled'

Why is sequence parallelism required for this overlap?
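To see which flag combinations the check rejects, here is a minimal standalone sketch. It assumes a plain argparse.Namespace in place of Megatron's parsed arguments; the helper names validate_tp_comm_overlap and is_valid are hypothetical and are not part of Megatron.

```python
from argparse import Namespace


def validate_tp_comm_overlap(args):
    """Hypothetical stand-in for the quoted check; raises AssertionError if it fails."""
    if args.tp_comm_overlap:
        assert args.sequence_parallel, (
            'Tensor parallel communication/GEMM overlap can happen only '
            'when sequence parallelism is enabled'
        )


def is_valid(tp_comm_overlap, sequence_parallel):
    """Return True if the flag combination passes the check, False otherwise."""
    args = Namespace(tp_comm_overlap=tp_comm_overlap,
                     sequence_parallel=sequence_parallel)
    try:
        validate_tp_comm_overlap(args)
    except AssertionError:
        return False
    return True


# Only overlap without sequence parallelism is rejected.
for overlap in (False, True):
    for seq_par in (False, True):
        print(overlap, seq_par, is_valid(overlap, seq_par))
```

Note that the sketch relies on assert, as the quoted snippet does, so it has no effect when Python runs with assertions disabled (-O).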

ptrendx (Member) commented Apr 9, 2024

That is because we currently support overlapping only AllGather and ReduceScatter with GEMM. Those are the collectives used when sequence parallelism is enabled; without it, tensor parallelism uses AllReduce, which cannot currently be overlapped.
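For readers less familiar with these collectives, the following toy sketch simulates them in a single process on plain Python lists. It only illustrates semantics: an AllReduce yields the same result as a ReduceScatter followed by an AllGather, which is the pair that sequence parallelism uses around the tensor-parallel GEMMs. All function names here are hypothetical and are not the Megatron or Transformer Engine API, and the sketch says nothing about how the overlap itself is implemented.

```python
def all_reduce(per_rank):
    """Every rank ends up with the elementwise sum over all ranks."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def reduce_scatter(per_rank):
    """Rank r ends up with only the r-th chunk of the elementwise sum."""
    n = len(per_rank)
    total = [sum(vals) for vals in zip(*per_rank)]
    chunk = len(total) // n  # assumes the length is divisible by the rank count
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


def all_gather(per_rank):
    """Every rank ends up with the concatenation of all ranks' chunks."""
    gathered = [x for chunk in per_rank for x in chunk]
    return [list(gathered) for _ in per_rank]


def forward_collectives(sequence_parallel):
    """Forward-pass collectives for a tensor-parallel block (illustrative only)."""
    if sequence_parallel:
        return ["all_gather", "reduce_scatter"]
    return ["all_reduce"]


# Partial GEMM outputs held by two simulated tensor-parallel ranks.
partial_outputs = [[1, 2, 3, 4], [10, 20, 30, 40]]

print(all_reduce(partial_outputs))
print(reduce_scatter(partial_outputs))
print(all_gather(reduce_scatter(partial_outputs)))
print(forward_collectives(True), forward_collectives(False))
```

In this toy model, the first and third printed results are identical, which is why a layer that needs a full AllReduce without sequence parallelism can instead be bracketed by an AllGather and a ReduceScatter when sequence parallelism is on.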

ptrendx added the enhancement (New feature or request) label on May 16, 2024