In general, CPUs make very poor PCI switches, and we often find that all-to-all (A2A) performance is bad when P2P is used across CPUs.
Hence, we normally disable P2P and bounce the communication via Host (SHM) buffers instead.
AllReduce puts less stress on the CPU interconnect, and we believe that is why it doesn't exhibit the same slowdown.
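To make the "less stress on the CPU interconnect" point concrete, here is a rough sketch that counts directed GPU-to-GPU flows for a single ring (a simplified stand-in for AllReduce) versus all-to-all. The topology (`SOCKET_OF`, two GPUs per CPU socket) and the helper names are assumptions for illustration only; the thread does not describe the machine's layout, and NCCL's real AllReduce may use several rings or trees.

```python
from itertools import permutations

# Hypothetical topology: 4 GPUs, two per CPU socket (assumed, not from the thread).
SOCKET_OF = {0: 0, 1: 0, 2: 1, 3: 1}


def ring_flows(gpus):
    """Directed flows of one ring: each GPU sends only to its next neighbor."""
    return [(g, gpus[(i + 1) % len(gpus)]) for i, g in enumerate(gpus)]


def alltoall_flows(gpus):
    """Directed flows of all-to-all: every GPU sends to every other GPU."""
    return list(permutations(gpus, 2))


def cross_socket(flows, socket_of=SOCKET_OF):
    """Count flows whose source and destination sit under different sockets."""
    return sum(1 for src, dst in flows if socket_of[src] != socket_of[dst])


gpus = [0, 1, 2, 3]
ring = ring_flows(gpus)
a2a = alltoall_flows(gpus)

print("ring:       flows =", len(ring), " cross-socket =", cross_socket(ring))
print("all-to-all: flows =", len(a2a), " cross-socket =", cross_socket(a2a))
```

Under these assumptions the ring has 4 concurrent flows with 2 crossing sockets, while all-to-all has 12 with 8 crossing. The count says nothing about actual bandwidth; it only shows why all-to-all may load the CPU path far more heavily than a ring does.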
We enabled P2P on 4x 4090. With NCCL_P2P_LEVEL=SYS, AllReduce improves compared to SHM: busbw goes from 18 to 22.
With all-to-all, however, the same setting drops busbw from 18 to 2 compared to SHM.
Why? We don't have a PCI switch.
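For anyone trying to reproduce the comparison, the two configurations could be set up roughly as follows. This is a sketch under assumptions: `nccl_env` and `benchmark_command` are hypothetical helpers, and the binary paths assume a local nccl-tests build under `./build`. It only prints the settings and commands; it does not run NCCL.

```python
import os


def nccl_env(transport, base=None):
    """Return a copy of the environment configured for one transport.

    'shm' disables P2P so intra-node traffic bounces through host buffers;
    'p2p' permits P2P at any topological distance via NCCL_P2P_LEVEL=SYS.
    """
    env = dict(os.environ if base is None else base)
    # Clear any previous setting so the two modes do not mix.
    env.pop("NCCL_P2P_DISABLE", None)
    env.pop("NCCL_P2P_LEVEL", None)
    if transport == "shm":
        env["NCCL_P2P_DISABLE"] = "1"
    elif transport == "p2p":
        env["NCCL_P2P_LEVEL"] = "SYS"
    else:
        raise ValueError(f"unknown transport: {transport!r}")
    return env


def benchmark_command(binary, gpus=4):
    """Build an nccl-tests command line (binary path is an assumption)."""
    # -b/-e: min/max message size, -f: size multiplier, -g: GPUs per process
    return [binary, "-b", "8", "-e", "128M", "-f", "2", "-g", str(gpus)]


for test in ("./build/all_reduce_perf", "./build/alltoall_perf"):
    for transport in ("shm", "p2p"):
        env = nccl_env(transport, base={})
        print(transport, env, " ".join(benchmark_command(test)))
```

Passing the returned dictionary as `env=` to `subprocess.run` would launch each test with the matching setting; comparing the reported busbw across the four runs should show whether the all-to-all drop is reproducible.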