[FIX] Improve the Warning Displayed for Uneven Batch Distributions (#4920)

To make batch-distribution issues easier to debug, this PR updates the warning message to show how many batches each rank received. Partially resolves rapidsai/cugraph-gnn#130.
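The check behind this warning can be sketched in plain Python. This is a minimal sketch, not the cugraph implementation. It assumes the per-rank batch counts have already been gathered into a list. In dist_sampler.py they live in a tensor t, and the new message prints its t.tolist() output. The name warn_if_uneven is hypothetical, and comparing all counts for equality is an assumed way of detecting an uneven split.

```python
import warnings
from typing import List


def warn_if_uneven(batches_per_rank: List[int]) -> bool:
    """Hypothetical helper: warn when ranks received different batch counts.

    Returns True if every rank received the same number of batches.
    """
    # Assumption: "even" means every rank has exactly the same count.
    input_size_is_equal = len(set(batches_per_rank)) <= 1
    if not input_size_is_equal:
        # Message modeled on the one added in this commit, including the
        # per-rank counts so the imbalance is visible in the logs.
        warnings.warn(
            "Not all ranks received the same number of batches. "
            "This might cause your training loop to hang "
            "due to uneven inputs. This is the number of "
            f"batches received on each rank: {batches_per_rank}."
        )
    return input_size_is_equal


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(warn_if_uneven([4, 4, 3]))
        print(caught[0].message)
```

With counts [4, 4, 3] the helper returns False and the warning text ends with the list of counts, so a reader can see at a glance which rank is short.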

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)

URL: #4920
alexbarghi-nv authored Feb 11, 2025
1 parent 07840d9 commit 6e5ca5a
Showing 1 changed file with 2 additions and 1 deletion.
python/cugraph/cugraph/gnn/data_loading/dist_sampler.py
@@ -149,7 +149,8 @@ def get_start_batch_offset(
warnings.warn(
"Not all ranks received the same number of batches. "
"This might cause your training loop to hang "
-"due to uneven inputs."
+"due to uneven inputs. This is the number of "
+f"batches receieved on each rank: {t.tolist()}."
)

return (0 if rank == 0 else t.cumsum(dim=0)[rank - 1], input_size_is_equal)
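The return statement computes each rank's starting batch offset as an exclusive prefix sum of the per-rank counts. A minimal sketch of the same arithmetic follows, using a plain list in place of the tensor t. The name start_batch_offset is hypothetical. This is not the cugraph function.

```python
from itertools import accumulate
from typing import List


def start_batch_offset(batches_per_rank: List[int], rank: int) -> int:
    """Hypothetical helper: index of the first batch owned by `rank`.

    Mirrors `0 if rank == 0 else t.cumsum(dim=0)[rank - 1]`, using a
    plain list instead of a tensor.
    """
    if rank == 0:
        return 0
    # Running totals: element i is the number of batches on ranks 0..i.
    cumulative = list(accumulate(batches_per_rank))
    # Rank r starts right after all batches owned by ranks 0..r-1.
    return cumulative[rank - 1]


if __name__ == "__main__":
    counts = [4, 4, 3]
    print([start_batch_offset(counts, r) for r in range(len(counts))])
```

For example, with per-rank counts [4, 4, 3], ranks 0, 1, and 2 start at batches 0, 4, and 8. Rank 2 also has one fewer batch than the others, which is the situation the new warning reports.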
