Explicitly disable P2P using launch, and pick up in state if a user will face issues. #2195

Merged: 7 commits, Nov 29, 2023

muellerzr (Collaborator)

What does this PR do?

This PR adds checks so that an exceedingly common problem reported in Accelerate issues is either resolved automatically or surfaced with a suggested fix. On GPUs newer than the RTX 3090 (the entire 4000 series and beyond), InfiniBand and peer-to-peer (P2P) communication were removed. Because the NVIDIA drivers do not disable them automatically, a large majority of the issues opened about timeouts trace back to this. Accelerate should therefore disable them itself.
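As a rough illustration of the idea described above, here is a minimal sketch and not the code merged in this PR. It assumes that affected cards can be recognised by an "RTX 40xx" device name and that the fix is to set NCCL's `NCCL_P2P_DISABLE` and `NCCL_IB_DISABLE` environment variables to `1`. The helper names `lacks_p2p_ib_support` and `nccl_env_overrides` and the name pattern are hypothetical.

```python
import re

# Assumption: affected consumer cards can be spotted by an "RTX 40xx" name.
# This pattern is illustrative only and is not taken from the PR diff.
_AFFECTED = re.compile(r"RTX\s*40\d{2}", re.IGNORECASE)


def lacks_p2p_ib_support(device_names):
    """Return True if any device name looks like an affected GPU."""
    return any(_AFFECTED.search(name) for name in device_names)


def nccl_env_overrides(device_names, env):
    """Return the NCCL variables to set, leaving user-provided values alone."""
    overrides = {}
    if lacks_p2p_ib_support(device_names):
        for key in ("NCCL_P2P_DISABLE", "NCCL_IB_DISABLE"):
            # Respect an explicit choice already present in the environment.
            if key not in env:
                overrides[key] = "1"
    return overrides


if __name__ == "__main__":
    names = ["NVIDIA GeForce RTX 4090", "NVIDIA GeForce RTX 4090"]
    print(nccl_env_overrides(names, {}))
```

In a real launcher the device names might come from `torch.cuda.get_device_name(i)`, and the returned mapping would be merged into the environment of the spawned processes, for example with `os.environ.update(...)`, before any process group is initialised.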

Fixes #2174
Fixes #2183
Fixes huggingface/diffusers#5923

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@BenjaminBossan @pacman100

muellerzr added the enhancement (New feature or request) and GPU (Bug or feature on GPU or MultiGPU platforms) labels on Nov 28, 2023

HuggingFaceDocBuilderDev commented Nov 28, 2023

The documentation is not available anymore as the PR was closed or merged.

muellerzr requested a review from SunMarc on November 28, 2023, 18:32
BenjaminBossan (Member) left a comment

LGTM overall, thanks. I've got some comments, but none are blockers.

Resolved review threads:
src/accelerate/commands/launch.py (outdated)
src/accelerate/commands/launch.py
src/accelerate/utils/environment.py
src/accelerate/utils/environment.py
Labels: enhancement (New feature or request), GPU (Bug or feature on GPU or MultiGPU platforms)
3 participants