Transformer Engine v1.2 is incompatible with PyTorch 2.0.1 and raises errors at runtime #601

Closed
jindajia opened this issue Jan 14, 2024 · 4 comments · Fixed by #627

@jindajia

Hello,

I encountered the following error while using version v1.2 of Transformer Engine:

The failing line is `no_torch_dynamo = lambda recursive=True: lambda f: torch._dynamo.disable(f, recursive=recursive)`, and the error message is `TypeError: disable() got an unexpected keyword argument 'recursive'`.

My environment is CUDA 11.8, PyTorch 2.0.1, and Python 3.10. I hit the issue while training a large language model (LLM) with MegatronLM at commit fab0bd6; the error occurs right at the start of training.

Possible cause:
In PyTorch 2.0.1, `torch._dynamo.disable` is defined as `def disable(f)`; the `recursive` parameter was only introduced in PyTorch 2.1.0.

@ptrendx self-assigned this Jan 18, 2024
@ptrendx
Member

ptrendx commented Jan 18, 2024

Thank you @jindajia for reporting this issue. I will work on a fix. In the meantime, you can work around it by reverting commit 7e7f092.

@anhdungitvn

Hello @ptrendx,

Commit 7e7f092 appears to conflict with transformers: importing `transformers.trainer` fails with the same error.

File "/usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
  module = self._get_module(self._class_to_module[name])
File "/usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py", line 1366, in _get_module
  raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
disable() got an unexpected keyword argument 'recursive'

My env:

python 3.8
torch 2.0.1+cu118
transformer-engine 1.2.0.dev0+7e7f092
transformers 4.36.2 / 4.38.0.dev0
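The `torch 2.0.1+cu118` build listed above predates 2.1.0, which is where the original report places the introduction of `recursive`. If one gates on the version string instead of inspecting the function, the comparison should be numeric: a plain string comparison such as `"2.10" >= "2.9"` evaluates to False. A minimal sketch, assuming version strings start with `MAJOR.MINOR[.PATCH]` and ignoring local (`+cu118`) and pre-release suffixes; `parse_torch_version` and `disable_accepts_recursive` are hypothetical names.

```python
import re
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading MAJOR.MINOR[.PATCH] from strings like '2.0.1+cu118'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing PATCH component (e.g. '2.1') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def disable_accepts_recursive(version: str) -> bool:
    # Assumption taken from the report: `recursive` exists from PyTorch 2.1.0 on.
    # Pre-release suffixes are ignored, so '2.1.0a0' counts as 2.1.0.
    return parse_torch_version(version) >= (2, 1, 0)
```

In practice this would be called as `disable_accepts_recursive(torch.__version__)`; tuple comparison keeps `2.10.0` correctly ordered after `2.9.0`.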

Thank you!

@ptrendx
Member

ptrendx commented Jan 23, 2024

@jindajia and @anhdungitvn - could you test PR #627 to see if it fixes your issue? Thank you!

@jindajia
Author

Thank you for the fix! I will let you know if I run into anything.
