Additional states #458
Comments
Why do you want to remove them? Those states are handled internally by Transformer Engine if FP8 is used.
I am observing the same behavior during training without FP8, and I believe these states cause problems when loading checkpoints into the model, especially when the checkpoint contains no `_extra_state` entries. Is there a way to deactivate or exclude these fields during non-FP8 training, given that they are all empty?
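A minimal sketch of one way to exclude these fields when loading, assuming a plain PyTorch `state_dict`-style checkpoint (the helper name `load_without_extra_state` and the key suffix filter are hypothetical, not from this repo):

```python
import torch

def load_without_extra_state(model, ckpt_path):
    """Load a checkpoint while dropping the FP8-related _extra_state entries."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # Keep everything except keys ending in "_extra_state".
    filtered = {k: v for k, v in state_dict.items()
                if not k.endswith("_extra_state")}
    model.load_state_dict(filtered)
```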
@Teng-xu @yongyanrao These extra states are indeed part of the additional information needed for an FP8 training checkpoint. They can be explicitly removed, but the simplest method would be to load the checkpoint using the …
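(The comment above is truncated, so the exact suggestion is unknown. One common way to tolerate mismatched `_extra_state` entries is PyTorch's non-strict loading; a minimal sketch, not necessarily what the commenter intended, reusing `model` and a loaded `state_dict` from the earlier sketch:)

```python
# strict=False skips keys that are missing from, or unexpected in, the
# checkpoint, so absent _extra_state entries no longer raise an error.
result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```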
You can read `_extra_state` with code like this instead of `state.read()`; it shows the contents of `_extra_state`:

```python
import io
import torch

# _extra_state is stored as a serialized buffer; deserialize it with torch.load.
if isinstance(state, io.BytesIO):
    state.seek(0)
    state = torch.load(state)
```
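For example, applied across a whole checkpoint (a sketch; `ckpt_path` is a hypothetical path), this prints the decoded contents of every `_extra_state` entry:

```python
import io
import torch

ckpt_path = "checkpoint.pt"  # hypothetical checkpoint path
state_dict = torch.load(ckpt_path, map_location="cpu")

for name, state in state_dict.items():
    if name.endswith("_extra_state") and isinstance(state, io.BytesIO):
        state.seek(0)
        print(name, torch.load(state))
```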
We noticed some additional states for each module, e.g. keys ending in `_extra_state`, and these states are empty binary strings (`b''`). We think these new states are related to FP8. How should we deal with them? Should we explicitly remove them, or handle them with some explicit method?