Additional states #458

Open
yongyanrao opened this issue Oct 5, 2023 · 4 comments

yongyanrao commented Oct 5, 2023

We noticed some additional states for each module, e.g.,

transformer.seq_layers.0.layer.self_attention.layernorm_qkv._extra_state
transformer.seq_layers.0.layer.self_attention.proj._extra_state
transformer.seq_layers.0.layer.layernorm_mlp._extra_state

These states are empty byte strings (b''). We suspect they are related to FP8. How should we deal with them? Should we explicitly remove them, or handle them through some dedicated method?
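
For context, a minimal sketch of how to list these entries (here, model is a placeholder for the Transformer Engine model in question):

# Print every _extra_state entry and its raw value (observed as empty b'' here)
for name, value in model.state_dict().items():
    if name.endswith("._extra_state"):
        print(name, type(value), value)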

ptrendx (Member) commented Oct 5, 2023

Why do you want to remove them? Those states are handled internally by Transformer Engine if FP8 is used.

Teng-xu commented Oct 5, 2023

I am observing the same behavior when training without FP8, and these states cause problems when loading checkpoints into the model, particularly when the checkpoint contains no _extra_state entries. Given that they are all empty, is there a way to deactivate or exclude these fields when training without FP8?
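
For reference, the failure looks roughly like this (the checkpoint path and model are illustrative placeholders, not taken from the actual run):

import torch

checkpoint = torch.load("no_extra_state_ckpt.pt", map_location="cpu")
# strict=True (the default) raises a missing-keys RuntimeError, because the
# model's state_dict expects the *._extra_state entries
model.load_state_dict(checkpoint)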

ksivaman (Member) commented Jan 8, 2024

@Teng-xu @yongyanrao These extra states are indeed part of the additional information needed for FP8 training checkpoints. They can be explicitly removed, but the simplest approach is to load the checkpoint with the strict=False flag in PyTorch's load_state_dict method.
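
A minimal sketch of that suggestion (the checkpoint path and model are placeholders):

import torch

checkpoint = torch.load("checkpoint.pt", map_location="cpu")
# strict=False tolerates _extra_state keys that are missing from the checkpoint;
# inspect the result to confirm nothing else was silently dropped
result = model.load_state_dict(checkpoint, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)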

zte-tcb commented Apr 7, 2024

You can read _extra_state with code like the following instead of state.read(); this shows the contents of _extra_state:

import io
import torch

if isinstance(state, io.BytesIO):
    state.seek(0)              # rewind the buffer before deserializing
    state = torch.load(state)  # unpickle the actual extra-state contents
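
Put together as a self-contained sketch (the checkpoint path is a placeholder), this dumps every _extra_state entry in a saved checkpoint:

import io
import torch

state_dict = torch.load("checkpoint.pt", map_location="cpu")
for name, state in state_dict.items():
    if name.endswith("._extra_state") and isinstance(state, io.BytesIO):
        state.seek(0)
        # Typically FP8 scaling metadata when FP8 was used, otherwise empty
        print(name, torch.load(state))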
