Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support
store_param_remainders
feature from Apex in TE Fused Adam #1408Support
store_param_remainders
feature from Apex in TE Fused Adam #1408Changes from 6 commits
9072c5f
7d5d0dc
bcaf16d
979a4c1
75b737c
887432d
1f51592
bf1a07b
d662d78
fa28b75
0dd4b79
0a257f3
98cf4fa
334e8cf
a4e46c9
0feb6f1
07f1479
5a4a91d
251b5ef
1d917cd
e10f6c3
30499d5
5c202ca
123ea3f
4d90bcc
ec76525
7a144ec
d08b814
f6cc32b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of
store_param_remainders
isFalse
here, but it'sTrue
by default in the constructor. I think it's misleading, why not just set it toTrue
here as well?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to store param remainders for state_name other than master_params, that's why it's defaulted to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if this function didn't expose this kwarg since it makes its behavior less obvious.
get_unscaled_state
implies that it produces an FP32 value that is ready to use, so it would be better ifstep
called a different function to access the BF16 remainder. If we want to keep this overall logic, we should change the function name to something more accurate (although a vague name likeget_state_for_adam_kernel
is a code smell).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked around it. Resolving conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better, although we still have the problem that state scaling and BF16 remainders are both using this function in different ways. It's troubling that
get_unscaled_state
might not get the unscaled state.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree your point. But using this function both with/without feature makes it look very efficient. Writing a separate function needs new function usage across step function, checkpointing, etc.
I've also tried to add assert checks inside the function to tighten the understanding/correctness. Hope you are fine with it.