Enhancement (credit to @rohan-varma):
"this can be done in a follow up PR, but let's maybe consider not defaulting things to torch.bfloat16 eventually. this is because it might be good to make this optimizer usable out of the box with the defaults on all HW architectures, but only A100 supports bfloat16 well at the moment.
But the downside here would be that the default optimizer won't be too interesting, it'd just be AdamW"
One way to accomplish this would be a simple native BF16 support check: if BF16 is not supported on the current device, revert the BF16 defaults to FP32 (and turn off Kahan summation as well, since it adds no benefit once the optimizer state is already full precision). A minimal sketch follows.
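Here is one possible shape for that check, assuming hypothetical parameter names (`momentum_dtype`, `variance_dtype`, `use_kahan_summation`); the actual optimizer signature may differ. It relies on `torch.cuda.is_bf16_supported()`, which PyTorch provides for exactly this purpose:

```python
import torch

# Hypothetical helper sketching the proposed fallback logic; the real
# optimizer's parameter names and defaults may differ.
def resolve_dtype_defaults(
    momentum_dtype=torch.bfloat16,
    variance_dtype=torch.bfloat16,
    use_kahan_summation=True,
):
    """Revert BF16 defaults to FP32 when native BF16 support is absent."""
    bf16_ok = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    if not bf16_ok:
        # Fall back to full-precision optimizer state; Kahan compensation
        # adds no benefit once the state is already FP32.
        momentum_dtype = torch.float32
        variance_dtype = torch.float32
        use_kahan_summation = False
    return momentum_dtype, variance_dtype, use_kahan_summation
```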
The dilemma is whether to warn the user about this silent change. The upside is that they learn they are not getting the BF16 benefits; the downside is that they may already be aware, and won't enjoy seeing the same one-line warning repeated across 128 GPUs. One mitigation is to emit the warning from rank 0 only, as sketched below.
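A sketch of a rank-0-only warning, assuming the job runs under `torch.distributed` (a single-process run warns normally). This is one possible mitigation, not a committed design:

```python
import warnings

import torch.distributed as dist

def warn_bf16_fallback_once():
    """Warn about the FP32 fallback from rank 0 only.

    In a 128-GPU job this prints one line instead of 128.
    """
    if dist.is_available() and dist.is_initialized() and dist.get_rank() != 0:
        return
    warnings.warn(
        "Native BF16 is not supported on this device; optimizer state "
        "defaults reverted to FP32 and Kahan summation disabled.",
        stacklevel=2,
    )
```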