Overview
Following #2523, this PR introduces an initial implementation of z-loss and includes some refactoring to make the loss configuration more flexible.
Z-Loss
I added z-loss support to the cross-entropy loss. The implementation is inspired by the one used in OLMo:
https://github.com/allenai/OLMo-core/blob/main/src/olmo_core/nn/cross_entropy_loss.py.
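For context, a z-loss term is commonly formulated as the mean squared log-partition function of the logits. The notation below is an assumption made for illustration ($x_{i,j}$ is the logit for class $j$ at token $i$, $N$ is the number of non-ignored tokens, $V$ is the vocabulary size); the exact reduction used in this PR may differ:

$$
\mathcal{L}_z = \frac{1}{N} \sum_{i=1}^{N} \left( \log \sum_{j=1}^{V} e^{x_{i,j}} \right)^2
$$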
If `z_loss_weight` is set to a value different from `0`, the z-loss is computed and added to the cross-entropy loss, scaled by `z_loss_weight`.
Refactoring
I also refactored the loss configuration so it can be defined via the CLI or through TOML configuration files. This is just a proposal and can be adapted if it does not align with the Torchtitan design.
The current setup allows configuring multiple loss types (currently MSE and CrossEntropy). Each loss is defined as a configurable object with the following fields:
- … `true`, the loss module is compiled with `torch.compile`

Additional options are available for specific losses:
CrossEntropyLoss
- `z_loss_weight` (default: `0.0`)
- `ignore_index` (default: `-100`)

Both CrossEntropy and MSE return a `LossOutput` object containing:

- `main`: the loss used for gradient computation (the one `.backward()` should be called on)
- `aux`: a dictionary containing auxiliary values intended only for logging

For example, the CrossEntropy loss populates `LossOutput.aux` with both the unscaled z-loss and the raw cross-entropy loss.
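
To make the `main` / `aux` split concrete, here is a minimal, dependency-free sketch of the behavior described above. It is not the code in this PR: it uses plain Python floats instead of tensors (so there is no `.backward()`), the function name `cross_entropy_with_z_loss` and the `aux` keys `"cross_entropy"` and `"z_loss"` are hypothetical, and averaging both terms over non-ignored tokens is an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class LossOutput:
    """Illustrative stand-in for the LossOutput container (not the PR's class)."""

    main: float  # value used for gradients (a tensor in a real implementation)
    aux: dict[str, float] = field(default_factory=dict)  # logging-only values


def logsumexp(row: list[float]) -> float:
    """Numerically stable log(sum(exp(x))) over one row of logits."""
    peak = max(row)
    return peak + math.log(sum(math.exp(x - peak) for x in row))


def cross_entropy_with_z_loss(
    logits: list[list[float]],
    targets: list[int],
    z_loss_weight: float = 0.0,
    ignore_index: int = -100,
) -> LossOutput:
    """Mean cross-entropy plus an optional weighted z-loss (hypothetical helper)."""
    ce_sum, z_sum, count = 0.0, 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # assumption: ignored positions contribute to neither term
        log_z = logsumexp(row)
        ce_sum += log_z - row[target]  # equals -log softmax(row)[target]
        z_sum += log_z**2  # squared log-partition function
        count += 1

    if count == 0:
        return LossOutput(main=0.0, aux={"cross_entropy": 0.0, "z_loss": 0.0})

    ce = ce_sum / count
    z_loss = z_sum / count

    # The z-loss only affects `main` when the weight is non-zero.
    main = ce if z_loss_weight == 0.0 else ce + z_loss_weight * z_loss

    # `aux` carries the raw cross-entropy and the unscaled z-loss for logging.
    return LossOutput(main=main, aux={"cross_entropy": ce, "z_loss": z_loss})
```

With `z_loss_weight=0.0`, `main` equals `aux["cross_entropy"]`. With a non-zero weight, `main` is the sum of the raw cross-entropy and the scaled z-loss, while `aux["z_loss"]` stays unscaled.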