Configurable Z Loss #2576

Open
francesco-bertolotti wants to merge 4 commits into pytorch:main from francesco-bertolotti:f14-z-loss

@francesco-bertolotti
Contributor

Overview

Following #2523, this PR introduces an initial implementation of z-loss and includes some refactoring to make the loss configuration more flexible.

Z-Loss

I added z-loss support to the cross-entropy loss. The implementation is inspired by the one used in OLMo:
https://github.com/allenai/OLMo-core/blob/main/src/olmo_core/nn/cross_entropy_loss.py.

If z_loss_weight is nonzero, the z-loss is computed, scaled by z_loss_weight, and added to the cross-entropy loss. With the default of 0.0, the loss reduces to plain cross-entropy.
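
To make the arithmetic concrete, here is a minimal plain-Python sketch of cross-entropy with a z-loss term. It assumes the common definition of z-loss as the squared log-sum-exp of the logits, averaged over non-ignored tokens; the function names and the tuple return value are hypothetical and are not taken from this PR, which operates on tensors rather than Python lists.

```python
import math
from typing import Sequence, Tuple


def logsumexp(row: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for one row of logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def cross_entropy_with_z_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    z_loss_weight: float = 0.0,   # assumed to mirror the PR's option
    ignore_index: int = -100,     # assumed to mirror the PR's option
) -> Tuple[float, float, float]:
    """Return (total, raw_cross_entropy, unscaled_z_loss), mean-reduced."""
    ce_sum = 0.0
    z_sum = 0.0
    count = 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # ignored tokens contribute to neither term
        lse = logsumexp(row)
        ce_sum += lse - row[target]  # -log softmax(row)[target]
        z_sum += lse ** 2            # z-loss penalizes a large log-partition
        count += 1
    count = max(count, 1)            # avoid dividing by zero if all ignored
    ce = ce_sum / count
    z = z_sum / count
    return ce + z_loss_weight * z, ce, z
```

Calling `cross_entropy_with_z_loss([[0.0, 0.0]], [0], z_loss_weight=0.5)` yields a cross-entropy of log 2 and an unscaled z-loss of (log 2)^2, so the total is log 2 + 0.5 (log 2)^2.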

Refactoring

I also refactored the loss configuration so it can be defined via the CLI or through TOML configuration files. This is just a proposal and can be adapted if it does not align with the Torchtitan design.

The current setup allows configuring multiple loss types (currently MSE and CrossEntropy). Each loss is defined as a configurable object with the following fields:

  • enable: whether the loss is active (exactly one loss must be enabled)
  • compile: if true, the loss module is compiled with torch.compile

Additional options are available for specific losses:

  • CrossEntropyLoss

    • z_loss_weight (default: 0.0)
    • ignore_index (default: -100)
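
The options above could be modeled roughly as follows. This is only a sketch using standard-library dataclasses: the class names, the `cross_entropy` and `mse` field names, and the `active()` helper are hypothetical and may not match the PR's actual config classes. Only the option names and defaults (`enable`, `compile`, `z_loss_weight`, `ignore_index`) come from the description.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class LossConfig:
    """Fields shared by every loss (hypothetical base class)."""
    enable: bool = False   # whether this loss is active
    compile: bool = False  # whether to wrap the loss module in torch.compile


@dataclass
class CrossEntropyLossConfig(LossConfig):
    z_loss_weight: float = 0.0  # 0.0 disables the z-loss term
    ignore_index: int = -100    # target value excluded from the loss


@dataclass
class MSELossConfig(LossConfig):
    pass  # no extra options are described for MSE


@dataclass
class LossesConfig:
    """Container for all configurable losses (hypothetical layout)."""
    cross_entropy: CrossEntropyLossConfig = field(default_factory=CrossEntropyLossConfig)
    mse: MSELossConfig = field(default_factory=MSELossConfig)

    def active(self) -> Tuple[str, LossConfig]:
        """Return the single enabled loss, enforcing 'exactly one enabled'."""
        enabled = [(name, cfg) for name, cfg in vars(self).items() if cfg.enable]
        if len(enabled) != 1:
            raise ValueError(f"exactly one loss must be enabled, got {len(enabled)}")
        return enabled[0]
```

For example, `LossesConfig(cross_entropy=CrossEntropyLossConfig(enable=True, z_loss_weight=1e-4)).active()` returns the cross-entropy entry, while enabling both losses or neither raises a `ValueError`.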

Both CrossEntropy and MSE return a LossOutput object containing:

  • main: the loss used for gradient computation (the one .backward() should be called on)
  • aux: a dictionary containing auxiliary values intended only for logging

For example, the CrossEntropy loss populates LossOutput.aux with both the unscaled z-loss and the raw cross-entropy loss.
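
A rough sketch of how such a return type might be shaped and consumed is shown below. The `main` and `aux` field names come from the description; the aux keys (`cross_entropy`, `z_loss`) and the `format_log_line` helper are hypothetical, and `main` is typed as `Any` here because this sketch avoids importing PyTorch (in the PR it would be a tensor).

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LossOutput:
    main: Any                                           # value to call .backward() on
    aux: Dict[str, Any] = field(default_factory=dict)   # logging-only values


def format_log_line(step: int, out: LossOutput) -> str:
    """Build a log line from the main loss and the auxiliary values."""
    parts = [f"step {step}", f"loss={float(out.main):.4f}"]
    # Sort keys so the log layout is stable across steps.
    for key in sorted(out.aux):
        parts.append(f"{key}={float(out.aux[key]):.4f}")
    return " | ".join(parts)
```

In a training loop, only `out.main` would take part in the backward pass; the entries in `out.aux` would be read for logging and never used to compute gradients.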

@meta-cla (bot) added the CLA Signed label on Mar 14, 2026.