Make the LLaVA / LLaVA-Next test guard explicit by qgallouedec · Pull Request #5778 · huggingface/trl

qgallouedec · 2026-05-15T15:42:16Z

The VLM trainer tests assert every trainable parameter changes after trainer.train(). For LLaVA / LLaVA-Next some params are skipped with a vague:

# For some reason, these params are not updated. … We should investigate this further, but for now we just skip these params.

We investigated. Those params are not bugs: they are unreachable from the loss by design.

This PR turns the skip into a positive assertion that they stay frozen, and cleans up two spurious keys in the tiny-model generation scripts.

Why those params are frozen

LLaVA stitches three off-the-shelf pieces together:

Vision tower: a frozen, pretrained CLIP vision transformer (24 encoder layers in real LLaVA, 2 in our tiny version).
Multi-modal projector: two small Linear layers that map vision features into the LM's embedding dim.
Language model: a pretrained LLM (Vicuna for LLaVA-1.5, Mistral for LLaVA-Next).

At forward time: image → CLIP → pick one intermediate encoder layer's hidden states (vision_feature_layer=-2, the second-to-last) → drop the CLS token → projector → splice the result into the LM's input embeddings at the <image> placeholder.

Taking features from an intermediate layer rather than the final one is the reason the last encoder layer (and post_layernorm) never reach the loss.

flowchart LR
    img[Image] --> pe[patch_embed]
    pe --> l0[layers.0]
    l0 --> l1[layers.1]
    l1 --> dots["…"]
    dots --> l22[layers.22]
    l22 --> l23[layers.23]
    l23 --> pln[post_layernorm]
    pln -.->|"pooler_output<br/>(unused)"| X([ ])

    l22 ==>|"hidden_states[-2]"| drop["drop CLS token"]
    drop ==> proj[multi_modal_projector]
    proj ==> lm

    text[Text tokens] --> lm[Language Model]
    lm --> out[Output]

    subgraph vt[Vision tower — CLIP-ViT-Large, 24 layers]
        pe
        l0
        l1
        dots
        l22
        l23
        pln
    end

    style l23 stroke:#c33,stroke-dasharray:4 3
    style pln stroke:#c33,stroke-dasharray:4 3
    style X stroke:#c33,stroke-dasharray:4 3

Bold arrows = the actual path features take to the LLM.
Red dashed = computed every forward, but the result never reaches the loss (vision_feature_layer = -2, pooler unused).

Note

Low Risk
Low risk: updates are limited to test expectations and tiny-model generation configs, with no production training logic changes. Main risk is making CI stricter for LLaVA/LLaVA-Next, which could surface latent upstream model/config differences.

Overview
Updates VLM trainer tests (SFT/DPO/GRPO/RLOO) to stop vaguely skipping certain LLaVA/LLaVA-Next vision-tower parameters and instead positively assert that the last vision encoder layer and post_layernorm remain unchanged (frozen) after trainer.train().

Cleans up tiny LLaVA/LLaVA-Next model generation scripts by removing unused/invalid vision_config keys, and pins test runs to specific hub PR revisions for the updated tiny LLaVA models via tests/conftest.py’s MODEL_REVISIONS.

^{Reviewed by Cursor Bugbot for commit 79ac139. Bugbot is set up for automated code reviews on this repo. Configure here.}

…=-2`, and turn the test guard into an assertion

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84e9ccc368

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

HuggingFaceDocBuilderDev · 2026-05-15T15:45:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-05-15T15:52:58Z

-    "num_key_value_heads": 2,
-    "embed_dim": 64,


these keys aren't used, we can safely drop them

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

…ount for LLaVA design constraints

cursor

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 79ac139. Configure here.}

cursor · 2026-05-15T17:54:46Z

            new_param = trainer.model.get_parameter(n)
+            # LLaVA & LLaVA-Next: vision_feature_layer=-2 leaves the last encoder layer (layers.1 and post_layernorm
+            # without gradient by design. Assert they stay frozen — if they ever start training, the feature-selection
+            # plumbing has likely regressed.


Inconsistent comments across trainers violate AGENTS.md consistency rule

Low Severity

The LLaVA frozen-param comment in DPO/SFT tests differs from the identical logic in GRPO/RLOO tests. DPO/SFT uses "(layers.1 and post_layernorm" (with an unclosed parenthesis and extra trailing text about regression), while GRPO/RLOO uses "(layers.1) and post_layernorm (pooler-only path)" (properly punctuated). AGENTS.md requires word-for-word identical comments when the logic is identical.

Additional Locations (2)

tests/test_grpo_trainer.py#L1996-L1998

tests/test_sft_trainer.py#L573-L576

^{Triggered by project rule: ../.ai/AGENTS.md}

^{Reviewed by Cursor Bugbot for commit 79ac139. Configure here.}

Make tiny LLaVA / LLaVA-Next faithfully reflect `vision_feature_layer…

84e9ccc

…=-2`, and turn the test guard into an assertion

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

Comment thread tests/conftest.py

cursor Bot reviewed May 15, 2026

View reviewed changes

Comment thread tests/conftest.py

style

8be4414

qgallouedec commented May 15, 2026

View reviewed changes

Comment thread tests/test_dpo_trainer.py Outdated

Comment thread tests/test_sft_trainer.py Outdated

Comment thread tests/test_sft_trainer.py Outdated

Apply suggestions from code review

82cb9a3

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

qgallouedec requested review from AmineDiro, albertvillanova and kashif May 15, 2026 16:18

Refactor parameter change assertions in GRPO and RLOO trainers to acc…

79ac139

…ount for LLaVA design constraints

cursor Bot reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make the LLaVA / LLaVA-Next test guard explicit#5778

Make the LLaVA / LLaVA-Next test guard explicit#5778
qgallouedec wants to merge 4 commits into
mainfrom
fix-llava

qgallouedec commented May 15, 2026 •

edited by cursor Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

HuggingFaceDocBuilderDev commented May 15, 2026

Uh oh!

Uh oh!

qgallouedec May 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

qgallouedec commented May 15, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why those params are frozen

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

HuggingFaceDocBuilderDev commented May 15, 2026

Uh oh!

Uh oh!

qgallouedec May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 15, 2026

Choose a reason for hiding this comment

Inconsistent comments across trainers violate AGENTS.md consistency rule

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qgallouedec commented May 15, 2026 •

edited by cursor Bot

Loading