Add Qwen3.5 MTP training support by rasdani · Pull Request #2406 · PrimeIntellect-ai/prime-rl

rasdani · 2026-05-03T20:05:34Z

Summary

This PR adds online MTP (multi-token prediction) training support for Qwen3.5-family models and wires it into both the RL and SFT training paths.

Main changes:

Adds trainer-side ModelConfig.mtp controls for auxiliary MTP CE loss, rollout enablement, loss scale, and speculative token count.
Adds shared MTP utilities for packed-sequence-safe token rolling and label-mask intersection.
Adds dense Qwen3.5 support through the HF Qwen3_5ForConditionalGeneration path, preserving official mtp.* checkpoint keys.
Extends PrimeRL Qwen3.5 MoE MTP modules, conversion, and HF round-trip behavior.
Adds MTP loss integration to RL and SFT training while keeping gradients isolated from trunk, embedding, and LM head parameters.
Adds NCCL broadcast preprocessing support for MTP weights and rejects quantized NCCL transfer when MTP weights are present.
Explicitly defers Nemotron-H MTP support; the current drop behavior is preserved.
Adds a Qwen3.5-2B Hendrycks non-thinking sanity baseline config at configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml.
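To make "packed-sequence-safe token rolling and label-mask intersection" concrete, here is a minimal pure-Python sketch of the idea. It is not the PR's implementation: the function names, the list-based representation, and the `seq_ids` argument (one sequence id per packed position, with adjacent sequences assumed to carry different ids) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def roll_tokens_packed(
    tokens: Sequence, seq_ids: Sequence[int], shift: int = 1, pad_id=0
) -> Tuple[List, List[bool]]:
    """Shift `tokens` left by `shift` without crossing packed-sequence boundaries.

    rolled[i] is tokens[i + shift] when position i + shift exists and belongs to
    the same packed sequence as position i; otherwise it is `pad_id` and the
    matching entry in `valid` is False.
    """
    n = len(tokens)
    rolled, valid = [], []
    for i in range(n):
        j = i + shift
        same_sequence = j < n and seq_ids[j] == seq_ids[i]
        rolled.append(tokens[j] if same_sequence else pad_id)
        valid.append(same_sequence)
    return rolled, valid


def intersect_masks(*masks: Sequence[bool]) -> List[bool]:
    """Keep a position only if every mask keeps it."""
    return [all(flags) for flags in zip(*masks)]


# Two sequences packed into one row: [10, 11, 12] and [20, 21].
tokens = [10, 11, 12, 20, 21]
seq_ids = [0, 0, 0, 1, 1]
rolled, valid = roll_tokens_packed(tokens, seq_ids)
# rolled == [11, 12, 0, 21, 0]; token 20 never leaks into the first sequence.
```

The loss mask can be rolled with the same function (passing `pad_id=False`) and then intersected with `valid`, so that an auxiliary target contributes to the loss only when it stays inside its own sequence and was trainable to begin with.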

Experiment Commands

Run all commands from the repo root with uv run. The sanity config assumes a two-GPU local run: GPU 0 for inference and GPU 1 for training.

Non-MTP Baseline

Foreground run:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml

Detached run with logs:

mkdir -p outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/launcher
setsid bash -lc 'exec uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml' \
  > outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/launcher/run.log 2>&1 < /dev/null &

Useful status checks:

ps -f -p <pid>
tail -n 80 outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/logs/orchestrator.log
tail -n 80 outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/logs/trainer.log
nvidia-smi --query-gpu=index,memory.used,utilization.gpu,power.draw --format=csv,noheader
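If the status checks are scripted rather than read by eye, the CSV output of the nvidia-smi query above can be parsed with a few lines of Python. This is a sketch only: the `GpuStatus` type and the function names are hypothetical, and it assumes the usual `--format=csv,noheader` layout of one comma-separated line per GPU with unit suffixes such as `MiB`, `%`, and `W`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuStatus:
    index: int
    memory_used_mib: Optional[float]
    utilization_pct: Optional[float]
    power_draw_w: Optional[float]


def _number(field: str) -> Optional[float]:
    """Drop the unit suffix; return None for values such as "[N/A]"."""
    parts = field.strip().split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None


def parse_gpu_status(text: str) -> List[GpuStatus]:
    """Parse index, memory.used, utilization.gpu, power.draw lines."""
    statuses = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(",")
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        statuses.append(
            GpuStatus(
                index=int(fields[0]),
                memory_used_mib=_number(fields[1]),
                utilization_pct=_number(fields[2]),
                power_draw_w=_number(fields[3]),
            )
        )
    return statuses
```

The input would typically be the captured stdout of the nvidia-smi command, for example from `subprocess.run(..., capture_output=True, text=True)`. With the two-GPU layout described earlier, index 0 corresponds to inference and index 1 to training.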

MTP Rollout Ablation

Use the same baseline config, enabling MTP from the CLI so the only intentional experiment difference is MTP training plus speculative rollout:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --trainer.model.mtp \
  --trainer.model.mtp.enable-rollout \
  --trainer.model.mtp.num-speculative-tokens 1 \
  --wandb.name qwen35-2b-mtp-non-thinking \
  --output-dir outputs/qwen35-2b-hendrycks-sanity-mtp-non-thinking

This resolves vLLM speculative decoding to:

[model.speculative_config]
method = "qwen3_next_mtp"
num_speculative_tokens = 1
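The mapping from trainer-side MTP settings to the speculative config shown above can be pictured roughly as follows. This is a hedged sketch, not the resolution code in the PR: `MTPSettings` and `resolve_speculative_config` are hypothetical names standing in for the real ModelConfig.mtp handling, and only the `method` string and the `num_speculative_tokens` key are taken from the resolved config in this description.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MTPSettings:
    """Hypothetical stand-in for the trainer-side MTP controls."""

    enable_rollout: bool = False
    num_speculative_tokens: int = 1


def resolve_speculative_config(mtp: Optional[MTPSettings]) -> Optional[Dict[str, Any]]:
    """Return a speculative-decoding config, or None when rollout MTP is off."""
    if mtp is None or not mtp.enable_rollout:
        return None
    if mtp.num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")
    return {
        "method": "qwen3_next_mtp",
        "num_speculative_tokens": mtp.num_speculative_tokens,
    }
```

Under these assumptions, the baseline run (no MTP flags) resolves to no speculative config at all, while the ablation flags resolve to the table shown above.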

Recommended sequence:

Let the non-MTP baseline run long enough to establish throughput and reward shape.
Stop it cleanly before launching the MTP ablation on the same two GPUs.
Compare rollout time, generation throughput, trainer throughput, reward, sequence length, and filter rates in W&B.
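For the comparison step, one simple option is to export the metrics of interest from both runs and look at the relative change per metric. The helper below is an illustrative sketch; the function name and the plain-dict input format are assumptions, and it is not tied to any W&B API.

```python
from typing import Dict


def relative_change(baseline: Dict[str, float], ablation: Dict[str, float]) -> Dict[str, float]:
    """Return (ablation - baseline) / |baseline| for metrics present in both runs.

    Metrics whose baseline value is zero are skipped, since the ratio is undefined.
    """
    changes = {}
    for name in sorted(baseline.keys() & ablation.keys()):
        base = baseline[name]
        if base == 0:
            continue
        changes[name] = (ablation[name] - base) / abs(base)
    return changes
```

A positive value means the MTP run is higher on that metric; whether that is good depends on the metric (desirable for generation throughput, undesirable for rollout time).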

Validation

Dry-runs completed:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --dry-run \
  --output-dir /tmp/qwen35-non-thinking-sanity-bs128-dry-run

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --trainer.model.mtp \
  --trainer.model.mtp.enable-rollout \
  --trainer.model.mtp.num-speculative-tokens 1 \
  --wandb.name qwen35-2b-mtp-non-thinking \
  --output-dir /tmp/qwen35-mtp-non-thinking-dry-run \
  --dry-run

Focused GPU unit tests still need to be run before merge:

uv run pytest \
  tests/unit/train/models/test_qwen3_5_dense_mtp.py \
  tests/unit/train/models/test_qwen3_5_moe_mtp.py \
  tests/unit/train/rl/test_nccl_broadcast.py::test_nccl_preprocess_converts_mtp_non_layer_chunk_to_hf_keys

rasdani added 4 commits May 3, 2026 17:14

049a80a feat: add qwen mtp training support
e64465e feat: support dense qwen mtp training
aba9863 chore: add qwen non-thinking sanity config
0e1410b chore: reduce qwen sanity batch size

rasdani changed the title from "[codex] Add Qwen3.5 MTP training support" to "Add Qwen3.5 MTP training support" on May 3, 2026

rasdani wants to merge 4 commits into main from daniel/mtp-training
