
Add Qwen3.5 MTP training support#2406

Draft
rasdani wants to merge 4 commits into main from daniel/mtp-training


@rasdani commented on May 3, 2026

Summary

This PR adds online MTP training support for Qwen3.5-family models and wires it into the RL/SFT training paths.

Main changes:

  • Adds trainer-side ModelConfig.mtp controls for auxiliary MTP CE loss, rollout enablement, loss scale, and speculative token count.
  • Adds shared MTP utilities for packed-sequence-safe token rolling and label-mask intersection.
  • Adds dense Qwen3.5 support through the HF Qwen3_5ForConditionalGeneration path, preserving official mtp.* checkpoint keys.
  • Extends PrimeRL Qwen3.5 MoE MTP modules, conversion, and HF round-trip behavior.
  • Adds MTP loss integration to RL and SFT training while keeping gradients isolated from trunk, embedding, and LM head parameters.
  • Adds NCCL broadcast preprocessing support for MTP weights and rejects quantized NCCL transfer when MTP weights are present.
  • Leaves Nemotron-H MTP as an explicit deferral while preserving the current drop behavior.
  • Adds a Qwen3.5-2B Hendrycks non-thinking sanity baseline config at configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml.
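The packed-sequence-safe token rolling and label-mask intersection mentioned above can be sketched roughly as follows. This is a minimal illustration of the idea, not the PR's actual code; the function names, list-based types, and the `IGNORE_INDEX` convention are assumptions.

```python
IGNORE_INDEX = -100  # conventional cross-entropy ignore index

def roll_tokens_packed(tokens, seq_lens, shift=1):
    """Shift labels left by `shift` positions within each packed sequence,
    so position i is trained to predict token i + shift. Positions whose
    target would fall into the next packed sequence get IGNORE_INDEX,
    which prevents label leakage across sequence boundaries."""
    rolled = [IGNORE_INDEX] * len(tokens)
    start = 0
    for length in seq_lens:
        for i in range(max(length - shift, 0)):
            rolled[start + i] = tokens[start + i + shift]
        start += length
    return rolled

def intersect_label_masks(base_mask, shifted_mask):
    """A position contributes to the auxiliary MTP CE loss only if it is
    valid under both the original and the shifted label mask."""
    return [a and b for a, b in zip(base_mask, shifted_mask)]
```

For example, rolling `[1, 2, 3, 4, 5, 6]` packed as two length-3 sequences with `shift=1` yields `[2, 3, -100, 5, 6, -100]`; a naive global roll would instead leak token 4 into the last position of the first sequence.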

Experiment Commands

All commands should be run from the repo root with uv run. The sanity config assumes a two-GPU local run: GPU 0 for inference and GPU 1 for training.

Non-MTP Baseline

Foreground run:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml

Detached run with logs:

mkdir -p outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/launcher
setsid bash -lc 'exec uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml' \
  > outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/launcher/run.log 2>&1 < /dev/null &

Useful status checks:

ps -f -p <pid>
tail -n 80 outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/logs/orchestrator.log
tail -n 80 outputs/qwen35-2b-hendrycks-sanity-non-mtp-non-thinking/logs/trainer.log
nvidia-smi --query-gpu=index,memory.used,utilization.gpu,power.draw --format=csv,noheader

MTP Rollout Ablation

Use the same baseline config, enabling MTP from the CLI so that the only intentional difference from the baseline is MTP training plus speculative rollout:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --trainer.model.mtp \
  --trainer.model.mtp.enable-rollout \
  --trainer.model.mtp.num-speculative-tokens 1 \
  --wandb.name qwen35-2b-mtp-non-thinking \
  --output-dir outputs/qwen35-2b-hendrycks-sanity-mtp-non-thinking

This resolves the vLLM speculative decoding settings to:

[model.speculative_config]
method = "qwen3_next_mtp"
num_speculative_tokens = 1

Recommended sequence:

  1. Let the non-MTP baseline run long enough to establish throughput and reward shape.
  2. Stop it cleanly before launching the MTP ablation on the same two GPUs.
  3. Compare rollout time, generation throughput, trainer throughput, reward, sequence length, and filter rates in W&B.

Validation

Dry-runs completed:

uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --dry-run \
  --output-dir /tmp/qwen35-non-thinking-sanity-bs128-dry-run
uv run rl @ configs/mtp_ablation/qwen35_2b_hendrycks_sanity_non_mtp_non_thinking.toml \
  --trainer.model.mtp \
  --trainer.model.mtp.enable-rollout \
  --trainer.model.mtp.num-speculative-tokens 1 \
  --wandb.name qwen35-2b-mtp-non-thinking \
  --output-dir /tmp/qwen35-mtp-non-thinking-dry-run \
  --dry-run

Focused GPU unit tests still need to be run before merge:

uv run pytest \
  tests/unit/train/models/test_qwen3_5_dense_mtp.py \
  tests/unit/train/models/test_qwen3_5_moe_mtp.py \
  tests/unit/train/rl/test_nccl_broadcast.py::test_nccl_preprocess_converts_mtp_non_layer_chunk_to_hf_keys

@rasdani changed the title from "[codex] Add Qwen3.5 MTP training support" to "Add Qwen3.5 MTP training support" on May 3, 2026
