
Remove custom DistillationProvider and simplify mbridge distillation and hf export #1122

Open
kevalmorabia97 wants to merge 5 commits into feature/puzzletron from kmorabia/puzzletron-remove-distillation-provider

Conversation


@kevalmorabia97 kevalmorabia97 commented Mar 25, 2026

  • In the nemo:26.02.01 container, the DistillationProvider fix for homogeneous models is already available. That fix appears sufficient for heterogeneous models as well, so the copied DistillationProvider is removed to simplify the code.
  • Replaced the hacky Megatron-to-HF export logic with a simplified one.

Summary by CodeRabbit

  • Refactor
    • Reworked distillation and HuggingFace export flow to use upstream bridge/export APIs, removed local monkey-patching and extra exception logging, and simplified distributed cleanup.
  • Chores
    • Consolidated and renamed Qwen3 / Qwen3-VL model and converter registrations; updated pruning configs, CLI export flags, and packaging lint/dependency settings.
  • Tests
    • Updated integration tests to use Qwen3 checkpoints and adjusted export verification.
  • Documentation
    • Updated README example to reflect new CLI usage.

@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 25, 2026 19:13
@kevalmorabia97 kevalmorabia97 requested a review from realAsma March 25, 2026 19:13

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Removed local Megatron-Bridge distillation/export helpers and monkey-patching; switched to upstream Megatron-Bridge APIs for distillation and HF export. Renamed multiple Qwen3/Qwen3-VL converter and model descriptor classes; updated pruning configs, AnyModel imports, examples, and tests to match. Miscellaneous small cleanups and lint/dependency edits.

Changes

  • Distillation example & tests (examples/puzzletron/mbridge_distillation/distill_hf.py, examples/puzzletron/mbridge_distillation/README.md, tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py): Removed local monkey-patching and the local HF-export helper; now uses megatron.bridge imports and AutoBridge.from_hf_pretrained(...).export_ckpt(...). CLI arg changes (removed the required --hf_model); tests updated to Qwen3 checkpoints and Path usage.
  • Deleted mbridge utilities (modelopt/torch/puzzletron/export/mbridge/distillation_provider.py and modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py, both deleted): Removed the local DistillationProvider implementation and export_to_hf_and_copy_config; the upstream Megatron-Bridge provider/bridge is used instead.
  • AnyModel package index (modelopt/torch/puzzletron/anymodel/models/__init__.py): Switched top-level imports to load the qwen3 and qwen3_vl modules (replacing the older 8B/30B-specific modules) so updated factories register on import.
  • Qwen3 renames (modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py, .../qwen3/qwen3_converter.py, .../qwen3/qwen3_model_descriptor.py): Renamed converter and descriptor classes from 8B-specific names to generalized Qwen3* identifiers and updated module exports/registrations.
  • Qwen3-VL renames & cleanup (modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py, .../qwen3_vl/qwen3_vl_converter.py, .../qwen3_vl/qwen3_vl_model_descriptor.py): Renamed the VL converter/descriptor and several layer descriptor dataclasses to Qwen3VL*; removed an unused torch.nn import and tightened a default factory.
  • Pruning config updates (examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml, tests/gpu/.../Qwen3-8B/pruning/ffn_pruning.yaml, tests/gpu/.../Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml): Updated _target_ entries to reference the renamed/generalized layer descriptor classes and made minor inline-comment spacing tweaks.
  • Typing & small cleanups (modelopt/torch/puzzletron/sewing_kit/utils.py, modelopt/torch/puzzletron/sewing_kit/core.py, modelopt/torch/puzzletron/mip/run_puzzle.py, modelopt/torch/puzzletron/mip/utils.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h/..., .../nemotron_h_v2/..., .../qwen2/..., modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py, modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py): Minor edits: removed redundant pass statements, switched runtime imports to TYPE_CHECKING, adjusted typing casts, replaced exit(1) with sys.exit(1), and simplified an assertion.
  • Build metadata / lint config (pyproject.toml): Removed the exact omegaconf==2.3.0 pin from the puzzletron extras and removed several Ruff ignore codes from extend-ignore and per-file ignore lists.
  • Test resources (tests/gpu/torch/puzzletron/resources/configs/...): Mirrored the pruning config updates in test resources to reference the renamed descriptor classes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 36.36%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main changes: removing the custom DistillationProvider and simplifying mbridge distillation/HF export logic.
  • Security Anti-Patterns ✅ Passed: A comprehensive security scan found no violations of security anti-patterns: no unsafe torch.load/numpy.load calls, no hardcoded trust_remote_code=True, no eval/exec on external input, and no nosec comments.
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


github-actions bot commented Mar 25, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1122/

Built to branch gh-pages at 2026-04-01 09:23 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

29-29: Make Puzzletron bridge registration explicit.

After switching these imports to upstream Megatron-Bridge, the script now relies on the unrelated export_to_hf_and_copy_config import at Line 49 to pull in modelopt.torch.puzzletron.export.mbridge.__init__, which is what registers the Puzzletron adapters. That hidden dependency is easy to break later by moving the export helper behind --hf_export_path or removing it. Please register the mbridge package explicitly near provider setup so AutoBridge.from_hf_pretrained() does not depend on an export-only import.

♻️ One way to make the dependency explicit
 import argparse
+import importlib
 import os
 import traceback
@@
 def main(args: argparse.Namespace):
+    # Register Puzzletron Megatron-Bridge adapters before resolving HF providers.
+    importlib.import_module("modelopt.torch.puzzletron.export.mbridge")
+
     checkpoint_dir = os.path.join(args.output_dir, "checkpoints")

Also applies to: 43-43

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 29, The
script currently relies on a side-effect import (export_to_hf_and_copy_config)
to register the Puzzletron mbridge adapters, which is brittle; explicitly import
the mbridge registration module near the provider setup so
AutoBridge.from_hf_pretrained() sees the adapters. Add a direct import of
modelopt.torch.puzzletron.export.mbridge (or its __init__) alongside
convert_to_distillation_provider before calling convert_to_distillation_provider
/ AutoBridge.from_hf_pretrained to ensure Puzzletron adapters are registered
regardless of export helper usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ddbcd8d7-8874-4595-8696-cbb5fb8f2ebe

📥 Commits

Reviewing files that changed from the base of the PR and between c5ec50b and 3358605.

📒 Files selected for processing (2)
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
💤 Files with no reviewable changes (1)
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py


codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.15%. Comparing base (2a170b9) to head (c5447d2).
⚠️ Report is 3 commits behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1122      +/-   ##
======================================================
- Coverage               70.17%   70.15%   -0.03%     
======================================================
  Files                     230      230              
  Lines                   26057    26057              
======================================================
- Hits                    18285    18279       -6     
- Misses                   7772     7778       +6     

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner March 25, 2026 21:05
@kevalmorabia97 kevalmorabia97 changed the title from "Remove custom DistillationProvider as not needed" to "Remove custom DistillationProvider and simplify mbridge distillation and hf export" Mar 25, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/puzzletron/mbridge_distillation/README.md (1)

92-92: ⚠️ Potential issue | 🟡 Minor

Stale documentation: --hf-model / --hf_model argument no longer exists.

This line still references --hf-model / --hf_model which was removed from the script. The documentation should be updated to reflect that only --hf_export_path is needed for export, and the student model architecture is inferred from --student_hf_path.

📝 Suggested fix
-- Add `--hf-export-path` (or `--hf_export_path`) to automatically export the final checkpoint to HuggingFace format after distillation. When exporting, you must also provide `--hf-model` / `--hf_model` as the HuggingFace model ID for the export template (e.g., `meta-llama/Llama-3.1-8B-Instruct`). It should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
+- Add `--hf_export_path` to automatically export the final checkpoint to HuggingFace format after distillation. The export uses the student model architecture from `--student_hf_path`. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/README.md` at line 92, Update the
README line that mentions export flags to remove the obsolete `--hf-model` /
`--hf_model` parameter and clarify that only `--hf-export-path` (or
`--hf_export_path`) is required to export the final checkpoint to HuggingFace
format; also state that the student model architecture is inferred from the
provided `--student_hf_path` rather than requiring a separate HuggingFace model
ID, and ensure any example invocation and the sentence about evaluation still
point readers to the main README's evaluation section.
🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

270-270: Clarify why subblocks_safetensors/ directory is pre-created.

The purpose of creating subblocks_safetensors/ manually is unclear. If this is a workaround for an AutoBridge.export_ckpt limitation, consider adding a brief comment explaining why it's needed.

💡 Suggested clarification
         bridge = AutoBridge.from_hf_pretrained(
             args.student_hf_path, trust_remote_code=args.trust_remote_code
         )
+        # Create subblocks_safetensors dir required by AutoBridge.export_ckpt
         os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 270, Add a
brief explanatory comment immediately above the os.makedirs call that creates
"subblocks_safetensors" in distill_hf.py describing that this pre-creation is a
deliberate workaround for AutoBridge.export_ckpt (or the downstream export
routine) which expects that directory to already exist when it writes subblock
safetensor files; mention the function name AutoBridge.export_ckpt and the
reason (preventing a missing-directory error during export) so future
maintainers know why the directory is created and can remove the workaround if
export behavior is fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 256-276: Add a pre-check and clear error handling before calling
bridge.export_ckpt: verify the expected checkpoint path (checkpoint_dir +
f"iter_{args.train_iters:07d}") exists when args.hf_export_path is set and only
proceed with dist.cleanup()/bridge.export_ckpt if the path is present; otherwise
raise or log a descriptive error mentioning the missing checkpoint path and the
expected location so users know to check train_iters/save_interval. Wrap the
export call (bridge.export_ckpt) in a try/except to catch exceptions and emit a
helpful message including the checkpoint path and original exception before
re-raising or exiting.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Line 35: The test currently overwrites the pytest-provided tmp_path with a
hardcoded Path("/tmp/test_distill_hf"); revert to using the fixture-provided
tmp_path variable instead (remove the assignment to tmp_path and use the
existing tmp_path in test_distill_hf.py) or, if durable artifacts are required,
switch to tmp_path_factory to create a named directory; update any references in
the test to use the fixture tmp_path (or tmp_path_factory-created path) so test
isolation and cleanup are preserved.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50bcee85-fd58-4bac-aed8-960d6c3c234d

📥 Commits

Reviewing files that changed from the base of the PR and between 0e076e7 and 6d45b3a.

📒 Files selected for processing (5)
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (2)
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • pyproject.toml

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/puzzletron-remove-distillation-provider branch 2 times, most recently from 64d4099 to cd83b43 Compare March 25, 2026 21:40

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

256-277: ⚠️ Potential issue | 🟡 Minor

Validate the final checkpoint before tearing down distributed state.

Line 273 assumes iter_{args.train_iters:07d} exists. If that directory is missing, the script tears down the process group and then fails deep inside the export path with a much less actionable error. Check the checkpoint directory first on all ranks, then wrap the rank-0 export with a targeted exception.

🐛 Proposed fix
     # Export to HuggingFace format if hf_export_path is provided
     if args.hf_export_path:
+        export_ckpt_dir = os.path.join(checkpoint_dir, f"iter_{args.train_iters:07d}")
+        if not os.path.isdir(export_ckpt_dir):
+            raise FileNotFoundError(
+                "Expected distilled checkpoint at "
+                f"{export_ckpt_dir} before HF export. "
+                "Check --train_iters / --eval_interval and confirm training completed."
+            )
+
         # Save rank before destroying process group (dist.rank() won't work after destruction)
         is_rank_0 = dist.rank() == 0
 
         # Destroy process group on all ranks - export_ckpt will create its own temporary one
         # This prevents cleanup from hanging (cleanup tries to barrier, but rank 0 would be gone)
@@
         if is_rank_0:
             bridge = AutoBridge.from_hf_pretrained(
                 args.student_hf_path, trust_remote_code=args.trust_remote_code
             )
             # Create subblocks_safetensors directory else safetensors saving will fail
             os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
-            bridge.export_ckpt(
-                megatron_path=f"{checkpoint_dir}/iter_{args.train_iters:07d}",
-                hf_path=args.hf_export_path,
-                show_progress=True,
-                strict=True,
-            )
+            try:
+                bridge.export_ckpt(
+                    megatron_path=export_ckpt_dir,
+                    hf_path=args.hf_export_path,
+                    show_progress=True,
+                    strict=True,
+                )
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Failed to export HF checkpoint from {export_ckpt_dir} "
+                    f"to {args.hf_export_path}"
+                ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 256 -
277, Before calling dist.cleanup(), verify that the final checkpoint directory
(f"{checkpoint_dir}/iter_{args.train_iters:07d}") exists on all ranks (e.g., use
os.path.isdir and a collective check) and abort early with a clear log if
missing; then after teardown, keep the rank-0 export
(AutoBridge.from_hf_pretrained + bridge.export_ckpt) wrapped in a targeted
try/except that logs the exact exception and context (checkpoint path,
args.train_iters, hf_export_path) and re-raises or exits cleanly so failures
during bridge.export_ckpt produce actionable messages rather than failing
silently after process group destruction.
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)

35-35: ⚠️ Potential issue | 🟠 Major

Keep the pytest-provided tmp_path.

Line 35 replaces pytest's isolated temp dir with a shared /tmp/test_distill_hf. That makes reruns and parallel jobs share checkpoints/export artifacts, so this test can pass or fail based on stale files instead of the current run.

🐛 Proposed fix
-    tmp_path = Path("/tmp/test_distill_hf")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` at line
35, The test overwrites the pytest-provided tmp_path by assigning tmp_path =
Path("/tmp/test_distill_hf"), causing shared state between runs; remove that
assignment so the pytest tmp_path fixture is used (or, if you need a named
subdirectory, replace the assignment with tmp_path = tmp_path /
"test_distill_hf" and create it), i.e., locate the tmp_path assignment in
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py and stop
overriding the tmp_path fixture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dd51ba9d-fc48-4011-8d50-aa1cb63cd88d

📥 Commits

Reviewing files that changed from the base of the PR and between 6d45b3a and 64d4099.

📒 Files selected for processing (15)
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (8)
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • pyproject.toml
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/puzzletron/mbridge_distillation/README.md


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)

118-145: ⚠️ Potential issue | 🟠 Major

This fixture still doesn’t exercise the heterogeneous distillation path.

Both checkpoints are created from the same Qwen/Qwen3-0.6B template and both keep hybrid_override_pattern=None, so after convert_model(...) this test is still validating a homogeneous setup. That leaves the PR’s key claim—removing the custom DistillationProvider is still safe for heterogeneous models—untested here. Please make one side genuinely heterogeneous before conversion (for example by varying per-layer FFN or block configs).

As per coding guidelines, "Write tests using pytest for all new features and examples; organize tests into tests/unit (fast CPU-based), tests/gpu (fast GPU-based), tests/gpu_megatron (Megatron-Core), tests/gpu_trtllm (TensorRT-LLM), and tests/examples (integration tests)" and "All test coverage checks in PRs must pass for new features and examples."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` around
lines 118 - 145, The test currently creates two identical checkpoints (both via
create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B" and
hybrid_override_pattern=None) so it never exercises heterogeneous distillation;
update the fixture so one side is genuinely different before conversion (e.g.,
call create_and_save_small_hf_model for teacher_hf_dir with a different
hf_model_name or supply a non-None hybrid_override_pattern that alters per-layer
FFN/block configs), keep student_hf_dir unchanged, then run convert_model on
both student_hf_dir and teacher_hf_dir as before so convert_model and subsequent
distillation see heterogeneous inputs (refer to create_and_save_small_hf_model,
hybrid_override_pattern, convert_model, student_hf_dir, teacher_hf_dir).
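A tiny sketch of the suggested fixture change: derive the teacher config from the student's but override hybrid_override_pattern so the pair is genuinely heterogeneous before conversion. The dict shape and the pattern string here are illustrative assumptions, not the test suite's actual fixture API:

```python
# Hypothetical fixture configs; only hybrid_override_pattern differs.
student_cfg = {
    "hf_model_name": "Qwen/Qwen3-0.6B",
    "hybrid_override_pattern": None,  # homogeneous student, as today
}
teacher_cfg = {
    **student_cfg,
    # Illustrative non-None pattern so per-layer blocks differ and the
    # heterogeneous distillation path is actually exercised.
    "hybrid_override_pattern": "M*M*",
}

assert student_cfg != teacher_cfg  # heterogeneous inputs for distillation
```

With a pair like this, the test covers the PR's key claim that the upstream provider handles heterogeneous models too.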
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Around line 118-145: The test currently creates two identical checkpoints
(both via create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B"
and hybrid_override_pattern=None) so it never exercises heterogeneous
distillation; update the fixture so one side is genuinely different before
conversion (e.g., call create_and_save_small_hf_model for teacher_hf_dir with a
different hf_model_name or supply a non-None hybrid_override_pattern that alters
per-layer FFN/block configs), keep student_hf_dir unchanged, then run
convert_model on both student_hf_dir and teacher_hf_dir as before so
convert_model and subsequent distillation see heterogeneous inputs (refer to
create_and_save_small_hf_model, hybrid_override_pattern, convert_model,
student_hf_dir, teacher_hf_dir).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91a932d2-8d5f-4071-9418-bf2c9352d03d

📥 Commits

Reviewing files that changed from the base of the PR and between 64d4099 and cd83b43.

📒 Files selected for processing (12)
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (7)
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (2)
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/puzzletron-remove-distillation-provider branch from cd83b43 to 3c32b83 on March 30, 2026 at 08:23
@kevalmorabia97 kevalmorabia97 removed the request for review from realAsma March 30, 2026 08:24
…and hf export

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…tillation-provider

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/puzzletron-remove-distillation-provider branch from 3a10029 to 24a8066 on March 30, 2026 at 15:13
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Collaborator

@cjluo-nv cjluo-nv left a comment


Summary: Removes the local DistillationProvider monkey-patch and export_mbridge_to_hf helper, replacing them with upstream Megatron-Bridge APIs (convert_to_distillation_provider from megatron.bridge, AutoBridge.export_ckpt). Also simplifies CLI args and error handling.

Issues Found:

  1. [Correctness] distill_hf.py line ~283: dist.cleanup() calls barrier() then destroy_process_group(). Previously the code had an explicit torch.distributed.barrier() + torch.distributed.destroy_process_group(). The new dist.cleanup() wraps the barrier in suppress(Exception), which silently swallows barrier failures. This is fine for the finally block at the bottom, but here it's used mid-script before the rank-0-only export — if the barrier fails silently, rank 0 could start exporting while other ranks are still finishing distillation. Consider whether the barrier should be explicit here (without suppression) to catch issues, or document why suppression is acceptable.

  2. [Correctness] distill_hf.py line ~290: shutil.copy(f"{args.student_hf_path}/config.json", ...) — if config.json doesn't exist at that path (e.g., the student is a custom anymodel checkpoint without a standard HF config), this will raise FileNotFoundError with no user-friendly message. The deleted export_mbridge_to_hf.py had the same issue, but since this is a simplification PR, consider adding a brief existence check or a clarifying error message.

  3. [Correctness] distill_hf.py line ~296 (finally: dist.cleanup()): After the rank-0 export path already calls dist.cleanup() at line ~283, all ranks will call dist.cleanup() again in the finally block. dist.cleanup() checks is_initialized() so the double-call is safe (no-op on second call). This is fine but worth noting.

  4. [Readability] distill_hf.py line ~290: shutil.copy(f"{args.student_hf_path}/config.json", f"{args.hf_export_path}/config.json") — consider using Path objects for consistency, since the deleted code used Path. Minor nit.

  5. [Tests] test_distill_hf.py: The test updates the arg name from hf_model to student_hf_model and uses Qwen3 checkpoints, which is correct. However, the test doesn't appear to verify that the config.json was actually copied to hf_export_path. Given the shutil.copy is new inline logic replacing a helper function, a simple assertion that config.json exists in the export dir would strengthen confidence.
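The suppression concern in issue 1 comes down to `contextlib.suppress` semantics. A minimal stand-alone sketch (the `barrier` stand-in and function names are hypothetical, not ModelOpt's actual `dist` module) shows why a swallowed barrier failure lets teardown proceed while an explicit barrier surfaces the problem:

```python
import contextlib


def barrier():
    # Stand-in for torch.distributed.barrier(); here it always fails,
    # to show how the two cleanup styles differ.
    raise RuntimeError("rank desync")


def cleanup_suppressed() -> bool:
    # Mirrors a dist.cleanup() that wraps the barrier in suppress(Exception):
    # a failed barrier is silently swallowed and teardown continues.
    with contextlib.suppress(Exception):
        barrier()
    return True  # destroy_process_group() would run here regardless


def cleanup_strict() -> bool:
    # Explicit, unsuppressed barrier: a failure propagates before teardown,
    # so rank 0 cannot start exporting while other ranks are still working.
    barrier()
    return True


print(cleanup_suppressed())  # True: failure silently swallowed
try:
    cleanup_strict()
except RuntimeError as e:
    print(f"strict cleanup surfaced: {e}")
```

The mid-script call before the rank-0 export is exactly the place where the strict behavior may be preferable.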

Suggestions:

  • The removal of the broad except Exception wrapper around main() (lines 299-302 in old code) is a good change — it was just re-printing the traceback before re-raising, which Python does by default.
  • The --student_hf_model rename from --hf_model is clearer since this is specifically the student's model ID. Good naming improvement.
  • Line 290: The config.json copy comment says "save config from student_model" — worth noting why this is needed (export_ckpt doesn't preserve heterogeneous block_configs in config.json).
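Issues 2 and 4 and the last suggestion together point at something like the following sketch (function name, error message, and path handling are illustrative, not the PR's actual code):

```python
import shutil
from pathlib import Path


def copy_student_config(student_hf_path: str, hf_export_path: str) -> Path:
    # Copy config.json from the student checkpoint into the export dir.
    # Rationale for the copy: export_ckpt does not preserve heterogeneous
    # block_configs in the config.json it writes.
    src = Path(student_hf_path) / "config.json"
    if not src.is_file():
        raise FileNotFoundError(
            f"No config.json found at {src}; if the student is a custom "
            "anymodel checkpoint, write a standard HF config there first."
        )
    dst = Path(hf_export_path) / "config.json"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst)
    return dst
```

Using `Path` objects keeps it consistent with the deleted helper, and the existence check turns a bare `FileNotFoundError` into an actionable message.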

Overall Assessment: Clean simplification that removes ~270 lines of local workaround code in favor of upstream APIs. The core logic is correct. The issues found are minor edge cases, not blockers.
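The test-side check from issue 5 could be as small as this sketch (the `hf_export_path` name is assumed from the review, not taken from the actual test fixture):

```python
from pathlib import Path


def assert_config_exported(hf_export_path: str) -> None:
    # Strengthens test_distill_hf.py: the inline shutil.copy that replaced
    # the deleted helper should have landed config.json in the export dir.
    config = Path(hf_export_path) / "config.json"
    assert config.is_file(), f"config.json missing from export dir: {config}"
```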

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
