Remove custom DistillationProvider and simplify mbridge distillation and hf export #1122
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Removed local Megatron-Bridge distillation/export helpers and monkey-patching; switched to upstream Megatron-Bridge APIs for distillation and HF export. Renamed multiple Qwen3/Qwen3-VL converter and model descriptor classes; updated pruning configs, AnyModel imports, examples, and tests to match. Miscellaneous small cleanups and lint/dependency edits.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)
29-29: Make Puzzletron bridge registration explicit.

After switching these imports to upstream Megatron-Bridge, the script now relies on the unrelated `export_to_hf_and_copy_config` import at line 49 to pull in `modelopt.torch.puzzletron.export.mbridge.__init__`, which is what registers the Puzzletron adapters. That hidden dependency is easy to break later by moving the export helper behind `--hf_export_path` or removing it. Please register the `mbridge` package explicitly near provider setup so `AutoBridge.from_hf_pretrained()` does not depend on an export-only import.

♻️ One way to make the dependency explicit:
```diff
 import argparse
+import importlib
 import os
 import traceback
@@
 def main(args: argparse.Namespace):
+    # Register Puzzletron Megatron-Bridge adapters before resolving HF providers.
+    importlib.import_module("modelopt.torch.puzzletron.export.mbridge")
+
     checkpoint_dir = os.path.join(args.output_dir, "checkpoints")
```

Also applies to: 43-43
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 29, The script currently relies on a side-effect import (export_to_hf_and_copy_config) to register the Puzzletron mbridge adapters, which is brittle; explicitly import the mbridge registration module near the provider setup so AutoBridge.from_hf_pretrained() sees the adapters. Add a direct import of modelopt.torch.puzzletron.export.mbridge (or its __init__) alongside convert_to_distillation_provider before calling convert_to_distillation_provider / AutoBridge.from_hf_pretrained to ensure Puzzletron adapters are registered regardless of export helper usage.
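The side-effect registration at issue can be illustrated with a self-contained sketch. The `fake_mbridge` module below is hypothetical, standing in for `modelopt.torch.puzzletron.export.mbridge`: module-level code runs exactly once, at import time, so any adapter registration living there only happens if something imports the package explicitly.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path

# Write a throwaway module whose top-level code performs a registration,
# mimicking a package __init__ that registers bridge adapters on import.
tmp = tempfile.mkdtemp()
Path(tmp, "fake_mbridge.py").write_text(
    "REGISTERED_ADAPTERS = []\n"
    "REGISTERED_ADAPTERS.append('puzzletron')  # runs once, at import time\n"
)

sys.path.insert(0, tmp)
try:
    # Explicit side-effect import: the pattern the review recommends.
    mod = importlib.import_module("fake_mbridge")
finally:
    sys.path.remove(tmp)
    shutil.rmtree(tmp)

adapters = mod.REGISTERED_ADAPTERS
print(adapters)  # ['puzzletron']
```

In the real script the equivalent one-liner would be a single `importlib.import_module(...)` call near provider setup, so the registration no longer hinges on an unrelated export import.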
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ddbcd8d7-8874-4595-8696-cbb5fb8f2ebe
📒 Files selected for processing (2)

- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
💤 Files with no reviewable changes (1)
- modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##   feature/puzzletron    #1122  +/-  ##
=========================================
- Coverage       70.17%   70.15%  -0.03%
=========================================
  Files             230      230
  Lines           26057    26057
=========================================
- Hits            18285    18279      -6
- Misses           7772     7778      +6
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/puzzletron/mbridge_distillation/README.md (1)
92-92: ⚠️ Potential issue | 🟡 Minor

Stale documentation: the `--hf-model` / `--hf_model` argument no longer exists.

This line still references `--hf-model` / `--hf_model`, which was removed from the script. The documentation should be updated to reflect that only `--hf_export_path` is needed for export, and that the student model architecture is inferred from `--student_hf_path`.

📝 Suggested fix:
```diff
-- Add `--hf-export-path` (or `--hf_export_path`) to automatically export the final checkpoint to HuggingFace format after distillation. When exporting, you must also provide `--hf-model` / `--hf_model` as the HuggingFace model ID for the export template (e.g., `meta-llama/Llama-3.1-8B-Instruct`). It should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
+- Add `--hf_export_path` to automatically export the final checkpoint to HuggingFace format after distillation. The export uses the student model architecture from `--student_hf_path`. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
```
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/README.md` at line 92, Update the README line that mentions export flags to remove the obsolete `--hf-model` / `--hf_model` parameter and clarify that only `--hf-export-path` (or `--hf_export_path`) is required to export the final checkpoint to HuggingFace format; also state that the student model architecture is inferred from the provided `--student_hf_path` rather than requiring a separate HuggingFace model ID, and ensure any example invocation and the sentence about evaluation still point readers to the main README's evaluation section.
🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)
270-270: Clarify why the `subblocks_safetensors/` directory is pre-created.

The purpose of creating `subblocks_safetensors/` manually is unclear. If this is a workaround for an `AutoBridge.export_ckpt` limitation, consider adding a brief comment explaining why it's needed.

💡 Suggested clarification:
```diff
 bridge = AutoBridge.from_hf_pretrained(
     args.student_hf_path, trust_remote_code=args.trust_remote_code
 )
+# Create subblocks_safetensors dir required by AutoBridge.export_ckpt
 os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
```
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 270, Add a brief explanatory comment immediately above the os.makedirs call that creates "subblocks_safetensors" in distill_hf.py describing that this pre-creation is a deliberate workaround for AutoBridge.export_ckpt (or the downstream export routine) which expects that directory to already exist when it writes subblock safetensor files; mention the function name AutoBridge.export_ckpt and the reason (preventing a missing-directory error during export) so future maintainers know why the directory is created and can remove the workaround if export behavior is fixed.
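Independent of where the explanatory comment lands, the workaround itself is just idempotent directory pre-creation. A minimal stdlib sketch, with `export_root` as an illustrative stand-in for the script's `args.hf_export_path`:

```python
import os
import shutil
import tempfile

export_root = tempfile.mkdtemp()  # stand-in for args.hf_export_path
subblocks_dir = os.path.join(export_root, "subblocks_safetensors")

# Pre-create the directory the export step writes into but does not create.
# exist_ok=True makes the call idempotent, so reruns don't raise FileExistsError.
os.makedirs(subblocks_dir, exist_ok=True)
os.makedirs(subblocks_dir, exist_ok=True)  # second call is a no-op, not an error

created = os.path.isdir(subblocks_dir)
shutil.rmtree(export_root)
print(created)  # True
```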
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 256-276: Add a pre-check and clear error handling before calling
bridge.export_ckpt: verify the expected checkpoint path (checkpoint_dir +
f"iter_{args.train_iters:07d}") exists when args.hf_export_path is set and only
proceed with dist.cleanup()/bridge.export_ckpt if the path is present; otherwise
raise or log a descriptive error mentioning the missing checkpoint path and the
expected location so users know to check train_iters/save_interval. Wrap the
export call (bridge.export_ckpt) in a try/except to catch exceptions and emit a
helpful message including the checkpoint path and original exception before
re-raising or exiting.
In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Line 35: The test currently overwrites the pytest-provided tmp_path with a
hardcoded Path("/tmp/test_distill_hf"); revert to using the fixture-provided
tmp_path variable instead (remove the assignment to tmp_path and use the
existing tmp_path in test_distill_hf.py) or, if durable artifacts are required,
switch to tmp_path_factory to create a named directory; update any references in
the test to use the fixture tmp_path (or tmp_path_factory-created path) so test
isolation and cleanup are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50bcee85-fd58-4bac-aed8-960d6c3c234d
📒 Files selected for processing (5)

- examples/puzzletron/mbridge_distillation/README.md
- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (2)
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- pyproject.toml
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (outdated comment; resolved)
Force-pushed from 64d4099 to cd83b43
♻️ Duplicate comments (2)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)
256-277: ⚠️ Potential issue | 🟡 Minor

Validate the final checkpoint before tearing down distributed state.

Line 273 assumes `iter_{args.train_iters:07d}` exists. If that directory is missing, the script tears down the process group and then fails deep inside the export path with a much less actionable error. Check the checkpoint directory first on all ranks, then wrap the rank-0 export with a targeted exception.

🐛 Proposed fix:
```diff
     # Export to HuggingFace format if hf_export_path is provided
     if args.hf_export_path:
+        export_ckpt_dir = os.path.join(checkpoint_dir, f"iter_{args.train_iters:07d}")
+        if not os.path.isdir(export_ckpt_dir):
+            raise FileNotFoundError(
+                "Expected distilled checkpoint at "
+                f"{export_ckpt_dir} before HF export. "
+                "Check --train_iters / --eval_interval and confirm training completed."
+            )
+
         # Save rank before destroying process group (dist.rank() won't work after destruction)
         is_rank_0 = dist.rank() == 0
         # Destroy process group on all ranks - export_ckpt will create its own temporary one
         # This prevents cleanup from hanging (cleanup tries to barrier, but rank 0 would be gone)
@@
         if is_rank_0:
             bridge = AutoBridge.from_hf_pretrained(
                 args.student_hf_path, trust_remote_code=args.trust_remote_code
             )
             # Create subblocks_safetensors directory else safetensors saving will fail
             os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
-            bridge.export_ckpt(
-                megatron_path=f"{checkpoint_dir}/iter_{args.train_iters:07d}",
-                hf_path=args.hf_export_path,
-                show_progress=True,
-                strict=True,
-            )
+            try:
+                bridge.export_ckpt(
+                    megatron_path=export_ckpt_dir,
+                    hf_path=args.hf_export_path,
+                    show_progress=True,
+                    strict=True,
+                )
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Failed to export HF checkpoint from {export_ckpt_dir} "
+                    f"to {args.hf_export_path}"
+                ) from exc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 256-277: before calling dist.cleanup(), verify that the final checkpoint directory (f"{checkpoint_dir}/iter_{args.train_iters:07d}") exists on all ranks (e.g., use os.path.isdir and a collective check) and abort early with a clear log if missing; then after teardown, keep the rank-0 export (AutoBridge.from_hf_pretrained + bridge.export_ckpt) wrapped in a targeted try/except that logs the exact exception and context (checkpoint path, args.train_iters, hf_export_path) and re-raises or exits cleanly so failures during bridge.export_ckpt produce actionable messages rather than failing silently after process group destruction.

tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)
35-35: ⚠️ Potential issue | 🟠 Major

Keep the pytest-provided `tmp_path`.

Line 35 replaces pytest's isolated temp dir with a shared `/tmp/test_distill_hf`. That makes reruns and parallel jobs share checkpoints/export artifacts, so this test can pass or fail based on stale files instead of the current run.

🐛 Proposed fix:
```diff
-    tmp_path = Path("/tmp/test_distill_hf")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` at line 35, The test overwrites the pytest-provided tmp_path by assigning tmp_path = Path("/tmp/test_distill_hf"), causing shared state between runs; remove that assignment so the pytest tmp_path fixture is used (or, if you need a named subdirectory, replace the assignment with tmp_path = tmp_path / "test_distill_hf" and create it), i.e., locate the tmp_path assignment in tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py and stop overriding the tmp_path fixture.
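The isolation concern can be shown with the stdlib primitives that pytest's `tmp_path` fixture is built on: a hardcoded `/tmp/test_distill_hf` is the same directory on every run, while `tempfile.mkdtemp()` yields a fresh directory each time, so artifacts from one run cannot leak into the next.

```python
import shutil
import tempfile
from pathlib import Path

# Two "test runs", each with its own freshly created temp dir (what tmp_path provides).
run_a = Path(tempfile.mkdtemp(prefix="distill_hf_"))
run_b = Path(tempfile.mkdtemp(prefix="distill_hf_"))

# Run A leaves an artifact behind, as a completed distillation/export would.
(run_a / "config.json").write_text("{}")

# Run B is unaffected: the directories are distinct, so no stale state is visible.
distinct = run_a != run_b
leak_visible = (run_b / "config.json").exists()

shutil.rmtree(run_a)
shutil.rmtree(run_b)
print(distinct, leak_visible)  # True False
```

With a fixed path, both "runs" would share one directory and the second would see the first's `config.json`, which is exactly the stale-state failure mode described above.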
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dd51ba9d-fc48-4011-8d50-aa1cb63cd88d
📒 Files selected for processing (15)

- examples/puzzletron/mbridge_distillation/README.md
- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (8)
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- pyproject.toml
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
✅ Files skipped from review due to trivial changes (1)
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/puzzletron/mbridge_distillation/README.md
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)
118-145: ⚠️ Potential issue | 🟠 Major

This fixture still doesn't exercise the heterogeneous distillation path.

Both checkpoints are created from the same `Qwen/Qwen3-0.6B` template and both keep `hybrid_override_pattern=None`, so after `convert_model(...)` this test is still validating a homogeneous setup. That leaves the PR's key claim (that removing the custom `DistillationProvider` is still safe for heterogeneous models) untested here. Please make one side genuinely heterogeneous before conversion (for example by varying per-layer FFN or block configs).

As per coding guidelines: "Write tests using pytest for all new features and examples; organize tests into `tests/unit` (fast CPU-based), `tests/gpu` (fast GPU-based), `tests/gpu_megatron` (Megatron-Core), `tests/gpu_trtllm` (TensorRT-LLM), and `tests/examples` (integration tests)" and "All test coverage checks in PRs must pass for new features and examples."
tests/unit(fast CPU-based),tests/gpu(fast GPU-based),tests/gpu_megatron(Megatron-Core),tests/gpu_trtllm(TensorRT-LLM), andtests/examples(integration tests)" and "All test coverage checks in PRs must pass for new features and examples."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` around lines 118 - 145, The test currently creates two identical checkpoints (both via create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B" and hybrid_override_pattern=None) so it never exercises heterogeneous distillation; update the fixture so one side is genuinely different before conversion (e.g., call create_and_save_small_hf_model for teacher_hf_dir with a different hf_model_name or supply a non-None hybrid_override_pattern that alters per-layer FFN/block configs), keep student_hf_dir unchanged, then run convert_model on both student_hf_dir and teacher_hf_dir as before so convert_model and subsequent distillation see heterogeneous inputs (refer to create_and_save_small_hf_model, hybrid_override_pattern, convert_model, student_hf_dir, teacher_hf_dir).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 91a932d2-8d5f-4071-9418-bf2c9352d03d
📒 Files selected for processing (12)

- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (7)
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- pyproject.toml
✅ Files skipped from review due to trivial changes (2)
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
Force-pushed from cd83b43 to 3c32b83
…and hf export Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…tillation-provider Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed from 3a10029 to 24a8066
…tillation-provider
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
cjluo-nv left a comment
Summary: Removes the local DistillationProvider monkey-patch and `export_mbridge_to_hf` helper, replacing them with upstream Megatron-Bridge APIs (`convert_to_distillation_provider` from `megatron.bridge`, `AutoBridge.export_ckpt`). Also simplifies CLI args and error handling.
Issues Found:

- [Correctness] `distill_hf.py` line ~283: `dist.cleanup()` calls `barrier()` then `destroy_process_group()`. Previously the code had an explicit `torch.distributed.barrier()` + `torch.distributed.destroy_process_group()`. The new `dist.cleanup()` wraps the barrier in `suppress(Exception)`, which silently swallows barrier failures. This is fine for the `finally` block at the bottom, but here it's used mid-script before the rank-0-only export: if the barrier fails silently, rank 0 could start exporting while other ranks are still finishing distillation. Consider whether the barrier should be explicit here (without suppression) to catch issues, or document why suppression is acceptable.
- [Correctness] `distill_hf.py` line ~290: `shutil.copy(f"{args.student_hf_path}/config.json", ...)`: if `config.json` doesn't exist at that path (e.g., the student is a custom anymodel checkpoint without a standard HF config), this will raise `FileNotFoundError` with no user-friendly message. The deleted `export_mbridge_to_hf.py` had the same issue, but since this is a simplification PR, consider adding a brief existence check or a clarifying error message.
- [Correctness] `distill_hf.py` line ~296 (`finally: dist.cleanup()`): after the rank-0 export path already calls `dist.cleanup()` at line ~283, all ranks will call `dist.cleanup()` again in the `finally` block. `dist.cleanup()` checks `is_initialized()`, so the double call is safe (a no-op the second time). This is fine but worth noting.
- [Readability] `distill_hf.py` line ~290: `shutil.copy(f"{args.student_hf_path}/config.json", f"{args.hf_export_path}/config.json")`: consider using `Path` objects for consistency, since the deleted code used `Path`. Minor nit.
- [Tests] `test_distill_hf.py`: the test updates the arg name from `hf_model` to `student_hf_model` and uses Qwen3 checkpoints, which is correct. However, the test doesn't appear to verify that the `config.json` was actually copied to `hf_export_path`. Given that the `shutil.copy` is new inline logic replacing a helper function, a simple assertion that `config.json` exists in the export dir would strengthen confidence.
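Several of these points (the missing-file check, the `Path` nit, and the test assertion) can be combined into one small sketch. This is a hedged illustration rather than the script's actual code: `copy_student_config` and the directory names below are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def copy_student_config(student_hf_path: Path, hf_export_path: Path) -> Path:
    """Copy the student's config.json to the export dir, failing with a clear message."""
    src = student_hf_path / "config.json"
    if not src.is_file():
        raise FileNotFoundError(
            f"No config.json found at {src}; expected a standard HF checkpoint "
            "directory for the student model."
        )
    hf_export_path.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(src, hf_export_path / "config.json"))

root = Path(tempfile.mkdtemp())
student, export = root / "student", root / "export"
student.mkdir()
(student / "config.json").write_text('{"model_type": "qwen3"}')

copied = copy_student_config(student, export)
copied_ok = copied.read_text() == '{"model_type": "qwen3"}'  # what the test could assert

try:  # a custom checkpoint without config.json now fails fast and readably
    copy_student_config(root / "missing", export)
    raised = False
except FileNotFoundError:
    raised = True

shutil.rmtree(root)
print(copied_ok, raised)  # True True
```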
Suggestions:
- The removal of the broad `except Exception` wrapper around `main()` (lines 299-302 in old code) is a good change: it was just re-printing the traceback before re-raising, which Python does by default.
- The `--student_hf_model` rename from `--hf_model` is clearer since this is specifically the student's model ID. Good naming improvement.
- Line 290: the `config.json` copy comment says "save config from student_model"; worth noting why this is needed (`export_ckpt` doesn't preserve heterogeneous block_configs in config.json).
Overall Assessment: Clean simplification that removes ~270 lines of local workaround code in favor of upstream APIs. The core logic is correct. The issues found are minor edge cases, not blockers.
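The first correctness point turns on how `contextlib.suppress` behaves: any matching exception raised inside the block is discarded and execution continues past it, which is why a suppressed barrier could let rank 0 proceed to export while peers are still working. A stdlib illustration, where `flaky_barrier` is a hypothetical stand-in for a failing collective:

```python
from contextlib import suppress

events = []

def flaky_barrier():
    events.append("barrier attempted")
    raise RuntimeError("peer unreachable")  # stand-in for a failed collective op

# Suppressed: the error vanishes and the code after the block still runs.
with suppress(Exception):
    flaky_barrier()
events.append("export started")  # reached even though the barrier failed

# Unsuppressed: the same failure stops execution before the export step.
try:
    flaky_barrier()
    events.append("unreachable")
except RuntimeError:
    events.append("barrier failure surfaced")

print(events)
# ['barrier attempted', 'export started', 'barrier attempted', 'barrier failure surfaced']
```

This is why the review distinguishes the `finally` cleanup (where swallowing is harmless) from the mid-script barrier before the rank-0-only export (where it can hide a real synchronization failure).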
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Summary by CodeRabbit