docs: Add docs about diffusion support in AM #1495
Conversation
Hi @chenopis can you help us with a review?
chenopis
left a comment
LGTM — 8 non-critical suggestions
Overall this is well-structured documentation with accurate technical content. The YAML walkthroughs, CLI argument tables, and model-specific notes are thorough and verified against source code.
Two items need attention before merge: a dead file reference (generate_wan_distributed.yaml) and an undocumented feature (LATEST checkpoint keyword). The remaining findings are style and completeness improvements.
No critical issues blocking merge — posted suggestions are for optional improvements.
Review generated with AI assistance.
| `generate_hunyuan.yaml` | HunyuanVideo | Video | 1 |

:::{note}
You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint.
Unverifiable feature (critical)
generate.py has no code to handle a LATEST keyword; searching the file returns zero matches. This documents a feature that doesn't exist in the codebase.
| You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint. |
Remove this note, or implement LATEST checkpoint resolution in generate.py before documenting it.
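If the choice is to implement rather than remove the note, the resolution step could look roughly like the sketch below. This is a hypothetical helper, not existing generate.py code; the name `resolve_checkpoint` and the "most recently modified directory wins" policy are assumptions for illustration only.

```python
from pathlib import Path

def resolve_checkpoint(path: str) -> Path:
    """Hypothetical LATEST resolution: if the given path ends in 'LATEST',
    return the most recently modified checkpoint directory under its parent.
    Otherwise return the path unchanged."""
    p = Path(path)
    if p.name != "LATEST":
        return p
    candidates = [d for d in p.parent.iterdir() if d.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no checkpoint directories under {p.parent}")
    # Pick the directory with the newest modification time.
    return max(candidates, key=lambda d: d.stat().st_mtime)
```

A mtime-based pick is the simplest policy, but parsing step numbers out of directory names (e.g. `step_200` > `step_100`) would be more robust against clock skew on shared filesystems.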
docs/model-coverage/diffusion.md
Outdated
| Model | HF Model ID | Task | Parameters | Parallelization | Example YAMLs |
|-------|-------------|------|------------|-----------------|---------------|
| Wan 2.1 T2V 1.3B | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video | 1.3B | FSDP2 | [finetune](../../examples/diffusion/finetune/wan2_1_t2v_flow.yaml), [pretrain](../../examples/diffusion/pretrain/wan2_1_t2v_flow.yaml) |
| Wan 2.1 T2V 14B | — | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
Incomplete model data (high)
Two cells in this table have missing data:
- Wan 2.1 T2V 14B: The HF Model ID is listed as `—`, but the model exists as `Wan-AI/Wan2.1-T2V-14B-Diffusers` (referenced in the deleted `tools/diffusion/data/decode.py`).
- HunyuanVideo 1.5: The parameter count is `—`. If the count is known, it should be included for completeness.
| Wan 2.1 T2V 14B | — | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
| Wan 2.1 T2V 14B | `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
Diffusion models are a class of generative models that learn to produce images or videos by iteratively denoising samples from a noise distribution. NeMo AutoModel supports training diffusion models using **flow matching**, a framework that regresses velocity fields along straight interpolation paths between noise and data.
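As a schematic illustration of the flow-matching objective the excerpt describes: along the straight path between a noise sample and a data sample, the target velocity is constant, and the network regresses it with a mean-squared error. This is a NumPy sketch with a stand-in zero "network", not NeMo AutoModel's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))   # noise sample
x1 = rng.normal(size=(4, 8))   # data sample
t = rng.uniform(size=(4, 1))   # per-sample timestep in (0, 1)

# Straight-line interpolation between noise and data.
x_t = (1.0 - t) * x0 + t * x1

# Constant velocity along the straight path.
v_target = x1 - x0

def model(x, t):
    # Stand-in for the denoising network, purely for illustration.
    return np.zeros_like(x)

# Flow-matching loss: regress the predicted velocity onto the target.
loss = np.mean((model(x_t, t) - v_target) ** 2)
```

In training, `model` would be the diffusion transformer and the loss would be minimized over minibatches of (noise, data, timestep) triples.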
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
Style: Latinism "via" (medium)
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure through the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
docs/guides/diffusion/finetune.md
Outdated
| `weight_decay` | 0.01 | 0.1 |
| `flow_shift` | 3.0 | 2.5 |
| `logit_std` | 1.0 | 1.5 |
| Dataset size | 100s--1000s of samples | 10K+ samples |
Missing hyperparameter difference (low)
The pretrain config also uses `betas: [0.9, 0.95]` while the finetune config uses `betas: [0.9, 0.999]`. Consider adding a `betas` row to the comparison table for completeness:
| Setting | Fine-Tuning | Pretraining |
|---------|-------------|-------------|
| `betas` | [0.9, 0.999] | [0.9, 0.95] |
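For context on the `flow_shift` and `logit_std` rows in the table above, the common flow-matching formulation samples timesteps from a logit-normal distribution and then applies a shift that pushes mass toward noisier timesteps. The sketch below follows that widely used formulation; it has not been verified against the recipe's actual sampling code, and the function name is made up for illustration.

```python
import numpy as np

def sample_timesteps(n, logit_mean=0.0, logit_std=1.0, flow_shift=3.0, seed=0):
    rng = np.random.default_rng(seed)
    # Logit-normal sampling: squash Gaussian draws through a sigmoid into (0, 1).
    # logit_std controls how much mass lands near the endpoints.
    t = 1.0 / (1.0 + np.exp(-rng.normal(logit_mean, logit_std, size=n)))
    # Shift: for flow_shift > 1, timesteps are pushed toward 1 (noisier end),
    # which is the usual compensation for higher-resolution training.
    return flow_shift * t / (1.0 + (flow_shift - 1.0) * t)

t = sample_timesteps(10_000, logit_std=1.0, flow_shift=3.0)
```

With `flow_shift = 1.0` the shift is the identity and the mean stays near 0.5; with `flow_shift = 3.0` (the fine-tuning default above) the distribution skews toward 1.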
Force-pushed 09b1139 to cd826f7
/ok to test cd826f7
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com> Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
This PR is docs-only; restore test and tool files to main's state. Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Fix Nemotron v3 inputs_embeds generation Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Fix checkpointing for PEFT. Previously, the state_dict in the modelstate class had an if/elif/else statement where PEFT was handled in two cases and non-PEFT in the third. The first PEFT case was handled correctly, while the second was including buffers, causing issues in downstream consumers. This fix simplifies the logic (simple if/else) and bypasses the issues with the buffers.
* Update stateful_wrappers.py
* fix
* update test
* improve error logging in test; pass is_peft to optimizerstate
* fix
* fix & logging
* add filter
* fmt
* u
* convert may return tuple or dict :S

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Move source install fla to dev group
* Update uv lock

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Force-pushed 6432ec0 to 5d19e38
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
/ok to test eaffe7f
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
/ok to test 90e8199
* feat: Integrate Wan with multi-resolution DL
* Required changes for compatibility with AM container
* Fix overrides
* feat: Add docs about diffusion support in AM
* Remove older data processing tools
* Update docs/guides/diffusion/dataset.md
* Apply suggestions from code review
* Revert non-doc code changes to match main (this PR is docs-only; restore test and tool files to main's state)
* Restructure the diffusion finetuning doc
* fix: Nemotron v3 inputs_embeds generation (#1583)
* fix: checkpointing for PEFT. (#1576)
* Update docs/guides/diffusion/finetune.md (applied review suggestions from Alexandros Koumparoulis)
* Update docs/guides/dataset-overview.md
* Update docs/guides/overview.md
* Add cr changes
* ci: Move source install fla to dev group (#1580)
* Update model coverage
* Update overview doc
* Update Hunyuan number of params
* Fix docs build issue

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information