
docs: Add docs about diffusion support in AM#1495

Merged
akoumpa merged 26 commits into main from pranav/diffusion_docs
Mar 20, 2026

@pthombre
Contributor

@pthombre pthombre commented Mar 9, 2026

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot

copy-pr-bot bot commented Mar 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa akoumpa changed the title feat: Add docs about diffusion support in AM docs: Add docs about diffusion support in AM Mar 9, 2026
@akoumpa akoumpa added the docs-only With great power comes great responsibility. label Mar 9, 2026
@akoumpa
Contributor

akoumpa commented Mar 9, 2026

Hi @chenopis can you help us with a review?

@pthombre pthombre requested a review from linnanwang March 12, 2026 07:00
Contributor

@chenopis chenopis left a comment

LGTM — 8 non-critical suggestions

Overall this is well-structured documentation with accurate technical content. The YAML walkthroughs, CLI argument tables, and model-specific notes are thorough and verified against source code.

Two items need attention before merge: a dead file reference (generate_wan_distributed.yaml) and an undocumented feature (LATEST checkpoint keyword). The remaining findings are style and completeness improvements.

No critical issues blocking merge — posted suggestions are for optional improvements.

Review generated with AI assistance.

| `generate_hunyuan.yaml` | HunyuanVideo | Video | 1 |

:::{note}
You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint.

Unverifiable feature (critical)

generate.py has no code to handle a LATEST keyword — searching the file returns zero matches. This documents a feature that doesn't exist in the codebase.

Suggested change
You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint.

Remove this note, or implement LATEST checkpoint resolution in generate.py before documenting it.

| Model | HF Model ID | Task | Parameters | Parallelization | Example YAMLs |
|-------|-------------|------|------------|-----------------|---------------|
| Wan 2.1 T2V 1.3B | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video | 1.3B | FSDP2 | [finetune](../../examples/diffusion/finetune/wan2_1_t2v_flow.yaml), [pretrain](../../examples/diffusion/pretrain/wan2_1_t2v_flow.yaml) |
| Wan 2.1 T2V 14B | — | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |

Incomplete model data (high)

Two cells in this table have missing data:

  1. Wan 2.1 T2V 14B: The HF Model ID is left as a placeholder, but the model exists as `Wan-AI/Wan2.1-T2V-14B-Diffusers` (referenced in the deleted `tools/diffusion/data/decode.py`).
  2. HunyuanVideo 1.5: The parameter count is also left as a placeholder. If the count is known, include it for completeness.
Suggested change
| Wan 2.1 T2V 14B | | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
| Wan 2.1 T2V 14B | `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |


Diffusion models are a class of generative models that learn to produce images or videos by iteratively denoising samples from a noise distribution. NeMo AutoModel supports training diffusion models using **flow matching**, a framework that regresses velocity fields along straight interpolation paths between noise and data.
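A minimal sketch of the flow matching objective described above, written in plain Python on flat lists so it runs without PyTorch. It assumes the convention `x_t = (1 - t) * x_data + t * noise` with target velocity `noise - x_data`; implementations differ in which endpoint sits at `t = 0`, and this is not the `TrainDiffusionRecipe` loss code. `flow_matching_loss`, `oracle`, and `zero` are hypothetical names used only for illustration.

```python
import random


def flow_matching_loss(x_data, x_noise, v_pred_fn, t):
    """Hypothetical flow matching loss on flat lists of floats.

    Assumed convention: t = 0 is clean data, t = 1 is pure noise.
    """
    # Point on the straight interpolation path between data and noise.
    x_t = [(1.0 - t) * d + t * n for d, n in zip(x_data, x_noise)]
    # On a straight path the velocity dx_t/dt is constant: noise - data.
    v_target = [n - d for d, n in zip(x_data, x_noise)]
    v_pred = v_pred_fn(x_t, t)
    # Mean squared error between predicted and target velocity.
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(8)]
noise = [random.gauss(0.0, 1.0) for _ in range(8)]
t = random.random()  # uniform here for simplicity; real recipes sample t per example


def oracle(x_t, _t):
    # Returns the exact target velocity, so the loss is zero.
    return [n - d for d, n in zip(data, noise)]


def zero(x_t, _t):
    # Always predicts zero velocity, so it pays the full squared target.
    return [0.0] * len(x_t)


oracle_loss = flow_matching_loss(data, noise, oracle, t)
zero_loss = flow_matching_loss(data, noise, zero, t)
print(oracle_loss, zero_loss)
```

The oracle predictor shows what the network is trained to do: output the constant velocity of the straight path for whatever point `x_t` it is given.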

NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.

Style: Latinism "via" (medium)

Suggested change
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure through the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.

| `weight_decay` | 0.01 | 0.1 |
| `flow_shift` | 3.0 | 2.5 |
| `logit_std` | 1.0 | 1.5 |
| Dataset size | 100s--1000s of samples | 10K+ samples |
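The table lists `flow_shift` and `logit_std` without showing how they are used. A common pattern, seen in SD3-style training scripts and in the shift formula of Diffusers' `FlowMatchEulerDiscreteScheduler`, is to draw the timestep from a logit-normal distribution and then apply a shift. The sketch below assumes that pattern and is not taken from NeMo AutoModel; `sample_timestep` and `shift_timestep` are hypothetical helpers, and reading the first value column as fine-tuning follows the reviewer's `betas` table.

```python
import math
import random


def shift_timestep(t, flow_shift):
    """Timestep shift in the SD3 / Diffusers flow-match form (assumed here).

    flow_shift > 1 moves t toward 1, i.e. toward the noisier end of the path.
    """
    return flow_shift * t / (1.0 + (flow_shift - 1.0) * t)


def sample_timestep(rng, logit_mean=0.0, logit_std=1.0, flow_shift=1.0):
    """Hypothetical sampler: logit-normal draw followed by a timestep shift."""
    u = rng.gauss(logit_mean, logit_std)  # u ~ N(logit_mean, logit_std^2)
    t = 1.0 / (1.0 + math.exp(-u))        # sigmoid maps u into (0, 1)
    return shift_timestep(t, flow_shift)


rng = random.Random(0)
# Fine-tuning column (assumed): logit_std=1.0, flow_shift=3.0
finetune_ts = [sample_timestep(rng, logit_std=1.0, flow_shift=3.0) for _ in range(10_000)]
# Pretraining column (assumed): logit_std=1.5, flow_shift=2.5
pretrain_ts = [sample_timestep(rng, logit_std=1.5, flow_shift=2.5) for _ in range(10_000)]

print(shift_timestep(0.5, 3.0))  # 0.75
print(sum(finetune_ts) / len(finetune_ts), sum(pretrain_ts) / len(pretrain_ts))
```

Under these assumptions, a larger `logit_std` spreads samples toward both ends of the path, while a larger `flow_shift` moves more of them toward the noisy end.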

Missing hyperparameter difference (low)

The pretrain config also uses `betas: [0.9, 0.95]`, while the finetune config uses `betas: [0.9, 0.999]`. Consider adding a `betas` row to the comparison table for completeness:

| Setting | Fine-Tuning | Pretraining |
|---------|-------------|-------------|
| `betas` | [0.9, 0.999] | [0.9, 0.95] |

Base automatically changed from pranav/inference_utility_diffusion to main March 18, 2026 23:55
@pthombre pthombre marked this pull request as ready for review March 19, 2026 23:04
@pthombre pthombre force-pushed the pranav/diffusion_docs branch 2 times, most recently from 09b1139 to cd826f7 Compare March 19, 2026 23:15
@pthombre
Contributor Author

/ok to test cd826f7

akoumpa and others added 14 commits March 19, 2026 18:29
@pthombre pthombre force-pushed the pranav/diffusion_docs branch from 6432ec0 to 5d19e38 Compare March 20, 2026 01:29
@pthombre
Contributor Author

/ok to test eaffe7f

@pthombre
Contributor Author

/ok to test 90e8199

Contributor

@akoumpa akoumpa left a comment

lgtm

@akoumpa akoumpa merged commit 4becc00 into main Mar 20, 2026
32 checks passed
@akoumpa akoumpa deleted the pranav/diffusion_docs branch March 20, 2026 19:59
torsli pushed a commit that referenced this pull request Mar 24, 2026
* feat: Integrate Wan with multi-resolution DL

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Required changes for compatibility with AM container

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Fix overrides

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* feat: Add docs about diffusion support in AM

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Remove older data processing tools

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/dataset.md

Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Revert non-doc code changes to match main

This PR is docs-only; restore test and tool files to main's state.

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Restructure the diffusion finetuning doc

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* fix: Nemotron v3 inputs_embeds generation (#1583)

Fix Nemotron v3 inputs_embeds generation

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* fix: checkpointing for PEFT. (#1576)

* Fix checkpointing for PEFT.

Previously, the state_dict in the modelstate class had an if/elseif/else statement where
PEFT was handled in two cases and non-PEFT in the third.

The first PEFT case was handled correctly, while the second included
buffers, causing issues in downstream consumers.

This fix simplifies the logic (a simple if/else) and bypasses the issues with the
buffers.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update stateful_wrappers.py

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update test

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* improve error logging in test; pass is_peft to optimizerstate

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix & logging

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add filter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* u

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* convert may return tuple or dict :S

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/dataset-overview.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/overview.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Add cr changes

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* ci: Move source install fla to dev group (#1580)

* Move source install fla to dev group

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

* Update uv lock

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update model coverage

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update overview doc

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update Hunyuan number of params

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Fix docs build issue

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

---------

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
5 participants