docs: Add docs about diffusion support in AM #1495
Conversation
Hi @chenopis can you help us with a review?
chenopis
left a comment
LGTM — 8 non-critical suggestions
Overall this is well-structured documentation with accurate technical content. The YAML walkthroughs, CLI argument tables, and model-specific notes are thorough and verified against source code.
Two items need attention before merge: a dead file reference (generate_wan_distributed.yaml) and an undocumented feature (LATEST checkpoint keyword). The remaining findings are style and completeness improvements.
No critical issues blocking merge — posted suggestions are for optional improvements.
Review generated with AI assistance.
| `generate_hunyuan.yaml` | HunyuanVideo | Video | 1 |

:::{note}
You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint.
Unverifiable feature (critical)
generate.py has no code to handle a LATEST keyword; searching the file returns zero matches. This documents a feature that doesn't exist in the codebase.
| You can use `--model.checkpoint ./checkpoints/LATEST` to automatically load the most recent checkpoint. |
Remove this note, or implement LATEST checkpoint resolution in generate.py before documenting it.
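If the choice is to implement rather than remove the note, the resolution step could look roughly like the sketch below. This is a hypothetical helper, not existing generate.py code; the name `resolve_checkpoint` and the "most recently modified directory wins" policy are assumptions for illustration only.

```python
from pathlib import Path

def resolve_checkpoint(path: str) -> Path:
    """Hypothetical LATEST resolution: if the given path ends in 'LATEST',
    return the most recently modified checkpoint directory under its parent.
    Otherwise return the path unchanged."""
    p = Path(path)
    if p.name != "LATEST":
        return p
    candidates = [d for d in p.parent.iterdir() if d.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no checkpoint directories under {p.parent}")
    # Pick the directory with the newest modification time.
    return max(candidates, key=lambda d: d.stat().st_mtime)
```

A mtime-based pick is the simplest policy, but parsing step numbers out of directory names (e.g. `step_200` > `step_100`) would be more robust against clock skew on shared filesystems.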
docs/model-coverage/diffusion.md
Outdated
| Model | HF Model ID | Task | Parameters | Parallelization | Example YAMLs |
|-------|-------------|------|------------|-----------------|---------------|
| Wan 2.1 T2V 1.3B | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video | 1.3B | FSDP2 | [finetune](../../examples/diffusion/finetune/wan2_1_t2v_flow.yaml), [pretrain](../../examples/diffusion/pretrain/wan2_1_t2v_flow.yaml) |
| Wan 2.1 T2V 14B | — | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
Incomplete model data (high)
Two cells in this table have missing data:
- Wan 2.1 T2V 14B: The HF Model ID is listed as `—`, but the model exists as `Wan-AI/Wan2.1-T2V-14B-Diffusers` (referenced in the deleted `tools/diffusion/data/decode.py`).
- HunyuanVideo 1.5: The parameter count is `—`. If the count is known, it should be included for completeness.
| Wan 2.1 T2V 14B | — | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
| Wan 2.1 T2V 14B | `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video | 14B | FSDP2 | [finetune (multinode)](../../examples/diffusion/finetune/wan2_1_t2v_flow_multinode.yaml) |
Diffusion models are a class of generative models that learn to produce images or videos by iteratively denoising samples from a noise distribution. NeMo AutoModel supports training diffusion models using **flow matching**, a framework that regresses velocity fields along straight interpolation paths between noise and data.
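As a schematic illustration of the flow-matching objective the excerpt describes: along the straight path between a noise sample and a data sample, the target velocity is constant, and the network regresses it with a mean-squared error. This is a NumPy sketch with a stand-in zero "network", not NeMo AutoModel's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))   # noise sample
x1 = rng.normal(size=(4, 8))   # data sample
t = rng.uniform(size=(4, 1))   # per-sample timestep in (0, 1)

# Straight-line interpolation between noise and data.
x_t = (1.0 - t) * x0 + t * x1

# Constant velocity along the straight path.
v_target = x1 - x0

def model(x, t):
    # Stand-in for the denoising network, purely for illustration.
    return np.zeros_like(x)

# Flow-matching loss: regress the predicted velocity onto the target.
loss = np.mean((model(x_t, t) - v_target) ** 2)
```

In training, `model` would be the diffusion transformer and the loss would be minimized over minibatches of (noise, data, timestep) triples.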
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
Style: Latinism "via" (medium)
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure through the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
docs/guides/diffusion/finetune.md
Outdated
| `weight_decay` | 0.01 | 0.1 |
| `flow_shift` | 3.0 | 2.5 |
| `logit_std` | 1.0 | 1.5 |
| Dataset size | 100s--1000s of samples | 10K+ samples |
Missing hyperparameter difference (low)
The pretrain config also uses `betas: [0.9, 0.95]` while the finetune config uses `betas: [0.9, 0.999]`. Consider adding a `betas` row to the comparison table for completeness:
| Setting | Fine-Tuning | Pretraining |
|---------|-------------|-------------|
| `betas` | [0.9, 0.999] | [0.9, 0.95] |
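For context on the `flow_shift` and `logit_std` rows in the table above, the common flow-matching formulation samples timesteps from a logit-normal distribution and then applies a shift that pushes mass toward noisier timesteps. The sketch below follows that widely used formulation; it has not been verified against the recipe's actual sampling code, and the function name is made up for illustration.

```python
import numpy as np

def sample_timesteps(n, logit_mean=0.0, logit_std=1.0, flow_shift=3.0, seed=0):
    rng = np.random.default_rng(seed)
    # Logit-normal sampling: squash Gaussian draws through a sigmoid into (0, 1).
    # logit_std controls how much mass lands near the endpoints.
    t = 1.0 / (1.0 + np.exp(-rng.normal(logit_mean, logit_std, size=n)))
    # Shift: for flow_shift > 1, timesteps are pushed toward 1 (noisier end),
    # which is the usual compensation for higher-resolution training.
    return flow_shift * t / (1.0 + (flow_shift - 1.0) * t)

t = sample_timesteps(10_000, logit_std=1.0, flow_shift=3.0)
```

With `flow_shift = 1.0` the shift is the identity and the mean stays near 0.5; with `flow_shift = 3.0` (the fine-tuning default above) the distribution skews toward 1.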
Force-pushed 09b1139 to cd826f7
/ok to test cd826f7
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com> Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
This PR is docs-only; restore test and tool files to main's state. Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Fix Nemotron v3 inputs_embeds generation Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Fix checkpointing for PEFT. Previously, the state_dict in the modelstate class had an if/elif/else statement where PEFT was handled in two cases and non-PEFT in the third. The first PEFT case was handled correctly, while the second was including buffers, causing issues in downstream consumers. This fix simplifies the logic (simple if/else) and bypasses the issues with the buffers.
* Update stateful_wrappers.py
* fix
* update test
* improve error logging in test; pass is_peft to optimizerstate
* fix
* fix & logging
* add filter
* fmt
* u
* convert may return tuple or dict :S

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Move source install fla to dev group
* Update uv lock

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Force-pushed 6432ec0 to 5d19e38
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
/ok to test eaffe7f
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
/ok to test 90e8199
* feat: Integrate Wan with multi-resolution DL
* Required changes for compatibility with AM container
* Fix overrides
* feat: Add docs about diffusion support in AM
* Remove older data processing tools
* Update docs/guides/diffusion/dataset.md
* Apply suggestions from code review
* Revert non-doc code changes to match main (this PR is docs-only; restore test and tool files to main's state)
* Restructure the diffusion finetuning doc
* fix: Nemotron v3 inputs_embeds generation (#1583)
* fix: checkpointing for PEFT. (#1576)
* Update docs/guides/diffusion/finetune.md (applied review suggestions from Alexandros Koumparoulis)
* Update docs/guides/dataset-overview.md
* Update docs/guides/overview.md
* Add cr changes
* ci: Move source install fla to dev group (#1580)
* Update model coverage
* Update overview doc
* Update Hunyuan number of params
* Fix docs build issue

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information