Further Reduce LTX VAE decode peak RAM usage #13052
comfyanonymous merged 1 commit into Comfy-Org:master
Conversation
📝 Walkthrough
The changes introduce a buffer-based decoding optimization to the video VAE pipeline. The Decoder class now supports preallocation of output buffers through a new `output_buffer` argument to `decode()`.
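The buffer-writing idea in the walkthrough can be sketched as follows. This is a minimal illustration, not the PR's actual code: `decode_chunks_into` and its argument names are hypothetical, and it assumes chunks are concatenated along the time dimension (dim 2).

```python
import torch

def decode_chunks_into(chunks, decode_chunk, output_buffer):
    """Write each decoded chunk straight into a preallocated output tensor.

    decode_chunk maps a latent chunk to pixels of shape (B, C, t_i, H, W);
    chunk i fills frames [t, t + t_i) of output_buffer along dim 2, so no
    per-chunk results are retained and no full-output torch.cat is needed.
    """
    t = 0
    for chunk in chunks:
        pixels = decode_chunk(chunk)
        n = pixels.shape[2]
        output_buffer[:, :, t:t + n].copy_(pixels)  # in-place write
        t += n
    return output_buffer
```

Compared with collecting chunks in a list and calling `torch.cat`, peak memory drops from roughly twice the output size to the output size itself.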
Actionable comments posted: 1
Inline comments:
In `comfy/sd.py`:
- Around lines 956-964: the code assumes that if `first_stage_model` has a `decode_output_shape` attribute, then `first_stage_model.decode` also accepts an `output_buffer` kwarg; when it does not, the call raises a `TypeError`. Verify the `decode()` signature (e.g., via `inspect.signature` or a safe trial call) before setting `preallocated` to `True` and passing `output_buffer` to `first_stage_model.decode`. If `decode()` does not accept `output_buffer`, fall back to the safe copy path: call `decode()` without `output_buffer` and copy the result into `pixel_samples`, so that `first_stage_model.decode_output_shape`, `first_stage_model.decode`, `pixel_samples`, `preallocated`, and `vae_options` are handled compatibly.
📒 Files selected for processing (2)
comfy/ldm/lightricks/vae/causal_video_autoencoder.py
comfy/sd.py
Further reduces LTX2 VAE peak RAM to the level of the output tensor.

- The VAE decoder now writes decoded chunks directly into a pre-allocated output buffer, eliminating intermediate allocations and the full-output torch.cat.
- unpatchify runs per-chunk on the GPU instead of on the full output on the CPU.
- When the VAE supports decode_output_shape, the caller passes its output buffer directly to the decoder, eliminating the intermediate bf16 buffer entirely.
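The caller-side flow described above might look roughly like this. `decode_with_prealloc` is a hypothetical wrapper, and the exact signature of `decode_output_shape` (here, latent shape in, output shape out) is an assumption, not the PR's actual API.

```python
import torch

def decode_with_prealloc(vae, latents, vae_options=None):
    """Allocate the final output once and let decode() fill it in place,
    when the VAE advertises its decoded output shape."""
    vae_options = vae_options or {}
    shape_fn = getattr(vae, "decode_output_shape", None)
    if shape_fn is not None:
        out_shape = shape_fn(latents.shape)
        # One allocation at the final dtype: no intermediate bf16 buffer,
        # no full-output torch.cat inside decode().
        pixel_samples = torch.empty(out_shape, dtype=torch.float32)
        vae.decode(latents, output_buffer=pixel_samples, **vae_options)
    else:
        # Older VAEs: plain decode, which returns a freshly allocated tensor.
        pixel_samples = vae.decode(latents, **vae_options)
    return pixel_samples
```

Peak RAM is then bounded by the output buffer itself plus one decoded chunk, rather than by two full copies of the output.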