Fix VRAM leak in tiler fallback in video VAEs#13073
Fix VRAM leak in tiler fallback in video VAEs#13073comfyanonymous merged 3 commits intoComfy-Org:masterfrom
Conversation
This doesnt cost a lot and creates the expected VRAM reduction in resource monitors when you fallback to tiler.
Moved Decoder3d’s recursive run_up out of forward into a class method to avoid nested closure self-reference cycles. This avoids cyclic garbage that delays garbage of tensors which in turn delays VRAM release before tiled fallback.
Mov the recursive run_up out of forward into a class method to avoid nested closure self-reference cycles. This avoids cyclic garbage that delays garbage of tensors which in turn delays VRAM release before tiled fallback.
📝 WalkthroughWalkthroughThe PR refactors two VAE decoder implementations to extract nested 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
comfy/ldm/lightricks/vae/causal_video_autoencoder.py (1)
604-641:⚠️ Potential issue | 🔴 CriticalInitialize
scaled_timestepfor non-conditioned decode paths (runtime blocker).
scaled_timestepis only set insideif self.timestep_conditioning:but is always passed torun_up()at line 641. Whenself.timestep_conditioningisFalse, this raisesUnboundLocalError.Proposed fix
timestep_shift_scale = None + scaled_timestep = None if self.timestep_conditioning: assert ( timestep is not None ), "should pass timestep with timestep_conditioning=True" scaled_timestep = timestep * self.timestep_scale_multiplier.to(dtype=sample.dtype, device=sample.device)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/ldm/lightricks/vae/causal_video_autoencoder.py` around lines 604 - 641, The variable scaled_timestep is only assigned inside the if self.timestep_conditioning branch but is always passed to run_up(...), causing an UnboundLocalError when timestep_conditioning is False; initialize scaled_timestep (e.g., set scaled_timestep = None) before the conditional or ensure you pass a defined fallback to run_up, updating the block around scaled_timestep and the call to self.run_up(...) so run_up receives a valid value regardless of self.timestep_conditioning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@comfy/ldm/lightricks/vae/causal_video_autoencoder.py`:
- Around line 604-641: The variable scaled_timestep is only assigned inside the
if self.timestep_conditioning branch but is always passed to run_up(...),
causing an UnboundLocalError when timestep_conditioning is False; initialize
scaled_timestep (e.g., set scaled_timestep = None) before the conditional or
ensure you pass a defined fallback to run_up, updating the block around
scaled_timestep and the call to self.run_up(...) so run_up receives a valid
value regardless of self.timestep_conditioning.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 12f3042d-2d1b-44ae-9dc6-ec82d0567ac7
📒 Files selected for processing (3)
comfy/ldm/lightricks/vae/causal_video_autoencoder.pycomfy/ldm/wan/vae.pycomfy/sd.py
#13023 (specifically the secondary report)
Primary commit message
wan: vae: Don't recursion in local fns (move run_up)
Moved Decoder3d’s recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage of tensors which in turn delays
VRAM release before tiled fallback.
Also soft empty the cache so nvtop and friends look better.
Example test conditions:
Linux, 5090
WAN VAE encode -> decode 2048x2048x21f
--disable-cuda-malloc
This hack to get single tile memory trace:
Before:
After:
I can't reproduce the issue on LTX, but apply the same change anyway.
Regression Tests:
LTX2.3 ✅
WAN2.2 I2V ✅