Add configurable model cache timeout for automatic memory management#8693
Add configurable model cache timeout for automatic memory management#8693Pfannkuchensack merged 40 commits intomainfrom
Conversation
- Added model_cache_keep_alive config field (minutes, default 0 = infinite) - Implemented timeout tracking in ModelCache class - Added _record_activity() to track model usage - Added _on_timeout() to auto-clear cache when timeout expires - Added shutdown() method to clean up timers - Integrated timeout with get(), lock(), unlock(), and put() operations - Updated ModelManagerService to pass keep_alive parameter - Added cleanup in stop() method Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Created test_model_cache_timeout.py with comprehensive tests - Tests timeout clearing behavior - Tests activity resetting timeout - Tests no-timeout default behavior - Tests shutdown canceling timers Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Added clarifying comment that _record_activity is called with lock held - Enhanced double-check in _on_timeout for thread safety - Added lock protection to shutdown method - Improved handling of edge cases where timer fires during activity Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Remove unused variable in test - Add clarifying comment for daemon thread setting - Add detailed comment explaining cache clearing with 1000 GB value - Improve code documentation Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Add explicit storage_device parameter (cpu) - Add explicit log_memory_usage parameter from config - Improves code clarity and configuration transparency Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
|
@copilot Issues detected:
|
- Remove all trailing whitespace (W293 errors) - Add debug logging when timeout fires but activity detected - Add debug logging when timeout fires but cache is empty - Only log "Clearing model cache" message when actually clearing - Prevents misleading timeout messages during active generation Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
All three issues have been addressed in commit 8d76b4e:
|
|
@copilot I'm still seeing the cache clearing message in the middle of denoising steps: In addition, the There are many similar error messages. |
Only log "Clearing model cache" message when there are actually unlocked models to clear. This prevents the misleading message from appearing during active generation when all models are locked. Changes: - Check for unlocked models before logging clear message - Add count of unlocked models in log message - Add debug log when all models are locked - Improves user experience by avoiding confusing messages Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
What this PR is doing is to move unlocked models from VRAM back into RAM when there has been no generation activity over the past N minutes. It does not remove models from RAM. I basically conceived it as a "hibernation mode" for Invoke. If I'm not actively using it, I want the VRAM to be free for other GPU-consuming processes. The check only runs when the model manager has acquired a thread lock, so I think it is safe from race conditions, but I'm going through all the comments to make double-sure. |
…model-option' into copilot/add-unload-model-option
lstein
left a comment
There was a problem hiding this comment.
Tested under various conditions of load and couldn't break it.
…e previous default behavior
Summary
Adds
model_cache_keep_alive_minconfig option (minutes, default 5) to automatically clear model cache after inactivity. Addresses memory contention when running InvokeAI alongside other GPU applications like Ollama.Implementation:
model_cache_keep_alive_minfield inInvokeAIAppConfigwith 5-minute defaultUsage:
Key Behavior:
Related Issues / Discussions
Addresses enhancement request for automatic model unloading from memory after inactivity period.
QA Instructions
Test default behavior (5-minute timeout):
Test custom timeout:
model_cache_keep_alive_min: 0.1(6 seconds) in configTest no timeout (old behavior):
model_cache_keep_alive_min: 0in configTest during active use:
Merge Plan
N/A - Additive change with sensible defaults. The 5-minute default enables automatic memory management while remaining practical for typical workflows.
Checklist
What's Newcopy (if doing a release after this PR)Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.