feat: Add Free RPC to backend.proto for VRAM cleanup #8751

Merged

mudler merged 3 commits into mudler:master from localai-bot:fix-vram-cleanup on Mar 3, 2026

Conversation

@localai-bot
Contributor

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.
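
To make the shape of the change concrete, here is a minimal Go sketch of the cleanup path described above. Only AIModel, Free(), the base-backend stub, and deleteProcess() come from the PR description; ModelLoader, its models map, and stopProcess are hypothetical stand-ins invented for illustration.

```go
package model

import "log"

// AIModel is the backend model interface; Free() is the method this PR
// adds so a backend can release GPU resources (e.g. llama.cpp model
// state) before its process is stopped.
type AIModel interface {
	Free() error
}

// Base is the no-op stub for backends that hold no GPU state.
type Base struct{}

func (b *Base) Free() error { return nil }

// ModelLoader and its fields are hypothetical stand-ins for the real
// loader; only the Free-before-stop ordering is taken from the PR.
type ModelLoader struct {
	models map[string]AIModel
}

func (ml *ModelLoader) stopProcess(name string) error {
	return nil // placeholder for the existing process-teardown logic
}

// deleteProcess calls Free() before stopping the process, so VRAM is
// returned to the system when a model is unloaded.
func (ml *ModelLoader) deleteProcess(name string) error {
	if m, ok := ml.models[name]; ok {
		if err := m.Free(); err != nil {
			// Best-effort cleanup: log and continue tearing down.
			log.Printf("Free() failed for %s: %v", name, err)
		}
		delete(ml.models, name)
	}
	return ml.stopProcess(name)
}
```

The no-op stub in Base mirrors what the description says about the base and SingleThread backends: backends that hold no native resources can safely inherit a Free() that does nothing.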
feat: Add Free RPC to backend.proto for VRAM cleanup

- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
- This RPC is required to properly expose the Free() method
  through the gRPC interface for VRAM resource cleanup

Refs: PR mudler#8739
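
The gRPC side of the change is the single rpc line quoted above; placed in context it looks roughly like the excerpt below. The service name and the neighboring Health RPC are assumptions sketched for illustration, not taken from the repository.

```proto
// backend.proto (sketch; only the Free RPC is quoted from this PR)
service Backend {
  // Assumed existing health-check RPC, shown for context only.
  rpc Health(HealthMessage) returns (Result) {}

  // Added here: lets the model loader tell a backend to release VRAM
  // before its process is stopped.
  rpc Free(HealthMessage) returns (Result) {}
}
```

Per the quoted signature, the RPC reuses the existing HealthMessage and Result types rather than introducing new message types for the cleanup call.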
@netlify

netlify bot commented Mar 3, 2026

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 797f4e0
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/69a69c6797c9e70008a7ac7d
😎 Deploy Preview https://deploy-preview-8751--localai.netlify.app

@mudler mudler merged commit 6e5a58c into mudler:master Mar 3, 2026
31 of 33 checks passed
localai-bot added a commit to localai-bot/LocalAI that referenced this pull request Mar 6, 2026
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.

* feat: Add Free RPC to backend.proto for VRAM cleanup

  - Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
  - This RPC is required to properly expose the Free() method
    through the gRPC interface for VRAM resource cleanup

  Refs: PR mudler#8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
@mudler mudler added the enhancement New feature or request label Mar 14, 2026
localai-bot added a commit to localai-bot/LocalAI that referenced this pull request Mar 25, 2026
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.

* feat: Add Free RPC to backend.proto for VRAM cleanup

  - Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
  - This RPC is required to properly expose the Free() method
    through the gRPC interface for VRAM resource cleanup

  Refs: PR mudler#8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>