feat: Add Free RPC to backend.proto for VRAM cleanup #8751

Merged

mudler merged 3 commits into mudler:master from localai-bot:fix-vram-cleanup on Mar 3, 2026

Conversation

@localai-bot
Contributor

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.
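
To make the shape of the change concrete, here is a minimal Go sketch of the cleanup path described above. Only AIModel, Free(), the base-backend stub, and deleteProcess() come from the PR description; ModelLoader, its models map, and stopProcess are hypothetical stand-ins invented for illustration.

```go
package model

import "log"

// AIModel is the backend model interface; Free() is the method this PR
// adds so a backend can release GPU resources (e.g. llama.cpp model
// state) before its process is stopped.
type AIModel interface {
	Free() error
}

// Base is the no-op stub for backends that hold no GPU state.
type Base struct{}

func (b *Base) Free() error { return nil }

// ModelLoader and its fields are hypothetical stand-ins for the real
// loader; only the Free-before-stop ordering is taken from the PR.
type ModelLoader struct {
	models map[string]AIModel
}

func (ml *ModelLoader) stopProcess(name string) error {
	return nil // placeholder for the existing process-teardown logic
}

// deleteProcess calls Free() before stopping the process, so VRAM is
// returned to the system when a model is unloaded.
func (ml *ModelLoader) deleteProcess(name string) error {
	if m, ok := ml.models[name]; ok {
		if err := m.Free(); err != nil {
			// Best-effort cleanup: log and continue tearing down.
			log.Printf("Free() failed for %s: %v", name, err)
		}
		delete(ml.models, name)
	}
	return ml.stopProcess(name)
}
```

The no-op stub in Base mirrors what the description says about the base and SingleThread backends: backends that hold no native resources can safely inherit a Free() that does nothing.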
feat: Add Free RPC to backend.proto for VRAM cleanup

- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
- This RPC is required to properly expose the Free() method
  through the gRPC interface for VRAM resource cleanup

Refs: PR mudler#8739
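
The gRPC side of the change is the single rpc line quoted above; placed in context it looks roughly like the excerpt below. The service name and the neighboring Health RPC are assumptions sketched for illustration, not taken from the repository.

```proto
// backend.proto (sketch; only the Free RPC is quoted from this PR)
service Backend {
  // Assumed existing health-check RPC, shown for context only.
  rpc Health(HealthMessage) returns (Result) {}

  // Added here: lets the model loader tell a backend to release VRAM
  // before its process is stopped.
  rpc Free(HealthMessage) returns (Result) {}
}
```

Per the quoted signature, the RPC reuses the existing HealthMessage and Result types rather than introducing new message types for the cleanup call.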
@netlify

netlify bot commented Mar 3, 2026

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 797f4e0
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/69a69c6797c9e70008a7ac7d
😎 Deploy Preview https://deploy-preview-8751--localai.netlify.app

@mudler mudler merged commit 6e5a58c into mudler:master Mar 3, 2026
31 of 33 checks passed
localai-bot added a commit to localai-bot/LocalAI that referenced this pull request Mar 6, 2026
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.

* feat: Add Free RPC to backend.proto for VRAM cleanup

  - Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
  - This RPC is required to properly expose the Free() method
    through the gRPC interface for VRAM resource cleanup

  Refs: PR mudler#8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
@mudler mudler added the enhancement New feature or request label Mar 14, 2026
localai-bot added a commit to localai-bot/LocalAI that referenced this pull request Mar 25, 2026
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.

* feat: Add Free RPC to backend.proto for VRAM cleanup

  - Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
  - This RPC is required to properly expose the Free() method
    through the gRPC interface for VRAM resource cleanup

  Refs: PR mudler#8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>