
Address possible memory leak during model sleep/unload#2030

Open
glaziermag wants to merge 1 commit into EricLBuehler:master from glaziermag:fix-memory-leak-unload-v4

Conversation

Contributor

@glaziermag glaziermag commented Mar 25, 2026

Tentatively addresses #545.

Problem

After calling /v1/sleep (model unload), VRAM was not fully returned to the OS. The likely cause: tensors held in RebootState were dropped on the HTTP handler thread, which did not have the CUDA context bound to it. cuMemFreeAsync requires a current context on the calling thread; without one, deallocations fail silently at the driver level and the memory pool retains the allocations.

Fix

Three steps, in order:

1. Join the engine worker thread before dropping RebootState, ensuring the async engine has fully exited and released its own tensor references before the HTTP thread attempts the drop.

2. Bind the CUDA context to the HTTP thread (dev.cuda_stream().context().bind_to_thread()) before drop(reboot_state). This ensures cuMemFreeAsync has a valid context to execute against.

3. After the drop, call device.synchronize() to flush any in-flight async frees, then trim the CUDA default memory pool to its currently-used watermark to release the idle reserve back to the OS.

Pool trim approach

cuMemPoolTrimTo(pool, 0) (trim everything) is avoided because, in a multi-model server, the default memory pool is shared: trimming to zero would evict blocks held idle by other still-active models, forcing expensive OS reallocations. Instead, the trim queries CU_MEMPOOL_ATTR_USED_MEM_CURRENT to get the active-use watermark and trims only to that value. If the query fails (e.g. on an old driver), the trim is skipped entirely.

Clean branch

The original branch (fix-memory-leak-unload-v4) was based on the fork's master rather than origin/master, so its diff includes ~200 unrelated upstream files. A clean branch containing only this commit has been pushed as fix-memory-leak-unload-clean on the same fork.

Files changed

  • mistralrs-core/src/lib.rs
  • mistralrs-server-core/src/handlers.rs
  • mistralrs-server-core/src/mistralrs_server_router_builder.rs

@glaziermag glaziermag marked this pull request as ready for review March 25, 2026 23:00
@glaziermag glaziermag changed the title Draft: Address possible memory leak during model sleep/unload Address possible memory leak during model sleep/unload Mar 26, 2026
@glaziermag
Contributor Author

Update (2026-04-15): The original branch (fix-memory-leak-unload-v4) is rebased on the fork's master, not upstream, so the diff shows ~200 unrelated files. A clean isolated branch (fix-memory-leak-unload-clean) has been pushed that cherry-picks only the single commit onto current origin/master.

Additionally, cuMemPoolTrimTo(pool, 0) has been replaced with a safer alternative:

Old (aggressive):

sys::cuMemPoolTrimTo(pool, 0);  // releases ALL cached capacity — also evicts other models' idle blocks

New (targeted):

// Query the currently-used bytes first (exact sys-binding paths may vary)
let mut used_bytes: u64 = 0;
let attr_ok = sys::cuMemPoolGetAttribute(
    pool,
    sys::CUmemPool_attribute::CU_MEMPOOL_ATTR_USED_MEM_CURRENT,
    &mut used_bytes as *mut u64 as *mut std::ffi::c_void,
);
if attr_ok == sys::CUresult::CUDA_SUCCESS {
    sys::cuMemPoolTrimTo(pool, used_bytes as usize);  // releases idle reserve only
}
// If the attribute query fails, skip the trim — the OS reclaims via pool refcount eviction

Trimming to 0 in a multi-model server evicts blocks that other still-active models may be about to reuse, forcing expensive re-allocations from the OS and potential OOM. Trimming to USED_MEM_CURRENT releases only the idle over-reserve from the unloaded model without touching live allocations.

