Feature Description
In router mode, when `--sleep-idle-seconds` triggers, the child subprocess unloads the model from VRAM, but the process remains alive and attached to the GPU, consuming ~600MiB per idle subprocess:

```
# Active
467282 dev 0 Compute 0% 10386MiB 11% 824MiB llama-server ...
# After sleep-idle triggers — process still on GPU
467282 dev 0 Compute N/A 614MiB 1% 369MiB llama-server ...
```
Motivation
Idle subprocesses should not remain attached to the GPU when they are not needed. With multiple models in router mode, the residual ~600MiB per dormant process adds up to a significant amount of wasted VRAM.
Relation to #18189
Follow-up to #18189. PR #18228 implemented `--sleep-idle-seconds`, but it only unloads the model within the still-living process — it does not terminate the subprocess. The original request was closed as stale without this being addressed.
Possible Implementation
A new option (e.g. `--stop-idle-seconds`) that triggers full subprocess termination in router mode via the existing `unload()` path. The building blocks are already there:

- `server_queue` already tracks idle time
- `server_models::unload()` already handles graceful shutdown → force-kill
- These two just need to be wired together, with the router re-spawning the process on the next request (same as `--models-max` LRU eviction already does)
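For illustration, here is a minimal sketch of the proposed router-side behavior in Python. The class name, method names, and the use of a watchdog thread are all hypothetical; the real implementation would live in llama.cpp's C++ router and reuse `server_queue` / `server_models::unload()` rather than anything shown here.

```python
import subprocess
import threading
import time

class ModelProcess:
    """Hypothetical wrapper: spawns a worker process lazily, terminates it
    after stop_idle_seconds of inactivity, and re-spawns it on the next
    request (same pattern as --models-max LRU eviction)."""

    def __init__(self, cmd, stop_idle_seconds):
        self.cmd = cmd
        self.stop_idle_seconds = stop_idle_seconds
        self.proc = None
        self.last_used = time.monotonic()
        self.lock = threading.Lock()
        # Watchdog periodically checks idle time, mirroring the idle
        # tracking that server_queue already does in-process.
        threading.Thread(target=self._watchdog, daemon=True).start()

    def _watchdog(self):
        while True:
            time.sleep(0.1)
            with self.lock:
                idle = time.monotonic() - self.last_used
                if self.proc is not None and idle >= self.stop_idle_seconds:
                    # Graceful shutdown first, then force-kill, analogous
                    # to the server_models::unload() path.
                    self.proc.terminate()
                    try:
                        self.proc.wait(timeout=5)
                    except subprocess.TimeoutExpired:
                        self.proc.kill()
                    # VRAM is fully released once the process is gone.
                    self.proc = None

    def handle_request(self):
        with self.lock:
            self.last_used = time.monotonic()
            if self.proc is None or self.proc.poll() is not None:
                # Re-spawn on demand, like the router does after eviction.
                self.proc = subprocess.Popen(self.cmd)
            return self.proc.pid
```

Usage: the router would call `handle_request()` on every incoming request for that model; the first request after an idle shutdown transparently pays the spawn (and model load) cost again, which is exactly the trade-off `--models-max` eviction already makes.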