Feature Description
Would it be possible to add an --unload-timeout flag for "server" mode? After the timeout elapses with no API calls, llama.cpp would unload the model and free the GPU VRAM to save power. On the next request it would automatically load the model again, wait out the timeout period once more, and unload again if no new API call arrives.
Motivation
My GPU draws a lot of power whenever a model is loaded in VRAM, even while it is just waiting for a new API task.
For example, an NVIDIA P40 24GB draws 9 W with nothing loaded in VRAM.
As soon as even a few bytes of VRAM are in use, power consumption rises to 50 W, although the GPU is not computing anything.
An RTX 3060 12GB idles at 6 W, but draws 14 W with an unused model loaded while the server waits for API access.
Possible Implementation
Example:
server --unload-timeout 120
After 120 s with no API tasks, the model is unloaded to save energy.