feat: add a local endpoint type for inference directly from chat-ui #1778

Merged: nsarrazin merged 12 commits into main from feat/local_endpoint_type on Apr 4, 2025

Conversation

nsarrazin commented Mar 31, 2025

Contributor

Part of #1774

  • Run models locally from .gguf file
  • Auto-download model if not stored locally
  • Use GPU if available
  • Get chat template from .gguf file
  • Show every .gguf in models/ as a model if MODELS is undefined
  • Handle batching & multiple model inference at once more gracefully

nsarrazin added the enhancement, back, and models labels Mar 31, 2025
nsarrazin commented Apr 1, 2025

Contributor Author

Something is going wrong in the build step. I found a relevant issue and am trying to fix it.

nsarrazin commented

Contributor Author

It works well. You can do something like:

MODELS=`[{
  "name": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
  "parameters": {
    "stop_sequences": ["<|im_end|>", "<|endoftext|>"]
  },
  "endpoints": [{"type": "local", "modelPath": "hf:HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF:Q4_K_M"}]
}]`

It will automatically use your GPU if one is available, and it will download a model to the models folder if the model is not available locally.
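
To make the shape of the `modelPath` value concrete, here is a minimal sketch of how a string such as `hf:HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF:Q4_K_M` could be split into its parts. `parseModelPath`, `HfModelRef`, and `ModelLocation` are hypothetical names used only for illustration, and treating any value without the `hf:` prefix as a local file path is an assumption; the PR may resolve paths differently.

```typescript
// Hypothetical types for illustration; these are not chat-ui's actual types.
interface HfModelRef {
  owner: string;
  repo: string;
  quant: string | undefined;
}

type ModelLocation =
  | { kind: "hf"; ref: HfModelRef }
  | { kind: "file"; path: string };

// Splits "hf:<owner>/<repo>[:<quant>]" into its parts.
// Assumption: anything without the "hf:" prefix is a local file path.
function parseModelPath(modelPath: string): ModelLocation {
  const prefix = "hf:";
  if (!modelPath.startsWith(prefix)) {
    return { kind: "file", path: modelPath };
  }
  const rest = modelPath.slice(prefix.length);

  // The optional quantization tag follows the first ":" after the prefix.
  const colon = rest.indexOf(":");
  const repoId = colon === -1 ? rest : rest.slice(0, colon);
  const quant = colon === -1 ? undefined : rest.slice(colon + 1);

  // The repository id must look like "<owner>/<repo>".
  const slash = repoId.indexOf("/");
  if (slash <= 0 || slash === repoId.length - 1) {
    throw new Error(`Expected "hf:<owner>/<repo>[:<quant>]", got "${modelPath}"`);
  }
  return {
    kind: "hf",
    ref: { owner: repoId.slice(0, slash), repo: repoId.slice(slash + 1), quant },
  };
}
```

Calling `parseModelPath` with the value from the config above yields the owner `HuggingFaceTB`, the repo `SmolLM2-1.7B-Instruct-GGUF`, and the quantization tag `Q4_K_M`.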

It's still quite rough: it doesn't handle running out of memory gracefully, so I'm still working on handling that better.

I also want to automatically expose any .gguf files in the models/ folder as models in chat-ui, without having to set the MODELS env var.
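
As a rough illustration of that idea, the sketch below turns a directory listing into one model entry per `.gguf` file, but only when `MODELS` is unset. `modelsFromGgufFiles` and `LocalModelConfig` are hypothetical names, the entry shape simply mirrors the `MODELS` example above, and deriving the model name from the file name is an assumption rather than the PR's confirmed behavior.

```typescript
// Hypothetical shape mirroring the MODELS entry shown earlier; not chat-ui's actual types.
interface LocalModelConfig {
  name: string;
  endpoints: { type: "local"; modelPath: string }[];
}

// Builds one model entry per .gguf file name.
// Assumption: the caller supplies the directory listing, for example the
// result of fs.readdirSync("models") in a Node.js runtime.
function modelsFromGgufFiles(
  fileNames: string[],
  modelsEnv: string | undefined,
  modelsDir: string = "models"
): LocalModelConfig[] {
  // Only fall back to scanning the folder when MODELS is not defined.
  if (modelsEnv !== undefined) {
    return [];
  }
  return fileNames
    .filter((file) => file.toLowerCase().endsWith(".gguf"))
    .sort()
    .map((file) => ({
      // Assumption: the model name is the file name without its extension.
      name: file.slice(0, -".gguf".length),
      endpoints: [{ type: "local" as const, modelPath: `${modelsDir}/${file}` }],
    }));
}
```

Keeping the function free of file-system calls makes the fallback rule easy to test in isolation; the directory read stays at the call site.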

nsarrazin commented

Contributor Author

Merging for now; it works well in local testing! I will update the docs to explain this when I'm done with the quick setup.

nsarrazin merged commit 4906793 into main Apr 4, 2025
nsarrazin deleted the feat/local_endpoint_type branch April 4, 2025 13:11
csanz91 pushed a commit to csanz91/chat-ui that referenced this pull request Apr 7, 2025
…uggingface#1778)

* feat: add a local endpoint type running llama.cpp from chat-ui

* fix: build image

* fix: lock file

* wip: try to make it more reliable

* feat: load chat template from .gguf file

* feat: load gguf models from `models/` folder

* fix: default config

* feat: make endpoint use chatSession instead of completion

* refactor: improve exit handling, exit immediately on second signal

* fix: various fixes to improve reliability when calling multiple models at once

* docs: add instructions for adding .gguf files to the models directory
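
The exit-handling commit in the list above mentions exiting immediately on a second signal. A minimal sketch of that pattern follows; `createExitHandler` and its callbacks are hypothetical names, and this is an illustration of the general technique rather than the code merged in this PR.

```typescript
// Hypothetical helper: the first signal starts a graceful shutdown,
// any later signal forces an immediate exit.
function createExitHandler(
  shutdown: () => void,
  forceExit: (code: number) => void
): (signal: string) => void {
  let signalCount = 0;
  return (_signal: string) => {
    signalCount += 1;
    if (signalCount === 1) {
      // First signal: give the app a chance to release resources (e.g. loaded models).
      shutdown();
    } else {
      // Second or later signal: stop waiting and exit right away.
      forceExit(1);
    }
  };
}

// Possible wiring, assuming a Node.js runtime:
// const onSignal = createExitHandler(
//   () => { /* dispose models, close connections */ },
//   (code) => process.exit(code)
// );
// process.on("SIGINT", onSignal);
// process.on("SIGTERM", onSignal);
```

Passing the exit function in as a callback keeps the handler testable without terminating the process.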
csanz91 pushed a commit to csanz91/chat-ui that referenced this pull request Apr 24, 2025
maksym-work pushed a commit to siilats/chat-ui that referenced this pull request Jul 2, 2025
Matsenas pushed a commit to Matsenas/chat-ui that referenced this pull request Jul 4, 2025
Matsenas pushed a commit to Matsenas/chat-ui that referenced this pull request Jul 4, 2025