
Commit afc60f0

chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
1 parent: ee7e393

4 files changed

Lines changed: 52 additions & 1 deletion


AGENTS.md

Lines changed: 8 additions & 0 deletions
@@ -280,3 +280,11 @@ Always check `llama.cpp` for new model configuration options that should be supp
 - `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
 - `llama.cpp/common/chat.h` - Format enums and parameter structures
 - `llama.cpp/tools/server/server-context.cpp` - Server configuration options
+
+# Documentation
+
+The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
+
+- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
+- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
+- **Examples**: Providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.

docs/content/advanced/model-configuration.md

Lines changed: 1 addition & 1 deletion
@@ -476,7 +476,7 @@ reasoning:
 
 ## Pipeline Configuration
 
-Define pipelines for audio-to-audio processing:
+Define pipelines for audio-to-audio processing and the [Realtime API]({{%relref "features/openai-realtime" %}}):
 
 | Field | Type | Description |
 |-------|------|-------------|

docs/content/features/_index.en.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ LocalAI provides a comprehensive set of features for running AI models locally.
 ## Advanced Features
 
 - **[OpenAI Functions](openai-functions/)** - Use function calling and tools API with local models
+- **[Realtime API](openai-realtime/)** - Low-latency multi-modal conversations (voice+text) over WebSocket
 - **[Constrained Grammars](constrained_grammars/)** - Control model output format with BNF grammars
 - **[GPU Acceleration](GPU-acceleration/)** - Optimize performance with GPU support
 - **[Distributed Inference](distributed_inferencing/)** - Scale inference across multiple nodes
docs/content/features/openai-realtime.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+
+---
+title: "Realtime API"
+weight: 60
+---
+
+# Realtime API
+
+LocalAI supports the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) which enables low-latency, multi-modal conversations (voice and text) over WebSocket.
+
+To use the Realtime API, you need to configure a pipeline model that defines the components for Voice Activity Detection (VAD), Transcription (STT), Language Model (LLM), and Text-to-Speech (TTS).
+
+## Configuration
+
+Create a model configuration file (e.g., `gpt-realtime.yaml`) in your models directory. For a complete reference of configuration options, see [Model Configuration]({{%relref "advanced/model-configuration" %}}).
+
+```yaml
+name: gpt-realtime
+pipeline:
+  vad: silero-vad-ggml
+  transcription: whisper-large-turbo
+  llm: qwen3-4b
+  tts: tts-1
+```
+
+This configuration links the following components:
+- **vad**: The Voice Activity Detection model (e.g., `silero-vad-ggml`) to detect when the user is speaking.
+- **transcription**: The Speech-to-Text model (e.g., `whisper-large-turbo`) to transcribe user audio.
+- **llm**: The Large Language Model (e.g., `qwen3-4b`) to generate responses.
+- **tts**: The Text-to-Speech model (e.g., `tts-1`) to synthesize the audio response.
+
+Make sure all referenced models (`silero-vad-ggml`, `whisper-large-turbo`, `qwen3-4b`, `tts-1`) are also installed or defined in your LocalAI instance.
+
+## Usage
+
+Once configured, you can connect to the Realtime API endpoint via WebSocket:
+
+```
+ws://localhost:8080/v1/realtime?model=gpt-realtime
+```
+
+The API follows the OpenAI Realtime API protocol for handling sessions, audio buffers, and conversation items.
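
The commit itself ships no client example, so here is a minimal session sketch. It assumes a LocalAI instance on `localhost:8080` with the `gpt-realtime` pipeline above and the Python `websockets` package; the event names (`session.update`, `input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`, `response.done`) come from the OpenAI Realtime protocol that the new docs say LocalAI follows, and this commit does not spell out which event types are covered.

```python
# Minimal Realtime API client sketch. Assumptions: LocalAI at localhost:8080,
# the `gpt-realtime` pipeline from the docs above, `pip install websockets`,
# and OpenAI Realtime protocol event names (coverage may vary in LocalAI).
import asyncio
import base64
import json

import websockets

URI = "ws://localhost:8080/v1/realtime?model=gpt-realtime"

async def main() -> None:
    async with websockets.connect(URI) as ws:
        # Ask for both audio and text output on this session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))

        # Append a chunk of audio to the input buffer. A real client would
        # stream captured PCM here; this placeholder is 16-bit silence.
        silence = b"\x00\x00" * 2400
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(silence).decode("ascii"),
        }))

        # Commit the buffer and request a response explicitly, which keeps
        # the sketch deterministic even with silent input.
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
        await ws.send(json.dumps({"type": "response.create"}))

        # Print server events until the response completes.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break

asyncio.run(main())
```

In a live voice session the configured `vad` model would normally detect the end of speech and trigger the response server-side, making the explicit commit/create pair unnecessary.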
