
Commit afc60f0

chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
1 parent: ee7e393

4 files changed

Lines changed: 52 additions & 1 deletion


AGENTS.md

Lines changed: 8 additions & 0 deletions
@@ -280,3 +280,11 @@ Always check `llama.cpp` for new model configuration options that should be supp
 - `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
 - `llama.cpp/common/chat.h` - Format enums and parameter structures
 - `llama.cpp/tools/server/server-context.cpp` - Server configuration options
+
+# Documentation
+
+The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
+
+- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
+- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
+- **Examples**: Providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.

docs/content/advanced/model-configuration.md

Lines changed: 1 addition & 1 deletion
@@ -476,7 +476,7 @@ reasoning:
 
 ## Pipeline Configuration
 
-Define pipelines for audio-to-audio processing:
+Define pipelines for audio-to-audio processing and the [Realtime API]({{%relref "features/openai-realtime" %}}):
 
 | Field | Type | Description |
 |-------|------|-------------|

docs/content/features/_index.en.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ LocalAI provides a comprehensive set of features for running AI models locally.
 ## Advanced Features
 
 - **[OpenAI Functions](openai-functions/)** - Use function calling and tools API with local models
+- **[Realtime API](openai-realtime/)** - Low-latency multi-modal conversations (voice+text) over WebSocket
 - **[Constrained Grammars](constrained_grammars/)** - Control model output format with BNF grammars
 - **[GPU Acceleration](GPU-acceleration/)** - Optimize performance with GPU support
 - **[Distributed Inference](distributed_inferencing/)** - Scale inference across multiple nodes
docs/content/features/openai-realtime.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+
+---
+title: "Realtime API"
+weight: 60
+---
+
+# Realtime API
+
+LocalAI supports the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) which enables low-latency, multi-modal conversations (voice and text) over WebSocket.
+
+To use the Realtime API, you need to configure a pipeline model that defines the components for Voice Activity Detection (VAD), Transcription (STT), Language Model (LLM), and Text-to-Speech (TTS).
+
+## Configuration
+
+Create a model configuration file (e.g., `gpt-realtime.yaml`) in your models directory. For a complete reference of configuration options, see [Model Configuration]({{%relref "advanced/model-configuration" %}}).
+
+```yaml
+name: gpt-realtime
+pipeline:
+  vad: silero-vad-ggml
+  transcription: whisper-large-turbo
+  llm: qwen3-4b
+  tts: tts-1
+```
+
+This configuration links the following components:
+- **vad**: The Voice Activity Detection model (e.g., `silero-vad-ggml`) to detect when the user is speaking.
+- **transcription**: The Speech-to-Text model (e.g., `whisper-large-turbo`) to transcribe user audio.
+- **llm**: The Large Language Model (e.g., `qwen3-4b`) to generate responses.
+- **tts**: The Text-to-Speech model (e.g., `tts-1`) to synthesize the audio response.
+
+Make sure all referenced models (`silero-vad-ggml`, `whisper-large-turbo`, `qwen3-4b`, `tts-1`) are also installed or defined in your LocalAI instance.
+
+## Usage
+
+Once configured, you can connect to the Realtime API endpoint via WebSocket:
+
+```
+ws://localhost:8080/v1/realtime?model=gpt-realtime
+```
+
+The API follows the OpenAI Realtime API protocol for handling sessions, audio buffers, and conversation items.
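
The commit itself ships no client example, so here is a minimal session sketch. It assumes a LocalAI instance on `localhost:8080` with the `gpt-realtime` pipeline above and the Python `websockets` package; the event names (`session.update`, `input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`, `response.done`) come from the OpenAI Realtime protocol that the new docs say LocalAI follows, and this commit does not spell out which event types are covered.

```python
# Minimal Realtime API client sketch. Assumptions: LocalAI at localhost:8080,
# the `gpt-realtime` pipeline from the docs above, `pip install websockets`,
# and OpenAI Realtime protocol event names (coverage may vary in LocalAI).
import asyncio
import base64
import json

import websockets

URI = "ws://localhost:8080/v1/realtime?model=gpt-realtime"

async def main() -> None:
    async with websockets.connect(URI) as ws:
        # Ask for both audio and text output on this session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))

        # Append a chunk of audio to the input buffer. A real client would
        # stream captured PCM here; this placeholder is 16-bit silence.
        silence = b"\x00\x00" * 2400
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(silence).decode("ascii"),
        }))

        # Commit the buffer and request a response explicitly, which keeps
        # the sketch deterministic even with silent input.
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
        await ws.send(json.dumps({"type": "response.create"}))

        # Print server events until the response completes.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break

asyncio.run(main())
```

In a live voice session the configured `vad` model would normally detect the end of speech and trigger the response server-side, making the explicit commit/create pair unnecessary.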
