A real-time voice assistant built with LiveKit Agents that can offload complex tasks to a batch processing API. The agent uses Claude for real-time conversation and delegates time-intensive work (like data processing or long-form writing) to Doubleword's batch API, which completes it thoroughly in the background.
```
User Voice Input
        |
        v
+----------------+
| ElevenLabs STT |  Speech-to-Text
+-------+--------+
        |
        v
+----------------+
|   Claude LLM   |  Real-time conversation
+-------+--------+---------------------+
        |                              |
        v                              v  Complex tasks
+----------------+             +----------------+
| ElevenLabs TTS |             |   Doubleword   |
+-------+--------+             |   Batch API    |
        |                      +-------+--------+
        v                              |
  Voice Output  <---- TTS notify ------+
  (interrupts current speech
   when task completes)
```
- Real-time voice conversation using LiveKit Agents framework
- Claude-powered responses for natural, intelligent conversation
- Batch task offloading - complex tasks (business plans, detailed analysis, long documents) are automatically delegated to Doubleword's batch API
- Mid-conversation notifications - the agent interrupts itself to notify you when a batch task completes, then offers to read the results
- Deduplication - prevents accidental duplicate task submissions within a 60-second window
- Task persistence - results are saved to `tasks_results.json` and retrievable by job ID
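The README does not show the implementation of the last two features; a minimal sketch of what a 60-second deduplication window plus JSON persistence could look like follows (the `TaskStore` class and its method names are hypothetical, not the project's actual API):

```python
import json
import time
from pathlib import Path

# Hypothetical helper illustrating the two features above:
# a 60-second dedup window and persistence to tasks_results.json.
class TaskStore:
    DEDUP_WINDOW = 60  # seconds

    def __init__(self, path="tasks_results.json"):
        self.path = Path(path)
        self._recent = {}  # prompt -> timestamp of last submission

    def is_duplicate(self, prompt: str) -> bool:
        """True if the same prompt was submitted within the window."""
        last = self._recent.get(prompt)
        now = time.monotonic()
        if last is not None and now - last < self.DEDUP_WINDOW:
            return True
        self._recent[prompt] = now
        return False

    def save_result(self, job_id: str, result: str) -> None:
        """Persist a completed result, keyed by job ID."""
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        data[job_id] = result
        self.path.write_text(json.dumps(data, indent=2))

    def get_result(self, job_id: str):
        """Look up a persisted result by job ID (None if missing)."""
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text()).get(job_id)
```

Keying results by job ID is what lets the agent answer "what's the result of my last task?" after the batch job has finished.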
The agent handles complex tasks asynchronously so the user can keep chatting while work happens in the background:
1. Submission - When the user requests a complex task (e.g., "Create a business plan for a coffee shop"), the LLM recognizes it requires detailed output and calls the `submit_batch_task` tool. This uploads a JSONL request to the Doubleword batch API and returns immediately.
2. Background polling - An `asyncio` task starts polling the batch API every 10 seconds using an `AsyncOpenAI` client (non-blocking). The user can continue a normal conversation during this time.
3. Mid-conversation notification - When the batch job completes:
   - The poller calls `session.interrupt(force=True)` to stop whatever the agent is currently saying
   - Then calls `session.say(...)` to directly speak a notification via TTS (bypasses the LLM for reliability)
   - The user hears something like: "Your task 'business plan for coffee shop' is ready. Would you like me to read the results?"
4. Result retrieval - When the user asks for results, the LLM calls `get_task_result`, which reads from the persisted `tasks_results.json` file and speaks the content.
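The steps above can be sketched as a generic polling loop. This is a simplified illustration, not the project's code: in the real agent, `fetch_status` would query the Doubleword batch API via an `AsyncOpenAI` client, and `notify` would call `session.interrupt(force=True)` followed by `session.say(...)`; here both are injected callables so the pattern stands alone:

```python
import asyncio

async def poll_batch_job(job_id, fetch_status, notify, interval=10.0):
    """Poll until the job leaves its pending states, then notify once.

    fetch_status(job_id) -> awaitable returning a status string
    notify(message)      -> awaitable that speaks/pushes the message
    """
    while True:
        status = await fetch_status(job_id)
        if status == "completed":
            await notify(f"Your task {job_id} is ready. "
                         "Would you like me to read the results?")
            return status
        if status in ("failed", "cancelled", "expired"):
            await notify(f"Your task {job_id} ended with status: {status}.")
            return status
        await asyncio.sleep(interval)

# The agent would launch this without blocking the conversation, e.g.:
#   asyncio.create_task(poll_batch_job(job_id, fetch_status, notify))
```

Running the poller as a fire-and-forget `asyncio` task is what keeps the voice pipeline responsive while the batch job runs.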
- Python 3.10+ (required by `livekit-agents` for `TypeAlias` support)
- LiveKit Cloud account (or self-hosted LiveKit server)
- Anthropic API key
- ElevenLabs API key
- Doubleword API key
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/voice-agent.git
  cd voice-agent
  ```
- Create a virtual environment:
  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Copy the example environment file and fill in your credentials:
  ```bash
  cp .env.example .env
  ```
- Edit `.env` with your API keys (see Environment Variables below).
Start the agent in development mode (auto-reloads on file changes):

```bash
python agent.py dev
```

For production:

```bash
python agent.py start
```

Once the agent is running, you need a LiveKit room to connect to it. The easiest way:
- Go to LiveKit Agents Playground
- Enter your LiveKit project URL (the `LIVEKIT_URL` from your `.env`)
- Click Connect - the playground creates a room and the agent automatically joins
- Use your microphone to talk to the agent
The agent logs will show the connection in your terminal:
```
INFO livekit.agents received job request {"room": "playground-xxxx", ...}
```
Alternatively, you can connect using any LiveKit client SDK or build your own frontend.
Dev mode tips:
- The agent runs each job in a separate subprocess and watches for file changes
- Edit any `.py` file and the agent restarts automatically
- Logs include `[POLL]`-prefixed lines showing batch job polling activity
| Variable | Description |
|---|---|
| `LIVEKIT_URL` | Your LiveKit server URL (e.g., `wss://your-project.livekit.cloud`) |
| `LIVEKIT_API_KEY` | LiveKit API key |
| `LIVEKIT_API_SECRET` | LiveKit API secret |
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude |
| `ELEVEN_API_KEY` | ElevenLabs API key for STT/TTS |
| `DOUBLEWORD_API_KEY` | Doubleword API key for batch processing |
| `BATCH_MODEL` | Model to use for batch tasks (default: `Qwen/Qwen3-VL-235B-A22B-Instruct-FP8`) |
| `LLM_MODEL` | Claude model for real-time conversation (e.g., `claude-sonnet-4-20250514`) |
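Putting the table together, a filled-in `.env` might look like this (all values are placeholders; substitute your own credentials):

```shell
# .env - placeholder values, never commit real keys
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxx
LIVEKIT_API_SECRET=secretxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxx
ELEVEN_API_KEY=xxxxxxxx
DOUBLEWORD_API_KEY=dw-xxxxxxxx
BATCH_MODEL=Qwen/Qwen3-VL-235B-A22B-Instruct-FP8
LLM_MODEL=claude-sonnet-4-20250514
```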
The agent understands natural language. Examples:
- Simple questions: "What's the weather like?" / "Tell me a joke"
- Complex tasks: "Create a business plan for a coffee shop" / "Write a detailed marketing strategy"
- Task management: "Check my task status" / "What's the result of my last task?"
```
voice_agent/
├── src/
│   ├── __init__.py        # Package exports
│   ├── assistant.py       # VoiceAssistant class with tool definitions
│   └── batch.py           # Batch API client, polling, task persistence
├── agent.py               # Entry point - session setup and LiveKit worker
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not committed)
├── .env.example           # Template for environment variables
├── .gitignore             # Git ignore rules
└── README.md              # This file
```