Voice Agent with Batch Task Processing

A real-time voice assistant built with LiveKit Agents that offloads complex tasks to a batch processing API. The agent uses Claude for real-time conversation and delegates time-intensive tasks (such as data processing) to Doubleword's batch API, which completes them in the background while the conversation continues.

Architecture

User Voice Input
       |
       v
+--============--+
| ElevenLabs STT |  Speech-to-Text
+-------+--------+
        |
        v
+--============--+
|   Claude LLM   |  Real-time conversation
+-------+--------+---------------------+
        |                               |
        v                               v  Complex tasks
+--============--+             +--============--+
| ElevenLabs TTS |             | Doubleword     |
+-------+--------+             | Batch API      |
        |                      +-------+--------+
        v                              |
  Voice Output  <----- TTS notify -----+
                   (interrupts current speech
                    when task completes)

Features

- Real-time voice conversation using the LiveKit Agents framework
- Claude-powered responses for natural, intelligent conversation
- Batch task offloading - complex tasks (business plans, detailed analysis, long documents) are automatically delegated to Doubleword's batch API
- Mid-conversation notifications - the agent interrupts itself to notify you when a batch task completes, then offers to read the results
- Deduplication - prevents accidental duplicate task submissions within a 60-second window
- Task persistence - results are saved to tasks_results.json and retrievable by job ID
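
The deduplication and persistence features can be illustrated with a small sketch. The `TaskStore` class, its method names, and the JSON layout are hypothetical and chosen for illustration only; the real logic lives in src/batch.py and may differ. The sketch assumes duplicates are detected by comparing normalized task descriptions.

```python
import json
import time
from pathlib import Path


class TaskStore:
    """Hypothetical helper: de-duplicates submissions and persists results."""

    def __init__(self, path, window_seconds=60.0, clock=time.monotonic):
        self.path = Path(path)
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without waiting
        self._last_submitted = {}  # normalized description -> submission time

    def is_duplicate(self, description):
        """Return True if the same description was accepted within the window."""
        key = " ".join(description.lower().split())
        now = self.clock()
        last = self._last_submitted.get(key)
        if last is not None and now - last < self.window_seconds:
            return True
        # Not a duplicate: remember when this description was accepted.
        self._last_submitted[key] = now
        return False

    def save_result(self, job_id, description, result):
        """Store one result in the JSON file, keyed by job ID."""
        data = self._load()
        data[job_id] = {"description": description, "result": result}
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get_result(self, job_id):
        """Return the stored record for job_id, or None if it is unknown."""
        return self._load().get(job_id)

    def _load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))
```

Keying the file by job ID keeps lookups simple, at the cost of rewriting the whole file on every save, which is likely acceptable for a handful of tasks per session.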

How Async Batch Processing Works

The agent handles complex tasks asynchronously so the user can keep chatting while work happens in the background:

Submission - When the user requests a complex task (e.g., "Create a business plan for a coffee shop"), the LLM recognizes it requires detailed output and calls the submit_batch_task tool. This uploads a JSONL request to the Doubleword batch API and returns immediately.
Background polling - An asyncio task starts polling the batch API every 10 seconds using an AsyncOpenAI client (non-blocking). The user can continue a normal conversation during this time.
Mid-conversation notification - When the batch job completes:
- The poller calls session.interrupt(force=True) to stop whatever the agent is currently saying
- Then calls session.say(...) to directly speak a notification via TTS (bypasses the LLM for reliability)
- The user hears something like: "Your task 'business plan for coffee shop' is ready. Would you like me to read the results?"
Result retrieval - When the user asks for results, the LLM calls get_task_result which reads from the persisted tasks_results.json file and speaks the content.
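
The polling and notification steps above can be sketched as follows. This is a minimal illustration, not the agent's actual code: `poll_until_done`, `fetch_status`, and `on_complete` are hypothetical names, and the status strings assume an OpenAI-compatible batch API. In the real agent, `fetch_status` would presumably wrap the AsyncOpenAI client, and `on_complete` is where session.interrupt(force=True) and session.say(...) would be called.

```python
import asyncio

# Assumed terminal failure states of an OpenAI-compatible batch API.
TERMINAL_FAILURES = {"failed", "expired", "cancelled"}


async def poll_until_done(job_id, fetch_status, on_complete, interval=10.0):
    """Poll fetch_status(job_id) until the job finishes, then notify.

    fetch_status and on_complete are hypothetical async callables supplied
    by the caller; this sketch does not call any real API itself.
    """
    while True:
        status = await fetch_status(job_id)
        print(f"[POLL] {job_id}: {status}")
        if status == "completed":
            await on_complete(job_id)
            return True
        if status in TERMINAL_FAILURES:
            return False
        # asyncio.sleep yields to the event loop, so other tasks keep running.
        await asyncio.sleep(interval)


async def demo():
    """Drive the poller with fake callables instead of a real batch API."""
    statuses = iter(["validating", "in_progress", "completed"])
    spoken = []

    async def fake_fetch_status(job_id):
        return next(statuses)

    async def fake_notify(job_id):
        spoken.append(f"Your task {job_id} is ready.")

    # Start the poller as a background task, as the agent would, then await it.
    poller = asyncio.create_task(
        poll_until_done("job-1", fake_fetch_status, fake_notify, interval=0.01)
    )
    finished = await poller
    return finished, spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the status check and the notification in as callables keeps the loop independent of any particular SDK, which also makes it easy to exercise without network access.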

Prerequisites

- Python 3.10+ (required by livekit-agents for TypeAlias support)
- LiveKit Cloud account (or self-hosted LiveKit server)
- Anthropic API key
- ElevenLabs API key
- Doubleword API key

Installation

Clone the repository:

git clone https://github.com/yourusername/voice-agent.git
cd voice-agent

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Copy the example environment file and fill in your credentials:

cp .env.example .env

Edit .env with your API keys (see Environment Variables below).

Running the Agent

Start the agent in development mode (auto-reloads on file changes):

python agent.py dev

For production:

python agent.py start

Testing with LiveKit

Once the agent is running, you need a LiveKit room to connect to it. The easiest way:

Go to LiveKit Agents Playground
Enter your LiveKit project URL (the LIVEKIT_URL from your .env)
Click Connect - the playground creates a room and the agent automatically joins
Use your microphone to talk to the agent

The agent logs will show the connection in your terminal:

INFO  livekit.agents  received job request  {"room": "playground-xxxx", ...}

Alternatively, you can connect using any LiveKit client SDK or build your own frontend.

Dev mode tips:

- The agent runs each job in a separate subprocess and watches for file changes
- Edit any .py file and the agent restarts automatically
- Logs include [POLL] prefixed lines showing batch job polling activity

Environment Variables

| Variable | Description |
|---|---|
| `LIVEKIT_URL` | Your LiveKit server URL (e.g., `wss://your-project.livekit.cloud`) |
| `LIVEKIT_API_KEY` | LiveKit API key |
| `LIVEKIT_API_SECRET` | LiveKit API secret |
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude |
| `ELEVEN_API_KEY` | ElevenLabs API key for STT/TTS |
| `DOUBLEWORD_API_KEY` | Doubleword API key for batch processing |
| `BATCH_MODEL` | Model to use for batch tasks (default: `Qwen/Qwen3-VL-235B-A22B-Instruct-FP8`) |
| `LLM_MODEL` | Claude model for real-time conversation (e.g., `claude-sonnet-4-20250514`) |
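
A startup check for these variables might look like the sketch below. The `load_settings` function is hypothetical and not part of the repository. It assumes the six credential variables are required, that `BATCH_MODEL` falls back to the default listed above, and that `LLM_MODEL` is optional; the actual agent may handle these differently.

```python
import os

# Assumed to be mandatory; the real agent may treat some of these differently.
REQUIRED_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "ANTHROPIC_API_KEY",
    "ELEVEN_API_KEY",
    "DOUBLEWORD_API_KEY",
]
DEFAULT_BATCH_MODEL = "Qwen/Qwen3-VL-235B-A22B-Instruct-FP8"


def load_settings(env=None):
    """Collect settings from a mapping (os.environ by default).

    Raises RuntimeError naming every required variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    settings["BATCH_MODEL"] = env.get("BATCH_MODEL") or DEFAULT_BATCH_MODEL
    # No default is documented for LLM_MODEL, so it is left as None when unset.
    settings["LLM_MODEL"] = env.get("LLM_MODEL")
    return settings
```

Reporting all missing variables at once avoids the fix-one-rerun-fail-again loop when setting up a fresh .env file.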

Available Voice Commands

The agent understands natural language. Examples:

- Simple questions: "What's the weather like?" / "Tell me a joke"
- Complex tasks: "Create a business plan for a coffee shop" / "Write a detailed marketing strategy"
- Task management: "Check my task status" / "What's the result of my last task?"

Project Structure

voice_agent/
├── src/
│   ├── __init__.py        # Package exports
│   ├── assistant.py       # VoiceAssistant class with tool definitions
│   └── batch.py           # Batch API client, polling, task persistence
├── agent.py               # Entry point - session setup and LiveKit worker
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not committed)
├── .env.example           # Template for environment variables
├── .gitignore             # Git ignore rules
└── README.md              # This file
