A real-time voice assistant powered by LiveKit, with switchable STT providers (Deepgram or Groq Whisper), Groq LLM, and ElevenLabs TTS.
VOICE PIPELINE
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  ┌──────────┐    ┌──────────────┐    ┌─────────────────┐   │
│  │   User   │───▶│   LiveKit    │───▶│  Python Agent   │   │
│  │ Browser  │◀───│    Cloud     │◀───│  (OSA Worker)   │   │
│  └──────────┘    └──────────────┘    └─────────────────┘   │
│       │                                       │            │
│       │            ┌──────────────────────────┘            │
│       │            │                                       │
│       │            ▼                                       │
│       │      ┌────────────┐                                │
│       └─────▶│ Go Backend │                                │
│              └────────────┘                                │
│                    │                                       │
│     ┌──────────────┼──────────────┐                        │
│     ▼              ▼              ▼                        │
│  ┌──────┐     ┌──────────┐  ┌───────────┐                  │
│  │ Groq │     │ Deepgram │  │ ElevenLabs│                  │
│  │ LLM  │     │   STT    │  │    TTS    │                  │
│  └──────┘     └──────────┘  └───────────┘                  │
│                    OR                                      │
│               ┌──────────┐                                 │
│               │   Groq   │                                 │
│               │ Whisper  │                                 │
│               └──────────┘                                 │
└────────────────────────────────────────────────────────────┘
- Dual STT Support: Switch between Deepgram and Groq Whisper STT in the UI
- Real-time Voice: Sub-second latency voice conversations
- Live Transcripts: See both user and agent transcripts in real-time
- Source Indicator: UI shows which STT provider is active
- Personality: OSA has a warm, enthusiastic personality with emotions
- Auto-cleanup: Rooms automatically close when users disconnect
git clone https://github.com/robertohluna/LiveKitVoiceAgent.git
cd LiveKitVoiceAgent
cp .env.example .env
# Edit .env with your API keys

# Terminal 1: Go Backend
cd backend && go run ./cmd/server
# Terminal 2: Python Agents (both)
cd agent
source venv/bin/activate
python agent.py dev &
python agent_groq.py dev &
# Terminal 3: Frontend
cd frontend && npm install && npm run dev

- Open http://localhost:5173
- Select STT provider (Deepgram or Groq Whisper)
- Click Connect
- Start talking!
| Feature | Deepgram STT | Groq Whisper |
|---|---|---|
| Agent File | agent.py | agent_groq.py |
| STT Provider | Deepgram Nova | Groq Whisper |
| LLM Provider | Groq (via Go Backend) | Groq (via Go Backend) |
| Latency | ~200-400ms | ~300-500ms |
| Accuracy | Excellent | Very Good |
| Cost | Pay per minute | Included with Groq |
Both agents use the same:
- LLM: Groq llama-3.3-70b-versatile (via Go Backend)
- TTS: ElevenLabs
- VAD: Silero
When connected, the console shows which STT is active:
[DEEPGRAM] user: Hello there
[DEEPGRAM] agent: Oh that's exciting, it's great to meet you!
or
[GROQ-WHISPER] user: Hello there
[GROQ-WHISPER] agent: Oh that's exciting, it's great to meet you!
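The log format above can be reproduced with a small helper. This is only a sketch: the function name `format_transcript_line`, the `SOURCE_TAGS` mapping, and the `"groq"` key are hypothetical and not taken from the agents' code. Only the bracketed tags and the `role: text` layout come from the sample output.

```python
# Hypothetical helper; only the output layout is taken from the sample logs.
SOURCE_TAGS = {
    "deepgram": "DEEPGRAM",
    "groq": "GROQ-WHISPER",  # assumed key for the Groq Whisper agent
}


def format_transcript_line(source: str, role: str, text: str) -> str:
    """Return a console line such as '[DEEPGRAM] user: Hello there'."""
    # Unknown sources fall back to their upper-cased name.
    tag = SOURCE_TAGS.get(source, source.upper())
    return f"[{tag}] {role}: {text}"
```

Calling `format_transcript_line("deepgram", "user", "Hello there")` yields the first sample line shown above.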
# LiveKit (required)
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
# AI Services (required)
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=optional_voice_id

LiveKitVoiceAgent/
├── frontend/                    # Svelte frontend
│   ├── src/lib/
│   │   ├── livekit.ts           # LiveKit client wrapper
│   │   └── components/
│   │       └── VoiceAgent.svelte
│   └── package.json
│
├── backend/                     # Go backend
│   ├── cmd/server/main.go       # Entry point
│   └── internal/
│       ├── handler/             # HTTP handlers
│       ├── groq/                # Groq API client
│       └── config/              # Environment config
│
├── agent/                       # Python agents
│   ├── agent.py                 # Deepgram STT agent
│   ├── agent_groq.py            # Groq Whisper STT agent
│   └── requirements.txt
│
├── docs/                        # Documentation
│   ├── API.md                   # API reference
│   └── TROUBLESHOOTING.md       # Common issues
│
├── .env.example                 # Environment template
└── README.md
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /api/token | POST | Get LiveKit room token + dispatch agent |
| /api/room/delete | POST | Delete room (cleanup) |
| /api/chat | POST | Send message to Groq LLM |
POST /api/token
{
  "room_name": "voice-abc123",
  "participant_name": "user",
  "agent_name": "deepgram-agent"  // or "groq-agent"
}

- Frontend: User selects STT provider from toggle
- Token Request: Frontend sends agent_name to backend
- Agent Dispatch: Backend dispatches specific agent via LiveKit API
- Agent Filter: Each agent only accepts jobs with its name
- Connection: Only the selected agent joins the room
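The flow above starts with the token request, which can be sketched from Python for testing the backend without the frontend. This assumes the Go backend listens on http://localhost:8080 (the port mentioned in the troubleshooting notes). The helper names and the provider keys are illustrative, and the response shape is not documented here, so the sketch only decodes it as JSON.

```python
import json
import urllib.request

# Agent names come from the request example; the provider keys are assumed.
AGENT_NAMES = {"deepgram": "deepgram-agent", "groq": "groq-agent"}


def build_token_request(provider, room_name, participant_name="user",
                        base_url="http://localhost:8080"):
    """Build (but do not send) the POST /api/token request for one STT provider."""
    body = {
        "room_name": room_name,
        "participant_name": participant_name,
        "agent_name": AGENT_NAMES[provider],  # KeyError for unknown providers
    }
    return urllib.request.Request(
        f"{base_url}/api/token",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(provider, room_name):
    """Send the request; requires the Go backend to be running."""
    with urllib.request.urlopen(build_token_request(provider, room_name)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.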
# In agent.py
from livekit.agents import JobRequest

async def request_fnc(req: JobRequest):
    if req.agent_name != "deepgram-agent":
        await req.reject()  # Reject if not for us
        return
    await req.accept()

Both agents use a custom GoBackendLLM class that:
- Converts LiveKit chat context to messages
- Calls Go backend /api/chat endpoint
- Sends transcript to frontend via data channel
- Returns response to TTS for speech synthesis
from livekit.agents import llm

class GoBackendLLM(llm.LLM):
    def chat(self, *, chat_ctx, **kwargs):
        # Convert the LiveKit chat context into plain messages for the Go backend
        messages = self._convert_context(chat_ctx)
        return GoBackendLLMStream(messages)

Agent Response → GoBackendLLM._run() → Callback → publish_data()
                                                        ↓
Frontend receives {"type": "transcript", "role": "agent",
                   "text": "...", "source": "deepgram"}
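The transcript message shown above is plain JSON, so encoding and validating it can be sketched with the standard library alone. The function names `encode_transcript` and `decode_transcript` are hypothetical, and restricting `role` to `user` or `agent` is an assumption based on the sample logs. The resulting bytes are the kind of payload one would hand to the data channel.

```python
import json


def encode_transcript(role: str, text: str, source: str) -> bytes:
    """Serialize a transcript message in the shape the frontend receives."""
    if role not in ("user", "agent"):  # assumed set of roles
        raise ValueError(f"unexpected role: {role!r}")
    message = {"type": "transcript", "role": role, "text": text, "source": source}
    return json.dumps(message).encode("utf-8")


def decode_transcript(payload: bytes):
    """Return the message dict, or None if the payload is not a transcript."""
    try:
        message = json.loads(payload)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict) or message.get("type") != "transcript":
        return None
    return message
```

Returning `None` for anything that is not a transcript lets a receiver ignore unrelated data-channel messages instead of failing on them.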
- Check agent logs for "registered worker" message
- Verify Go backend is running on :8080
- Check API keys in .env
- Check browser microphone permissions
- Ensure ElevenLabs voice ID is valid
- Check agent logs for TTS errors
- Agents now filter jobs by name, so this should not happen
- If stuck, restart both agents
- Make sure both agents are running
- Check agent dispatch logs in Go backend
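Several of the checks above come down to missing API keys. A minimal preflight sketch follows, assuming the required variable names listed earlier in this README. The function name `missing_env_vars` is hypothetical, and `ELEVENLABS_VOICE_ID` is left out because it is marked optional.

```python
import os

# Names copied from the environment section; ELEVENLABS_VOICE_ID is optional.
REQUIRED_VARS = [
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "GROQ_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

The function only inspects the process environment (or a mapping passed in), so values kept in `.env` need to be loaded first, for example with `load_dotenv()` from python-dotenv, which is among the listed dependencies.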
livekit-agents>=1.3.11
livekit-plugins-deepgram
livekit-plugins-groq
livekit-plugins-elevenlabs
livekit-plugins-silero
aiohttp
python-dotenv
github.com/livekit/protocol
github.com/livekit/server-sdk-go
github.com/joho/godotenv
livekit-client
svelte
tailwindcss
MIT