
OSA - Open Speech Assistant

A real-time voice assistant powered by LiveKit, with switchable STT providers (Deepgram or Groq Whisper), a Groq LLM, and ElevenLabs TTS.

Architecture

                           VOICE PIPELINE
    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │  ┌──────────┐    ┌──────────────┐    ┌─────────────────┐   │
    │  │  User    │───▶│   LiveKit    │───▶│  Python Agent   │   │
    │  │ Browser  │◀───│    Cloud     │◀───│  (OSA Worker)   │   │
    │  └──────────┘    └──────────────┘    └─────────────────┘   │
    │       │                                      │              │
    │       │            ┌─────────────────────────┘              │
    │       │            │                                        │
    │       │            ▼                                        │
    │       │     ┌────────────┐                                  │
    │       └────▶│ Go Backend │                                  │
    │             └────────────┘                                  │
    │                    │                                        │
    │     ┌──────────────┼──────────────┐                         │
    │     ▼              ▼              ▼                         │
    │  ┌──────┐    ┌──────────┐   ┌───────────┐                   │
    │  │ Groq │    │ Deepgram │   │ ElevenLabs│                   │
    │  │ LLM  │    │   STT    │   │    TTS    │                   │
    │  └──────┘    └──────────┘   └───────────┘                   │
    │                  OR                                         │
    │             ┌──────────┐                                    │
    │             │  Groq    │                                    │
    │             │ Whisper  │                                    │
    │             └──────────┘                                    │
    └─────────────────────────────────────────────────────────────┘

Features

  • Dual STT Support: Switch between Deepgram and Groq Whisper STT in the UI
  • Real-time Voice: Sub-second latency voice conversations
  • Live Transcripts: See both user and agent transcripts in real-time
  • Source Indicator: UI shows which STT provider is active
  • Personality: OSA has a warm, enthusiastic personality with emotions
  • Auto-cleanup: Rooms automatically close when users disconnect

Quick Start

1. Clone and Configure

git clone https://github.com/robertohluna/LiveKitVoiceAgent.git
cd LiveKitVoiceAgent
cp .env.example .env
# Edit .env with your API keys

2. Start All Services

# Terminal 1: Go Backend
cd backend && go run ./cmd/server

# Terminal 2: Python Agents (both)
cd agent
source venv/bin/activate
python agent.py dev &
python agent_groq.py dev &

# Terminal 3: Frontend
cd frontend && npm install && npm run dev

3. Use

  1. Open http://localhost:5173
  2. Select STT provider (Deepgram or Groq Whisper)
  3. Click Connect
  4. Start talking!

Agent Configurations

Two STT Options

| Feature      | Deepgram STT          | Groq Whisper          |
| ------------ | --------------------- | --------------------- |
| Agent file   | agent.py              | agent_groq.py         |
| STT provider | Deepgram Nova         | Groq Whisper          |
| LLM provider | Groq (via Go Backend) | Groq (via Go Backend) |
| Latency      | ~200-400ms            | ~300-500ms            |
| Accuracy     | Excellent             | Very good             |
| Cost         | Pay per minute        | Included with Groq    |

Both agents use the same:

  • LLM: Groq llama-3.3-70b-versatile (via Go Backend)
  • TTS: ElevenLabs
  • VAD: Silero

Console Output

When connected, the console shows which STT is active:

[DEEPGRAM] user: Hello there
[DEEPGRAM] agent: Oh that's exciting, it's great to meet you!

or

[GROQ-WHISPER] user: Hello there
[GROQ-WHISPER] agent: Oh that's exciting, it's great to meet you!

Environment Variables

# LiveKit (required)
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud

# AI Services (required)
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=optional_voice_id
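A quick sanity check before starting the services can catch missing keys early. This helper is illustrative (it is not part of the repo) and only inspects the process environment; `ELEVENLABS_VOICE_ID` is deliberately excluded since it is optional:

```python
import os

# Keys the agents and backend expect; ELEVENLABS_VOICE_ID is optional.
REQUIRED_KEYS = [
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
    "GROQ_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
]

def missing_keys(env: dict) -> list[str]:
    """Return required keys that are absent or empty in the given mapping."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    absent = missing_keys(dict(os.environ))
    if absent:
        print("Missing required variables:", ", ".join(absent))
```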

Project Structure

LiveKitVoiceAgent/
├── frontend/                 # Svelte frontend
│   ├── src/lib/
│   │   ├── livekit.ts       # LiveKit client wrapper
│   │   └── components/
│   │       └── VoiceAgent.svelte
│   └── package.json
│
├── backend/                  # Go backend
│   ├── cmd/server/main.go   # Entry point
│   └── internal/
│       ├── handler/         # HTTP handlers
│       ├── groq/            # Groq API client
│       └── config/          # Environment config
│
├── agent/                    # Python agents
│   ├── agent.py             # Deepgram STT agent
│   ├── agent_groq.py        # Groq Whisper STT agent
│   └── requirements.txt
│
├── docs/                     # Documentation
│   ├── API.md               # API reference
│   └── TROUBLESHOOTING.md   # Common issues
│
├── .env.example             # Environment template
└── README.md

API Endpoints

| Endpoint         | Method | Description                             |
| ---------------- | ------ | --------------------------------------- |
| /health          | GET    | Health check                            |
| /api/token       | POST   | Get LiveKit room token + dispatch agent |
| /api/room/delete | POST   | Delete room (cleanup)                   |
| /api/chat        | POST   | Send message to Groq LLM                |

Token Request

POST /api/token
{
  "room_name": "voice-abc123",
  "participant_name": "user",
  "agent_name": "deepgram-agent"  // or "groq-agent"
}
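A minimal client-side sketch of the token request, assuming the Go backend listens on localhost:8080 and returns a JSON body (the exact response fields are not specified in this README, so they are left opaque here):

```python
import json
import urllib.request

def build_token_request(room: str, user: str, use_groq_stt: bool) -> dict:
    """Build the /api/token payload; agent_name selects which worker is dispatched."""
    return {
        "room_name": room,
        "participant_name": user,
        "agent_name": "groq-agent" if use_groq_stt else "deepgram-agent",
    }

def fetch_token(base_url: str, payload: dict) -> dict:
    """POST the payload to the backend and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/token",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the backend running on :8080):
# info = fetch_token("http://localhost:8080",
#                    build_token_request("voice-abc123", "user", False))
```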

How Agent Switching Works

  1. Frontend: User selects STT provider from toggle
  2. Token Request: Frontend sends agent_name to backend
  3. Agent Dispatch: Backend dispatches specific agent via LiveKit API
  4. Agent Filter: Each agent only accepts jobs with its name
  5. Connection: Only the selected agent joins the room

# In agent.py
async def request_fnc(req: JobRequest):
    if req.agent_name != "deepgram-agent":
        await req.reject()  # Reject if not for us
        return
    await req.accept()

Technical Details

Custom LLM Integration

Both agents use a custom GoBackendLLM class that:

  1. Converts LiveKit chat context to messages
  2. Calls Go backend /api/chat endpoint
  3. Sends transcript to frontend via data channel
  4. Returns response to TTS for speech synthesis

class GoBackendLLM(llm.LLM):
    def chat(self, *, chat_ctx, **kwargs):
        messages = self._convert_context(chat_ctx)
        return GoBackendLLMStream(messages)
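`_convert_context` is not shown in this README; the following is a plausible sketch of the conversion step, assuming the chat context can be flattened into (role, text) pairs (the real LiveKit chat-context types differ in detail):

```python
def convert_context(items: list[tuple[str, str]]) -> list[dict]:
    """Flatten (role, text) chat turns into OpenAI-style messages for /api/chat."""
    messages = []
    for role, text in items:
        if not text:
            continue  # skip empty turns (e.g. interrupted speech)
        messages.append({"role": role, "content": text})
    return messages
```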

Transcript Flow

Agent Response → GoBackendLLM._run() → Callback → publish_data()
                      ↓
              Frontend receives {"type": "transcript", "role": "agent",
                                "text": "...", "source": "deepgram"}

Troubleshooting

Agent not responding

  • Check agent logs for "registered worker" message
  • Verify Go backend is running on :8080
  • Check API keys in .env

No audio

  • Check browser microphone permissions
  • Ensure ElevenLabs voice ID is valid
  • Check agent logs for TTS errors

Duplicate agents

  • Agents filter jobs by name, so this should no longer happen
  • If stuck, restart both agents

Wrong agent connecting

  • Make sure both agents are running
  • Check agent dispatch logs in Go backend

Dependencies

Python

livekit-agents>=1.3.11
livekit-plugins-deepgram
livekit-plugins-groq
livekit-plugins-elevenlabs
livekit-plugins-silero
aiohttp
python-dotenv

Go

github.com/livekit/protocol
github.com/livekit/server-sdk-go
github.com/joho/godotenv

Frontend

livekit-client
svelte
tailwindcss

License

MIT
