
OSA - Open Speech Assistant

A real-time voice assistant powered by LiveKit, with switchable STT providers (Deepgram or Groq Whisper), a Groq LLM, and ElevenLabs TTS.

Architecture

                           VOICE PIPELINE
    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │  ┌──────────┐    ┌──────────────┐    ┌─────────────────┐   │
    │  │  User    │───▶│   LiveKit    │───▶│  Python Agent   │   │
    │  │ Browser  │◀───│    Cloud     │◀───│  (OSA Worker)   │   │
    │  └──────────┘    └──────────────┘    └─────────────────┘   │
    │       │                                      │              │
    │       │            ┌─────────────────────────┘              │
    │       │            │                                        │
    │       │            ▼                                        │
    │       │     ┌────────────┐                                  │
    │       └────▶│ Go Backend │                                  │
    │             └────────────┘                                  │
    │                    │                                        │
    │     ┌──────────────┼──────────────┐                         │
    │     ▼              ▼              ▼                         │
    │  ┌──────┐    ┌──────────┐   ┌───────────┐                   │
    │  │ Groq │    │ Deepgram │   │ ElevenLabs│                   │
    │  │ LLM  │    │   STT    │   │    TTS    │                   │
    │  └──────┘    └──────────┘   └───────────┘                   │
    │                  OR                                         │
    │             ┌──────────┐                                    │
    │             │  Groq    │                                    │
    │             │ Whisper  │                                    │
    │             └──────────┘                                    │
    └─────────────────────────────────────────────────────────────┘

Features

  • Dual STT Support: Switch between Deepgram and Groq Whisper STT in the UI
  • Real-time Voice: Sub-second latency voice conversations
  • Live Transcripts: See both user and agent transcripts in real-time
  • Source Indicator: UI shows which STT provider is active
  • Personality: OSA has a warm, enthusiastic personality with emotions
  • Auto-cleanup: Rooms automatically close when users disconnect

Quick Start

1. Clone and Configure

git clone https://github.com/robertohluna/LiveKitVoiceAgent.git
cd LiveKitVoiceAgent
cp .env.example .env
# Edit .env with your API keys

2. Start All Services

# Terminal 1: Go Backend
cd backend && go run ./cmd/server

# Terminal 2: Python Agents (both)
cd agent
source venv/bin/activate
python agent.py dev &
python agent_groq.py dev &

# Terminal 3: Frontend
cd frontend && npm install && npm run dev

3. Use

  1. Open http://localhost:5173
  2. Select STT provider (Deepgram or Groq Whisper)
  3. Click Connect
  4. Start talking!

Agent Configurations

Two STT Options

| Feature      | Deepgram STT          | Groq Whisper          |
| ------------ | --------------------- | --------------------- |
| Agent file   | agent.py              | agent_groq.py         |
| STT provider | Deepgram Nova         | Groq Whisper          |
| LLM provider | Groq (via Go Backend) | Groq (via Go Backend) |
| Latency      | ~200-400ms            | ~300-500ms            |
| Accuracy     | Excellent             | Very good             |
| Cost         | Pay per minute        | Included with Groq    |

Both agents use the same:

  • LLM: Groq llama-3.3-70b-versatile (via Go Backend)
  • TTS: ElevenLabs
  • VAD: Silero

Console Output

When connected, the console shows which STT is active:

[DEEPGRAM] user: Hello there
[DEEPGRAM] agent: Oh that's exciting, it's great to meet you!

or

[GROQ-WHISPER] user: Hello there
[GROQ-WHISPER] agent: Oh that's exciting, it's great to meet you!

Environment Variables

# LiveKit (required)
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud

# AI Services (required)
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=optional_voice_id
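A quick sanity check before starting the services can catch missing keys early. This helper is illustrative (it is not part of the repo) and only inspects the process environment; `ELEVENLABS_VOICE_ID` is deliberately excluded since it is optional:

```python
import os

# Keys the agents and backend expect; ELEVENLABS_VOICE_ID is optional.
REQUIRED_KEYS = [
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
    "GROQ_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
]

def missing_keys(env: dict) -> list[str]:
    """Return required keys that are absent or empty in the given mapping."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    absent = missing_keys(dict(os.environ))
    if absent:
        print("Missing required variables:", ", ".join(absent))
```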

Project Structure

LiveKitVoiceAgent/
├── frontend/                 # Svelte frontend
│   ├── src/lib/
│   │   ├── livekit.ts       # LiveKit client wrapper
│   │   └── components/
│   │       └── VoiceAgent.svelte
│   └── package.json
│
├── backend/                  # Go backend
│   ├── cmd/server/main.go   # Entry point
│   └── internal/
│       ├── handler/         # HTTP handlers
│       ├── groq/            # Groq API client
│       └── config/          # Environment config
│
├── agent/                    # Python agents
│   ├── agent.py             # Deepgram STT agent
│   ├── agent_groq.py        # Groq Whisper STT agent
│   └── requirements.txt
│
├── docs/                     # Documentation
│   ├── API.md               # API reference
│   └── TROUBLESHOOTING.md   # Common issues
│
├── .env.example             # Environment template
└── README.md

API Endpoints

| Endpoint         | Method | Description                             |
| ---------------- | ------ | --------------------------------------- |
| /health          | GET    | Health check                            |
| /api/token       | POST   | Get LiveKit room token + dispatch agent |
| /api/room/delete | POST   | Delete room (cleanup)                   |
| /api/chat        | POST   | Send message to Groq LLM                |

Token Request

POST /api/token
{
  "room_name": "voice-abc123",
  "participant_name": "user",
  "agent_name": "deepgram-agent"  // or "groq-agent"
}
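A minimal client-side sketch of the token request, assuming the Go backend listens on localhost:8080 and returns a JSON body (the exact response fields are not specified in this README, so they are left opaque here):

```python
import json
import urllib.request

def build_token_request(room: str, user: str, use_groq_stt: bool) -> dict:
    """Build the /api/token payload; agent_name selects which worker is dispatched."""
    return {
        "room_name": room,
        "participant_name": user,
        "agent_name": "groq-agent" if use_groq_stt else "deepgram-agent",
    }

def fetch_token(base_url: str, payload: dict) -> dict:
    """POST the payload to the backend and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/token",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the backend running on :8080):
# info = fetch_token("http://localhost:8080",
#                    build_token_request("voice-abc123", "user", False))
```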

How Agent Switching Works

  1. Frontend: User selects STT provider from toggle
  2. Token Request: Frontend sends agent_name to backend
  3. Agent Dispatch: Backend dispatches specific agent via LiveKit API
  4. Agent Filter: Each agent only accepts jobs with its name
  5. Connection: Only the selected agent joins the room

# In agent.py
async def request_fnc(req: JobRequest):
    if req.agent_name != "deepgram-agent":
        await req.reject()  # Reject if not for us
        return
    await req.accept()

Technical Details

Custom LLM Integration

Both agents use a custom GoBackendLLM class that:

  1. Converts LiveKit chat context to messages
  2. Calls Go backend /api/chat endpoint
  3. Sends transcript to frontend via data channel
  4. Returns response to TTS for speech synthesis

class GoBackendLLM(llm.LLM):
    def chat(self, *, chat_ctx, **kwargs):
        messages = self._convert_context(chat_ctx)
        return GoBackendLLMStream(messages)
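`_convert_context` is not shown in this README; the following is a plausible sketch of the conversion step, assuming the chat context can be flattened into (role, text) pairs (the real LiveKit chat-context types differ in detail):

```python
def convert_context(items: list[tuple[str, str]]) -> list[dict]:
    """Flatten (role, text) chat turns into OpenAI-style messages for /api/chat."""
    messages = []
    for role, text in items:
        if not text:
            continue  # skip empty turns (e.g. interrupted speech)
        messages.append({"role": role, "content": text})
    return messages
```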

Transcript Flow

Agent Response → GoBackendLLM._run() → Callback → publish_data()
                      ↓
              Frontend receives {"type": "transcript", "role": "agent",
                                "text": "...", "source": "deepgram"}

Troubleshooting

Agent not responding

  • Check agent logs for "registered worker" message
  • Verify Go backend is running on :8080
  • Check API keys in .env

No audio

  • Check browser microphone permissions
  • Ensure ElevenLabs voice ID is valid
  • Check agent logs for TTS errors

Duplicate agents

  • Agents filter jobs by name, so this should no longer happen
  • If stuck, restart both agents

Wrong agent connecting

  • Make sure both agents are running
  • Check agent dispatch logs in Go backend

Dependencies

Python

livekit-agents>=1.3.11
livekit-plugins-deepgram
livekit-plugins-groq
livekit-plugins-elevenlabs
livekit-plugins-silero
aiohttp
python-dotenv

Go

github.com/livekit/protocol
github.com/livekit/server-sdk-go
github.com/joho/godotenv

Frontend

livekit-client
svelte
tailwindcss

License

MIT
