[ Windows ] Windows STT Solution Analysis #239

@kindcreator

Windows STT Solution Analysis (2026-02-02)

Summary

We successfully tested microphone input (STT) on native Windows. The solution uses LiveKit transport, which bypasses the Windows temp file locking bug entirely.

Test Results

Input:  "Go ahead, I'm listening for 5 seconds."
Output: "I am saying something with life-keyed transport. Does it work?"
Timing: ttfa 0.9s, gen 0.9s, play 3.5s, record 5.1s, stt 0.3s, total 10.2s
STT Provider: whisper-cpp

Key observation: We did NOT explicitly set transport="livekit". The default transport="auto" detected LiveKit running on port 7880 and used it automatically.
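
For intuition, the auto-detection behaves like a port probe: if something answers on the LiveKit port, use LiveKit; otherwise fall back to local capture. The sketch below is illustrative only - livekit_reachable and resolve_transport are made-up names, not VoiceMode internals.

```python
# Illustrative sketch (not VoiceMode's actual code) of how transport="auto"
# can resolve to LiveKit: probe the LiveKit port, fall back to local capture.
import socket

def livekit_reachable(host: str = "localhost", port: int = 7880, timeout: float = 0.5) -> bool:
    """Return True if something is listening on the LiveKit port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def resolve_transport(requested: str = "auto") -> str:
    """Resolve 'auto' to 'livekit' when the server is up, otherwise 'local'."""
    if requested != "auto":
        return requested
    return "livekit" if livekit_reachable() else "local"

print(resolve_transport())  # prints 'livekit' when the server on :7880 is running
```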

Architecture Clarification

┌──────────────────────────────────────────────────────────────────┐
│                    AUDIO CAPTURE (transport)                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                    │
│  transport="local"              transport="livekit"               │
│  ┌─────────────────┐            ┌─────────────────┐               │
│  │ Microphone      │            │ Microphone      │               │
│  │      ↓          │            │      ↓          │               │
│  │ Temp WAV file   │ ← BUG!     │ WebRTC stream   │ ← NO BUG      │
│  │      ↓          │            │      ↓          │               │
│  │ WinError 32     │            │ LiveKit Server  │               │
│  └─────────────────┘            │      ↓          │               │
│                                 │ Audio to MCP    │               │
│                                 └─────────────────┘               │
└──────────────────────────────────────────────────────────────────┘
                                        ↓
┌──────────────────────────────────────────────────────────────────┐
│                    STT PROCESSING (separate from transport)       │
├──────────────────────────────────────────────────────────────────┤
│                                                                    │
│  VoiceMode MCP → Whisper Server (port 2022)                       │
│                                                                    │
│  Endpoint options:                                                 │
│  • /v1/audio/transcriptions  (OpenAI-compatible)                  │
│  • /inference                (native whisper.cpp)                 │
│                                                                    │
└──────────────────────────────────────────────────────────────────┘
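
As a concrete example of the STT leg, here is a minimal client that posts a WAV file to the OpenAI-compatible endpoint on port 2022. The file path and model name are placeholders, and the response handling assumes an OpenAI-style {"text": ...} payload.

```python
# Minimal STT client sketch: post recorded audio to the Whisper server on port 2022.
# Assumes an OpenAI-compatible endpoint; "test.wav" and the model name are placeholders.
import requests

def transcribe(path: str = "test.wav") -> str:
    with open(path, "rb") as audio:
        resp = requests.post(
            "http://localhost:2022/v1/audio/transcriptions",
            files={"file": audio},
            data={"model": "whisper-1"},  # placeholder; some local servers ignore it
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()["text"]  # OpenAI-style servers return {"text": "..."}

if __name__ == "__main__":
    print(transcribe())
```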

Questions Answered

1. Do we need the Whisper endpoint changes from PR #233?

Answer: It depends on your Whisper server build.

| Whisper Server Type | Exposes /v1/audio/transcriptions | Needs PR #233 whisper changes |
|---|---|---|
| whisper.cpp (vanilla) | No (only /inference) | YES |
| whisper.cpp with OpenAI API | Yes | No |
| Faster-Whisper | Yes | No |
| OpenAI API | Yes | No |

Our setup: The STT worked, showing (STT: whisper-cpp). This means either:

  1. We're using a whisper.cpp build that exposes OpenAI-compatible endpoints, OR
  2. We have the fork with whisper endpoint fixes installed

Check your installation:

```bash
# If you installed from the fork with fixes:
pip show voice-mode | grep Location
# Check if it points to your fork directory

# Test which endpoint your whisper uses:
curl http://localhost:2022/v1/audio/transcriptions -F file=@test.wav
curl http://localhost:2022/inference -F file=@test.wav
```

2. Do we need to explicitly use LiveKit instead of local transport?

Answer: On Windows, YES - but transport="auto" handles this automatically.

| Scenario | Recommendation |
|---|---|
| LiveKit running (port 7880) | transport="auto" (default) - auto-selects LiveKit |
| LiveKit NOT running | transport="local" - will hit the WinError 32 bug |
| Force LiveKit | transport="livekit" - explicit, fails if LiveKit is not running |

Best practice: Just ensure LiveKit is running, and the default transport="auto" will use it.

PR #233 Component Analysis

| Component | What it fixes | Needed with LiveKit? |
|---|---|---|
| fcntl → msvcrt | File locking in conch.py | Maybe - Conch is used for multi-agent coordination, not audio capture. If you use wait_for_conch=true, you need this fix. |
| whisper /inference endpoint | STT to whisper.cpp | Depends - only if your whisper.cpp build doesn't expose OpenAI-compatible endpoints |
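
For context, the fcntl → msvcrt change implies a cross-platform lock shim along the lines below. This is an illustrative pattern, not the actual conch.py code, and conch.lock is a made-up filename.

```python
# Cross-platform advisory file lock: fcntl on POSIX, msvcrt on Windows.
# Illustrates the fcntl -> msvcrt pattern; not the actual conch.py implementation.
import os

if os.name == "nt":
    import msvcrt

    def lock_file(f):
        f.seek(0)
        # Non-blocking lock on the first byte; raises OSError if already held.
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)

    def unlock_file(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_file(f):
        # Non-blocking exclusive lock; raises BlockingIOError if already held.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    def unlock_file(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

# Usage: hold the lock (the "conch") while coordinating agents.
with open("conch.lock", "a+b") as fh:
    lock_file(fh)
    try:
        pass  # critical section
    finally:
        unlock_file(fh)
```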

Recommendations

Minimum for Windows STT to work:

  1. Run LiveKit server (C:\voicemode\start-livekit.bat)
  2. Run Whisper server (C:\voicemode\start-whisper.bat)
  3. Use default transport="auto"

For complete Windows support (future-proofing):

  1. Adopt the fcntl → msvcrt changes from PR #233 (feat: Add native Windows support) - needed for wait_for_conch multi-agent coordination
  2. Treat the whisper endpoint changes as optional and config-based, rather than relying on sequential endpoint probing (see the sketch below)
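
To illustrate the config-based approach, a selection along these lines would avoid probing both endpoints at runtime. The environment variable name below is hypothetical, not an existing VoiceMode setting.

```python
# Config-based endpoint selection sketch: pick the Whisper endpoint from configuration
# instead of probing /v1/audio/transcriptions and /inference in sequence.
# VOICEMODE_WHISPER_API is a hypothetical setting, not an existing option.
import os

WHISPER_BASE = "http://localhost:2022"

ENDPOINTS = {
    "openai": f"{WHISPER_BASE}/v1/audio/transcriptions",  # OpenAI-compatible builds
    "native": f"{WHISPER_BASE}/inference",                # vanilla whisper.cpp
}

def stt_endpoint() -> str:
    style = os.environ.get("VOICEMODE_WHISPER_API", "openai")
    return ENDPOINTS[style]

print(stt_endpoint())
```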

Updated CLAUDE.md Recommendations

The current CLAUDE.md suggests using transport="livekit" explicitly. This can be simplified:

```markdown
## Voice on Windows

Services must be running:
- LiveKit: port 7880 (required for Windows mic input)
- Whisper: port 2022 (STT)
- Kokoro: port 8880 (TTS)

No special parameters needed - default `transport="auto"` detects LiveKit.
```
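
To confirm all three services are up before starting a session, a quick probe of the ports listed above is enough; this helper is a convenience sketch, not part of VoiceMode.

```python
# Readiness check for the services CLAUDE.md expects (ports from the list above).
import socket

SERVICES = {"LiveKit": 7880, "Whisper": 2022, "Kokoro": 8880}

for name, port in SERVICES.items():
    try:
        with socket.create_connection(("localhost", port), timeout=0.5):
            print(f"{name:8} port {port}: up")
    except OSError:
        print(f"{name:8} port {port}: NOT reachable")
```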

Related Issues/PRs

  • PR #233 - feat: Add native Windows support

Conclusion

LiveKit transport is the correct solution for Windows. The PR #233 changes are complementary:

  • fcntl fix: Still useful for edge cases (multi-agent conch)
  • whisper endpoint: Only needed if using vanilla whisper.cpp without OpenAI-compatible endpoints

The fact that our test worked without any special configuration suggests the current setup is correct.
