Unity integration plugin for sherpa-onnx — an open-source speech toolkit powered by ONNX Runtime.
| Feature | Description | Status |
|---|---|---|
| Text-to-Speech (TTS) | Offline speech synthesis — VITS, Matcha, Kokoro, Kitten, ZipVoice, Pocket (voice cloning) | ✅ Done |
| Speech Recognition (ASR) | Offline and streaming speech-to-text — Zipformer, Paraformer, Whisper, SenseVoice, Moonshine | ✅ Done |
| Voice Activity Detection (VAD) | Speech/silence segmentation for efficient ASR — Silero VAD, TEN-VAD | ✅ Done |
| Keyword Spotting (KWS) | Lightweight always-on keyword detection from microphone | 📋 Planned |
| Speaker ID & Diarization | Speaker identification by voice, who-spoke-when segmentation | 📋 Planned |
| Audio Tools | Audio tagging, speech enhancement, punctuation restoration, language identification | 📋 Planned |
| Platform | Architectures |
|---|---|
| 🪟 Windows | x64, x86, arm64 |
| 🍎 macOS | x64, arm64 |
| 🐧 Linux | x64, arm64 |
| 🤖 Android | arm64-v8a, armeabi-v7a, x86, x86_64 |
| 📱 iOS | arm64, x86_64-simulator |
Integrating sherpa-onnx into a Unity project normally requires manual native library setup, platform-specific workarounds, and custom C# bindings. This plugin handles all of that out of the box.
- 🔌 One-click library install — open Project Settings, pick a version, click Install. Native libraries for Windows, macOS, Linux, Android, and iOS are downloaded and configured automatically.
- 📥 One-click model import — paste a model URL, the importer downloads, extracts, auto-detects the model type, and creates a ready-to-use profile. No manual config editing.
- 🔄 Update All — change the version number and update every installed platform at once.
The plugin solves real-world platform issues that are not addressed by sherpa-onnx itself. Grouped by where the problem appears:
| Problem | What the plugin does |
|---|---|
| 📦 StreamingAssets locked inside APK | Extracts model files to persistentDataPath on first launch with version tracking and progress reporting. Skips re-extraction on subsequent launches. Streams each file via DownloadHandlerFile directly into the destination — keeps the UI thread responsive on hundred-megabyte models and avoids a heap spike that could OOM on low-memory devices. |
| 🌍 Non-US locale breaks native code | Wraps native calls with a locale guard that temporarily sets LC_NUMERIC to "C", preventing comma-as-decimal crashes in sherpa-onnx's float parsing. |
| Problem | What the plugin does |
|---|---|
| 🍏 No dynamic library loading | Builds a patched sherpa-onnx.dll with DllImport("__Internal") and downloads it automatically during install. |
| ✂️ Xcframework architecture bloat | Filters xcframeworks to only the target architecture (device or simulator) during install. |
| Problem | What the plugin does |
|---|---|
| 🔐 Microphone permission | Async permission request with UniTask — returns false gracefully if denied. iOS waits 1s after the permission dialog so the AVAudioSession can settle before capture starts. |
| 🎧 TTS playback breaks mic capture | AudioSessionBridge switches AVAudioSession between PlayAndRecord/Playback on iOS, and sets AudioManager.MODE_IN_COMMUNICATION + speakerphone on Android — engaging the platform AEC/AGC so the mic does not return near-silence after TTS. Public API for projects that want to drive it manually. |
| 🗣️ Native TTS callbacks unsupported on IL2CPP (also affects IL2CPP Standalone) | sherpa-onnx C# bindings wrap user callbacks in closures that IL2CPP cannot marshal to native. The plugin auto-falls-back GenerateAsync (and other callback-using paths) to the callback-less Generate on IL2CPP so TTS keeps working — and ships a Sentence Queue API (ITtsService.Speak(text, audio, ct, lookAhead)) that delivers the same low-latency long-text experience as native streaming, but in pure C# with lookAhead parallel pre-generation and works on every scripting backend. |
| 🗄️ Disk fills up as users switch profiles | On Android every active profile (Local, Remote, or LocalZip) is extracted lazily into persistentDataPath/SherpaOnnx/{type}-models/{profile}/ and stays cached across sessions. Each model service (ITtsService / IAsrService / IOnlineAsrService / IVadService) implements IModelDiskUsage — host code can GetExtractedProfiles(), GetExtractedProfileSizeBytes(name), TryDeleteExtractedProfile(name) and CleanupUnusedExtractedProfiles() from one place. Optional Keep only active profile on disk toggle in Project Settings sweeps every non-active extraction after each successful InitializeAsync / SwitchProfile — applies to all sources, not just LocalZip. Implied automatically when Only active profile in build is on (the build only ships one profile, so keeping more than one extracted makes no sense). |
| Problem | What the plugin does |
|---|---|
| 🎙️ Microphone not actually recording | Plays a silent AudioSource on the mic clip to force the device to start recording — a known Unity workaround. |
| ⏳ Microphone readiness delay | Polls Microphone.GetPosition() with a configurable timeout before starting capture. |
| 🎵 Sample rate mismatch | Built-in resampler converts any input rate to the model's expected rate (typically 16 kHz). |
⚙️ Microphone settings (sample rate, buffer length, start timeout, resampling mode, audio session management) are configurable via
Edit → Project Settings → Sherpa-ONNX → Microphoneormicrophone-settings.jsonin StreamingAssets.🗄️ Disk-usage settings (auto-delete previous profile on switch) are per-service: TTS / ASR / Online ASR / VAD panels in
Edit → Project Settings → Sherpa-ONNX. Manual API: see "Disk Usage" in the TTS / ASR / VAD docs.
⚠️ After upgrading the plugin from a pre-per-profile-extraction version, runTools → SherpaOnnx → Rebuild StreamingAssets Manifestonce. Old manifests have a flatfileslist and trigger a single full extraction at first launch (the runtime falls back gracefully); the new manifest format is what enables per-profile lazy extraction and per-profile cleanup.
- ⬇️ Download Installer
- 📂 Import installer into Unity project
- Double-click the file — Unity will open it
- OR: Unity Editor → Assets → Import Package → Custom Package, then choose the file
- The installer adds OpenUPM scoped registry and resolves the package automatically
- 📂 Open
Packages/manifest.jsonin your project - ✏️ Add the scoped registry and dependency:
{ "scopedRegistries": [ { "name": "OpenUPM", "url": "https://package.openupm.com", "scopes": [ "com.ponyudev.sherpa-onnx", "com.cysharp.unitask" ] } ], "dependencies": { "com.ponyudev.sherpa-onnx": "0.2.0" } } - ✅ Unity will resolve and download the package automatically
- 📦 Install openupm-cli
▶️ Run the command in your project folder:openupm add com.ponyudev.sherpa-onnx
- ✅ Dependencies are resolved automatically
⚠️ Install UniTask first — open Window → Package Manager, click + → Add package from git URL... and paste:https://github.com/Cysharp/UniTask.git?path=src/UniTask/Assets/Plugins/UniTask- 🔗 Then add Sherpa-ONNX the same way:
https://github.com/Ponyu-dev/Unity-Sherpa-ONNX.git
- Open Edit → Project Settings → Sherpa ONNX
- Set the desired sherpa-onnx version (e.g.
1.12.25) - Click Install for each platform you need
- Use Update All when you change the version to update all installed libraries at once
📥 Libraries are downloaded from:
- Desktop (Windows, macOS, Linux): NuGet
- Android / iOS native: sherpa-onnx GitHub releases
- iOS managed DLL: this repository's GitHub releases (see below)
Offline speech synthesis with pooling and caching. Supports 6 model architectures.
- Open Project Settings > Sherpa-ONNX > TTS
- Click Import from URL and paste a model archive link
- The importer downloads, extracts, and auto-configures the profile
- Select the Active profile to use at runtime
- 🧠 6 model architectures — Vits (Piper), Matcha, Kokoro, Kitten, ZipVoice, Pocket
- 🔍 Auto-detection — model type and paths are configured automatically from the archive
- ⚡ Int8 quantization — one-click switch between normal and int8 models
- 🚀 Flexible deployment — Local (StreamingAssets), Remote (runtime download), or LocalZip (compressed at build time)
- 🎛️ Matcha vocoder selector — choose and download vocoders independently
- ♻️ Cache pooling — configurable pools for audio buffers, AudioClips, and AudioSources
- 🛑 Runtime control — cancellation, handle-based stop / fade-out, parallel-handle StopAll, sentence-queue and streaming playback for long texts
- Models Setup Guide — Editor UI, importing, profiles, deployment options
- Runtime Usage Guide — MonoBehaviour, VContainer, Zenject examples, API reference
Offline file recognition and real-time streaming with microphone. Supports 15 offline and 5 online model architectures.
- Open Project Settings > Sherpa-ONNX > ASR
- Select the Offline or Online tab
- Click Import from URL and paste a model archive link
- The importer downloads, extracts, and auto-configures the profile
- Select the Active profile to use at runtime
- 🧠 15 offline + 5 online architectures — Zipformer, Paraformer, Whisper, SenseVoice, Moonshine, and more
- 🔍 Auto-detection — model type and paths are configured automatically from the archive
- ⚡ Int8 quantization — one-click switch between normal and int8 models
- 🎙️ Streaming recognition — real-time microphone capture with partial and final results
- 🏊 Engine pool — multiple concurrent recognizer instances for offline ASR
- ⏹️ Endpoint detection — configurable silence rules for automatic utterance segmentation
- Models Setup Guide — Editor UI, importing, profiles, offline/online tabs
- Runtime Usage Guide — MonoBehaviour, VContainer, Zenject examples, API reference
Speech/silence segmentation for efficient ASR pipelines. Supports Silero VAD and TEN-VAD models.
- Open Project Settings > Sherpa-ONNX > VAD
- Click Import from URL and paste a model archive link
- The importer downloads, extracts, and auto-configures the profile
- Select the Active profile to use at runtime
- 🧠 2 model architectures — Silero VAD, TEN-VAD
- 🔍 Auto-detection — model type and paths are configured automatically from the archive
- 🎛️ Configurable parameters — threshold, min silence/speech duration, window size
- 🔗 VAD + ASR pipeline — segment audio by voice activity, then recognize each segment
- Models Setup Guide — Editor UI, importing, profiles, configuration
- Runtime Usage Guide — MonoBehaviour, VContainer, Zenject examples, API reference
On desktop and Android, Unity loads native code via dynamic libraries (.dll, .so, .dylib).
The managed C# binding (sherpa-onnx.dll) uses DllImport("sherpa-onnx-c-api") to find them at runtime.
iOS does not support dynamic loading. All native code must be statically linked into the app binary.
This means the managed DLL must use DllImport("__Internal") instead of "sherpa-onnx-c-api".
The upstream sherpa-onnx NuGet package ships with the standard "sherpa-onnx-c-api" binding, which does not work on iOS.
To solve this, the Tools~/ scripts in this repository:
- Take the official C# sources from
sherpa-onnx/scripts/dotnet/ - Patch
Dll.csto replace"sherpa-onnx-c-api"with"__Internal" - Build a custom
sherpa-onnx.dlltargetingnetstandard2.0 - Publish it as a GitHub release with tag
sherpa-v{version}
The plugin's iOS install pipeline downloads this patched DLL automatically.
After installing any library, the plugin automatically adds SHERPA_ONNX to Scripting Define Symbols for all build targets. This allows you to guard runtime code that depends on sherpa-onnx:
#if SHERPA_ONNX
var recognizer = new OnlineRecognizer(config);
#endifThe define is removed automatically when all libraries are uninstalled.
- Unity 2022.3 or later
com.unity.sharp-zip-lib1.4.1+ (added automatically as a dependency)



