Add TTSKit with Qwen3-TTS support by ZachNagengast · Pull Request #425 · argmaxinc/WhisperKit

ZachNagengast · 2026-02-19T11:27:59Z

WhisperKit is expanding into text-to-speech!

TTSKit adds a new library for on-device text-to-speech using Core ML-accelerated Qwen3-TTS models (CustomVoice 0.6B and 1.7B in this first release) with real-time streaming playback on Apple Silicon. In this first PR, we're introducing the library into the WhisperKit package (WhisperKit will be renamed to reflect the new multi-Kit nature of Argmax Open-source SDK) as an optional import to add real-time TTS capabilities with a state-of-the-art open-source model, either on its own or as a complement to WhisperKit speech-to-text.

This PR is still in the final phases of development, but here are a few highlights:

TTSKit Library

Download, load, generate, and stream playback in ~3 lines of code.
Protocol-based component architecture (6 swappable Core ML components: TextProjecting, CodeEmbedding, MultiCodeEmbedding, CodeDecoding, MultiCodeDecoding, SpeechDecoding) for plugging in new model backends.
Qwen3-TTS implementation with 9 built-in voices, 10 languages, and style instruction support (1.7B variant only).
Automatic text chunking for long-form generations with concurrent chunk generation and cross-fade stitching.
Adaptive streaming playback (TTSPlaybackStrategy.auto) that measures first-step latency to pre-buffer just enough audio.
Seedable RNG for reproducible generation.
WAV and M4A (AAC) audio export.
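
To illustrate the cross-fade stitching mentioned in the list above, here is a minimal sketch of joining two generated audio chunks. It is not TTSKit's implementation: the function name `crossFade`, the linear fade curve, and the mono `[Float]` sample layout are all assumptions made for the example.

```swift
/// Hypothetical helper (not part of TTSKit): joins two mono sample buffers,
/// blending the last `overlap` samples of `a` into the first `overlap`
/// samples of `b` with a linear ramp.
func crossFade(_ a: [Float], _ b: [Float], overlap: Int) -> [Float] {
    // Clamp the overlap so it never exceeds either buffer.
    let n = max(0, min(overlap, a.count, b.count))
    guard n > 0 else { return a + b }

    var out = Array(a.dropLast(n))
    out.reserveCapacity(a.count + b.count - n)
    for i in 0..<n {
        // Weight moves from mostly `a` to mostly `b` across the overlap.
        let t = Float(i + 1) / Float(n + 1)
        out.append(a[a.count - n + i] * (1 - t) + b[i] * t)
    }
    out.append(contentsOf: b.dropFirst(n))
    return out
}

let stitched = crossFade([1, 1, 1, 1], [0, 0, 0, 0], overlap: 2)
```

The stitched buffer is shorter than the two inputs combined by the overlap length, which is why a real pipeline would have to generate or retain a little extra audio at each chunk boundary.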

Example usage playing audio in real-time out of the default speaker:

    let ttsKit = try await TTSKit()
    try await ttsKit.playSpeech(text: "Hello from TTSKit!")

New target: ArgmaxCore

Extracted a shared target with various utilities from WhisperKit so TTSKit can share them without depending on it directly.

CLI

For now, we plan to ship this as a new tts command on whisperkit-cli, which can be used like this:
- swift run whisperkit-cli tts --text "Hello from TTSKit" --play
- Full control over speaker, language, model, style instruction, temperature, chunking, compute units, and seed.

TTSKit Example app

macOS and iOS example app with model management, real-time waveform visualization, generation history persisted as M4A files, and more. Use this as a quick way to try it out!

Roadmap

We plan to keep adding support for state-of-the-art models and improving inference latency for TTSKit over the next few weeks. The immediate follow-ups are the voice cloning feature from Qwen3-TTS and a 2x reduction in time-to-first-byte (TTFB), so that this on-device project achieves a consistent sub-100 ms TTFB, providing a latency edge over cloud deployments of the same model. In the meantime, we encourage anyone reading this to check out this PR, give it a spin, and let us know how it goes!

chen-argmax · 2026-02-19T22:13:35Z

Examples/TTS/SpeakAX/SpeakAX/ViewModel.swift

+
+@MainActor
+@Observable
+final class ViewModel: @unchecked Sendable {


I would break this down into smaller view models if it grows too long, e.g., DownloadViewModel vs. TTSViewModel.

chen-argmax · 2026-02-19T22:17:16Z

...ples/WhisperAX/WhisperAX.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved

@@ -1,5 +1,5 @@
 {


why do we need to change this?

Had an old swift-transformers resolved

chen-argmax · 2026-02-19T22:21:36Z

Sources/ArgmaxCore/ConcurrencyUtilities.swift

+/// Thin wrapper around `os_unfair_lock` that exposes a Swift-friendly
+/// `withLock` helper. This lock is non-reentrant and optimized for low
+/// contention, matching the semantics of Core Foundation's unfair lock.
+public final class UnfairLock: @unchecked Sendable {


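I think we want to give this class a generic name to future-proof it for Swift 6; os_unfair_lock does not seem to be the recommended way to lock in Swift 6.
We could rename it Mutex so we can reimplement it on top of the actual Swift Mutex (from the Synchronization module) later.

Now:

    import os

    public final class Mutex: @unchecked Sendable {
        private let lock = OSAllocatedUnfairLock()

        public init() {}

        // withLockUnchecked accepts a non-Sendable closure and result.
        public func withLock<T>(_ body: () throws -> T) rethrows -> T {
            try lock.withLockUnchecked(body)
        }
    }

Later (a sketch only; the exact signatures may need `sending` annotations to match Synchronization.Mutex):

    import Synchronization

    public final class Mutex<Value>: Sendable {
        // Fully qualified to avoid resolving to this wrapper class itself.
        private let mutex: Synchronization.Mutex<Value>

        public init(_ value: Value) {
            self.mutex = Synchronization.Mutex(value)
        }

        public func withLock<T>(_ body: (inout Value) throws -> T) rethrows -> T {
            try mutex.withLock(body)
        }
    }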

chen-argmax · 2026-02-19T22:23:32Z

Sources/ArgmaxCore/ConcurrencyUtilities.swift

@@ -0,0 +1,86 @@
+//  For licensing see accompanying LICENSE.md file.
+//  Copyright © 2024 Argmax, Inc. All rights reserved.


This should be 2026; ditto for the other files.

This was brought over from https://github.com/argmaxinc/WhisperKit/blob/main/Sources/WhisperKit/Utilities/Concurrency.swift

chen-argmax · 2026-02-19T22:24:20Z

Sources/ArgmaxCore/MLModelExtensions.swift

should we consider adding another package under ArgmaxCore? like ArgmaxCore/CoreML

chen-argmax · 2026-02-19T22:36:10Z

Sources/TTSKit/TTSKit.swift

+    ///
+    /// Downloads only the files matching the configured component variants.
+    /// Files are cached locally by the Hub library.
+    open class func download(


should we decouple model download from TTSKit? ArgmaxCore could provide a downloader for this

Yep have some todos relating to this

chen-argmax · 2026-02-19T22:36:56Z

Sources/TTSKit/TTSModels.swift

+//  Copyright © 2026 Argmax, Inc. All rights reserved.
+
+import Accelerate
+@_exported import ArgmaxCore


why @_exported?

chen-argmax · 2026-02-19T22:38:55Z

Tests/TTSKitTests/TTSKitIntegrationTests.swift

+        )
+
+        XCTAssertGreaterThan(result.audio.count, 0, "Audio samples should be non-empty")
+        XCTAssertGreaterThan(result.audioDuration, 1.0, "Expect at least 1s of speech")


will seed guarantee the audio length is always deterministic?

Yup, apple docs recommend using this method https://developer.apple.com/documentation/swift/randomnumbergenerator#Conforming-to-the-RandomNumberGenerator-Protocol
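
For context, a seedable generator conforming to `RandomNumberGenerator` can be sketched as below. This is not the sampler used in TTSKit; the type name `SeededGenerator` and the choice of the SplitMix64 mixing function are assumptions for illustration. It only shows that the random draws are reproducible for a given seed. Whether the resulting audio length is identical also depends on the model outputs themselves being deterministic, which this sketch does not address.

```swift
/// Hypothetical seedable generator (SplitMix64), not TTSKit's own type.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        // Wrapping arithmetic (&+, &*) is required; overflow is intentional.
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Draws `count` values in [0, 1) using the standard library's `using:` overload.
func draws(seed: UInt64, count: Int) -> [Float] {
    var generator = SeededGenerator(seed: seed)
    return (0..<count).map { _ in Float.random(in: 0..<1, using: &generator) }
}

let firstRun = draws(seed: 42, count: 8)
let secondRun = draws(seed: 42, count: 8)
```

Giving each concurrent task its own generator instance with a derived seed, as the TTSGenerateTask doc comment describes, avoids sharing mutable RNG state between tasks.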

chen-argmax · 2026-02-19T22:44:25Z

Tests/TTSKitTests/TTSKitUnitTests.swift

+//  For licensing see accompanying LICENSE.md file.
+//  Copyright © 2024 Argmax, Inc. All rights reserved.
+
+import ArgmaxCore


I think we should break these tests down into isolated, per-class tests.

E.g. 1: TTSKitTest.swift injects a Config with mocked components and verifies that
TTSKitTest.generateSpeech interacts with the components correctly (tasks created, etc.).

E.g. 2: Qwen3TTSGenerateTaskTest.swift injects mocked components and verifies that run interacts with them correctly.
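
A rough sketch of the mock-injection pattern suggested here, written without XCTest so that it stands alone. Every name in it (`TextProjector`, `MockTextProjector`, `SpeechPipeline`) is hypothetical and stands in for the real TTSKit protocols and config, whose actual signatures are not shown in this thread.

```swift
/// Hypothetical stand-in for one swappable component protocol.
protocol TextProjector {
    func project(_ text: String) -> [Float]
}

/// Mock that records every call so a test can verify the interaction.
final class MockTextProjector: TextProjector {
    private(set) var receivedTexts: [String] = []

    func project(_ text: String) -> [Float] {
        receivedTexts.append(text)
        return [0, 0]  // Fixed dummy output; no model is loaded.
    }
}

/// Hypothetical unit under test that receives its component by injection.
struct SpeechPipeline {
    let projector: TextProjector

    func generate(chunks: [String]) -> [[Float]] {
        chunks.map { projector.project($0) }
    }
}

let mock = MockTextProjector()
let pipeline = SpeechPipeline(projector: mock)
let outputs = pipeline.generate(chunks: ["Hello", "world"])
```

In an XCTest case, the final check would typically be an `XCTAssertEqual` on `mock.receivedTexts`, keeping each test class focused on one unit and free of model downloads.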

chen-argmax · 2026-02-19T22:45:35Z

Sources/TTSKit/Qwen3TTS/Qwen3TTSGenerateTask.swift

+/// owns its own sampler (derived seed) so concurrent tasks don't share RNG state.
+/// Model components are shared read-only references - `MLModel.prediction()` is
+/// thread-safe. The class is `@unchecked Sendable` to permit `open` subclassing.
+open class TTSGenerateTask: @unchecked Sendable, TTSGenerating {


Should the class be renamed to Qwen3TTSGenerateTask? Ditto for the other files under Qwen3TTS.

Add TTSKit with Qwen3-TTS support

ba8475c

ZachNagengast requested review from a2they, atiorh and chen-argmax February 19, 2026 11:27

chen-argmax requested changes Feb 19, 2026


argmaxinc deleted a comment from chen-argmax Feb 19, 2026


ZachNagengast commented Feb 19, 2026 • edited by atiorh
