```swift
@MainActor
@Observable
final class ViewModel: @unchecked Sendable {
```
I would break this down into smaller view models if it gets too long, e.g. `DownloadViewModel` vs. `TTSViewModel`.
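A minimal sketch of that split, assuming hypothetical `DownloadViewModel`, `TTSViewModel`, and `AppViewModel` types with illustrative properties (none of these exist in the PR). A thin parent composes the focused models, so each one stays small and can be tested on its own. In the app each class would also carry `@MainActor @Observable` like the original `ViewModel`; those attributes are left out here to keep the sketch dependency-free.

```swift
// Hypothetical: owns model download state only.
final class DownloadViewModel {
    private(set) var progress: Double = 0
    private(set) var isDownloaded = false

    func updateProgress(_ fraction: Double) {
        // Clamp to 0...1 and mark complete once fully downloaded.
        progress = min(max(fraction, 0), 1)
        isDownloaded = progress >= 1
    }
}

// Hypothetical: owns text-to-speech generation state only.
final class TTSViewModel {
    var text = ""
    private(set) var isGenerating = false

    func canGenerate(modelReady: Bool) -> Bool {
        modelReady && !isGenerating && !text.isEmpty
    }
}

// Hypothetical coordinator held by the root view; it only composes the pieces.
final class AppViewModel {
    let download = DownloadViewModel()
    let tts = TTSViewModel()

    var canGenerate: Bool { tts.canGenerate(modelReady: download.isDownloaded) }
}
```

Cross-cutting rules (like "generate only once the model is downloaded") live in the small coordinator instead of a single large class.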
```
@@ -1,5 +1,5 @@
{
```
Why do we need to change this?
I had an old version of swift-transformers resolved.
```swift
/// Thin wrapper around `os_unfair_lock` that exposes a Swift-friendly
/// `withLock` helper. This lock is non-reentrant and optimized for low
/// contention, matching the semantics of Core Foundation's unfair lock.
public final class UnfairLock: @unchecked Sendable {
```
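I think we want to make this class name generic to future-proof it for Swift 6; it seems `os_unfair_lock` is not the recommended way to lock in Swift 6.
Probably rename it `Mutex` so we can reimplement it later with Swift's own `Mutex` (from the `Synchronization` module).
Now:
```swift
import os

public final class Mutex: @unchecked Sendable {
    // `@usableFromInline` (not `private`) so the `@inlinable` method may reference it.
    @usableFromInline let lock = OSAllocatedUnfairLock()

    public init() {}

    @inlinable
    public func withLock<T>(_ body: () throws -> T) rethrows -> T {
        // `withLockUnchecked` accepts a non-`@Sendable` closure and result.
        try lock.withLockUnchecked(body)
    }
}
```
Later:
```swift
import Synchronization

// `Synchronization.Mutex` requires macOS 15 / iOS 18 and the Swift 6 compiler.
@available(macOS 15.0, iOS 18.0, watchOS 11.0, tvOS 18.0, *)
public final class Mutex<Value>: Sendable {
    private let mutex: Synchronization.Mutex<Value>

    public init(_ value: sending Value) {
        self.mutex = Synchronization.Mutex(value)
    }

    public func withLock<T, E: Error>(
        _ body: (inout sending Value) throws(E) -> sending T
    ) throws(E) -> sending T {
        try mutex.withLock(body)
    }
}
```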
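To exercise the non-generic `withLock` shape from the "now" version off Apple platforms, here is a sketch that swaps `NSLock` in for `OSAllocatedUnfairLock`. That swap is an assumption for portability, not the proposed implementation. The sketch uses the wrapper to guard a counter that is incremented from many threads at once.

```swift
import Dispatch
import Foundation

// Portable stand-in for the proposed `Mutex`, backed by `NSLock` (assumption).
public final class PortableMutex: @unchecked Sendable {
    private let lock = NSLock()
    public init() {}

    public func withLock<T>(_ body: () throws -> T) rethrows -> T {
        lock.lock()
        defer { lock.unlock() }
        return try body()
    }
}

// Example client: its mutable state is only touched inside `withLock`.
final class Counter: @unchecked Sendable {
    private let mutex = PortableMutex()
    private var value = 0

    func increment() { mutex.withLock { value += 1 } }
    var current: Int { mutex.withLock { value } }
}

let counter = Counter()
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in counter.increment() }
print(counter.current) // 1000
```

Because callers only see `withLock`, the backing primitive can change later without touching call sites, which is the point of the generic name.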
```
@@ -0,0 +1,86 @@
// For licensing see accompanying LICENSE.md file.
// Copyright © 2024 Argmax, Inc. All rights reserved.
```
This was brought over from https://github.com/argmaxinc/WhisperKit/blob/main/Sources/WhisperKit/Utilities/Concurrency.swift
Should we consider adding another package under ArgmaxCore, e.g. ArgmaxCore/CoreML?
```swift
///
/// Downloads only the files matching the configured component variants.
/// Files are cached locally by the Hub library.
open class func download(
```
Should we decouple model download from TTSKit? ArgmaxCore could provide a downloader for this.
Yep, I have some TODOs related to this.
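One possible shape for that decoupling, as a sketch only. It assumes a hypothetical `ModelDownloading` protocol living in ArgmaxCore and a hypothetical `TTSModelLoader` on the TTSKit side. The repo name and glob pattern are placeholders, and a real conformance would wrap the existing Hub download call. The actor-based `RecordingDownloader` shows how tests could run without network access.

```swift
import Foundation

// Hypothetical ArgmaxCore protocol: fetch files in `repo` whose paths match
// `patterns` and return the local folder that contains them.
public protocol ModelDownloading: Sendable {
    func download(repo: String, matching patterns: [String]) async throws -> URL
}

// Test double that records requests instead of touching the network.
public actor RecordingDownloader: ModelDownloading {
    public private(set) var requests: [String] = []
    private let root: URL

    public init(root: URL) { self.root = root }

    public func download(repo: String, matching patterns: [String]) async throws -> URL {
        requests.append("\(repo):\(patterns.joined(separator: ","))")
        return root.appendingPathComponent(repo)
    }
}

// Hypothetical TTSKit-side loader that receives the downloader by injection.
public struct TTSModelLoader: Sendable {
    let downloader: any ModelDownloading

    public init(downloader: any ModelDownloading) { self.downloader = downloader }

    // Requests only the files for one component variant (placeholder layout).
    public func prepare(variant: String) async throws -> URL {
        try await downloader.download(repo: "example-org/example-tts", matching: ["\(variant)/*"])
    }
}
```

WhisperKit and TTSKit could then share a single Hub-backed conformance, while each kit only depends on the protocol.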
```swift
// Copyright © 2026 Argmax, Inc. All rights reserved.

import Accelerate
@_exported import ArgmaxCore
```
```swift
)

XCTAssertGreaterThan(result.audio.count, 0, "Audio samples should be non-empty")
XCTAssertGreaterThan(result.audioDuration, 1.0, "Expect at least 1s of speech")
```
Will the seed guarantee that the audio length is always deterministic?
Yup, Apple's docs recommend this approach: https://developer.apple.com/documentation/swift/randomnumbergenerator#Conforming-to-the-RandomNumberGenerator-Protocol
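On the determinism question, a seeded generator makes the sampling step reproducible. The sketch below assumes a SplitMix64 generator (an illustrative choice; the PR's sampler may use a different algorithm) and a hypothetical `sampleTokens` helper. Two caveats apply. Apple's documentation for `random(in:using:)` notes that the algorithm may change between Swift versions. Also, identical seeds yield identical audio lengths only if the model's logits are identical too, and those may differ across compute units or OS versions.

```swift
// SplitMix64: a small, well-known seedable generator (illustrative choice).
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Hypothetical helper: draws `count` token ids from `0..<vocabSize`
// using the caller's generator, so the same seed gives the same draws.
func sampleTokens<G: RandomNumberGenerator>(count: Int, vocabSize: Int, using rng: inout G) -> [Int] {
    (0..<count).map { _ in Int.random(in: 0..<vocabSize, using: &rng) }
}
```

With the same seed and the same inputs, two runs produce the same token sequence, and therefore the same number of decoded audio frames.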
```swift
// For licensing see accompanying LICENSE.md file.
// Copyright © 2024 Argmax, Inc. All rights reserved.

import ArgmaxCore
```
I think we would want to break these tests down into isolated per-class tests, e.g.:
1. `TTSKitTest.swift`, which injects a `Config` with mocked components and verifies that `TTSKit.generateSpeech` interacts with the components correctly (tasks created, etc.).
2. `Qwen3TTSGenerateTaskTest.swift`, which injects mocked components and verifies that `run` interacts with them correctly.
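A sketch of the mock-injection pattern for the second kind of test. It assumes a simplified, hypothetical `SpeechDecoderStub` protocol and a hypothetical `ChunkedGenerateTask`. The real TTSKit component protocols (e.g. `SpeechDecoding`) have different signatures. The mock records every call, so the test asserts on interactions rather than on audio content.

```swift
// Hypothetical, simplified component protocol (not the real TTSKit API).
protocol SpeechDecoderStub {
    func decode(codes: [Int]) -> [Float]
}

// Mock that records each call so a test can verify how it was used.
final class MockSpeechDecoder: SpeechDecoderStub {
    private(set) var calls: [[Int]] = []

    func decode(codes: [Int]) -> [Float] {
        calls.append(codes)
        // Pretend each code yields two audio samples.
        return Array(repeating: 0, count: codes.count * 2)
    }
}

// Hypothetical task under test: decodes codes in fixed-size chunks.
struct ChunkedGenerateTask {
    let decoder: any SpeechDecoderStub
    let chunkSize: Int

    func run(codes: [Int]) -> [Float] {
        stride(from: 0, to: codes.count, by: chunkSize).flatMap { start in
            decoder.decode(codes: Array(codes[start..<min(start + chunkSize, codes.count)]))
        }
    }
}
```

In an XCTest case, the same checks would use `XCTAssertEqual(mock.calls.count, 3)` and similar assertions against the injected mock.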
```swift
/// owns its own sampler (derived seed) so concurrent tasks don't share RNG state.
/// Model components are shared read-only references - `MLModel.prediction()` is
/// thread-safe. The class is `@unchecked Sendable` to permit `open` subclassing.
open class TTSGenerateTask: @unchecked Sendable, TTSGenerating {
```
Should the class be renamed to `Qwen3TTSGenerateTask`? Ditto for the other files under Qwen3TTS.
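The `TTSGenerateTask` doc comment says each task owns its own sampler with a derived seed. The sketch below illustrates that idea, assuming a hypothetical `derivedSeed` function (the SplitMix64 finalizer applied to base seed plus task index), a hypothetical xorshift64* `TaskRNG`, and a toy `runTasks` driver. None of these are the PR's actual code. Because every child task builds its own generator and results are stored by index, the output is reproducible no matter how the task group schedules the work.

```swift
// Hypothetical: distinct, reproducible seed for task `index` from a base seed.
func derivedSeed(base: UInt64, index: Int) -> UInt64 {
    var z = base &+ UInt64(index) &* 0x9E37_79B9_7F4A_7C15
    z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
    z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
    return z ^ (z >> 31)
}

// xorshift64* generator; each task owns one, so no RNG state is shared.
struct TaskRNG: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed == 0 ? 0x9E37_79B9_7F4A_7C15 : seed } // avoid all-zero state

    mutating func next() -> UInt64 {
        state ^= state >> 12
        state ^= state << 25
        state ^= state >> 27
        return state &* 0x2545_F491_4F6C_DD1D
    }
}

// Toy driver: runs `taskCount` sampling tasks concurrently; results are keyed
// by task index, so they do not depend on completion order.
func runTasks(baseSeed: UInt64, taskCount: Int, steps: Int) async -> [[Int]] {
    await withTaskGroup(of: (Int, [Int]).self, returning: [[Int]].self) { group in
        for i in 0..<taskCount {
            group.addTask {
                var rng = TaskRNG(seed: derivedSeed(base: baseSeed, index: i))
                return (i, (0..<steps).map { _ in Int.random(in: 0..<1024, using: &rng) })
            }
        }
        var results = Array(repeating: [Int](), count: taskCount)
        for await (i, tokens) in group { results[i] = tokens }
        return results
    }
}
```

Sharing one generator across tasks would make each task's draws depend on interleaving, which is exactly what per-task generators avoid.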
WhisperKit is expanding into text-to-speech!
TTSKit is a new library for on-device text-to-speech using Core ML-accelerated Qwen3-TTS models (CustomVoice 0.6B and 1.7B in this first release), with real-time streaming playback on Apple Silicon. In this first PR, we're introducing the library into the WhisperKit package (which will be renamed to reflect the new multi-Kit nature of the Argmax Open-source SDK) as an optional import. It adds real-time TTS capabilities with a state-of-the-art open-source model, either on its own or as a complement to WhisperKit speech-to-text.
This PR is still in the final phases of development, but here are a few highlights:
TTSKit Library
… (`TextProjecting`, `CodeEmbedding`, `MultiCodeEmbedding`, `CodeDecoding`, `MultiCodeDecoding`, `SpeechDecoding`) for plugging in new model backends.
… (`TTSPlaybackStrategy.auto`) that measures first-step latency to pre-buffer just enough audio.
Example usage playing audio in real time out of the default speaker:
New target: ArgmaxCore
CLI
… `tts` that can be used like this: `swift run whisperkit-cli tts --text "Hello from TTSKit" --play`
TTSKit Example app
Roadmap
We plan to continue adding support for state-of-the-art models and improving TTSKit's inference latency over the next few weeks. The immediate follow-ups are the voice cloning feature from Qwen3-TTS and a 2x reduction in time-to-first-byte (TTFB), so that this on-device project consistently achieves sub-100 ms TTFB, a latency edge over cloud deployments of the same model. In the meantime, we encourage anyone reading this to check out this PR, give it a spin, and let us know how it goes!