This repository hosts an on-device iOS audio ML stack focused on stem separation and local inference.
ComfyAudio is now the primary app target in this repo.
- `ComfyAudio/`: primary iOS app project and UI.
- `AudioHelper/`: shared Swift package (stem separation engine, downloader, TTS, file management, packaged model resources).
- `old/`: archived legacy modules and app code (`old/AudioSplitter/`, `old/AudioUI/`, `old/NO_DOWNLOAD/`).
- `scripts/`: tooling (including model conversion).
This app relies heavily on the following upstream projects:
- FluidAudio by FluidInference (https://github.com/FluidInference/FluidAudio): provides core on-device audio/voice inference capabilities used by the app's `AudioHelper` package.
- YoutubeDL-iOS by kewlbear (https://github.com/kewlbear/YoutubeDL-iOS): powers the in-app YouTube download workflow used in `ComfyAudio` utilities.
This repository's source code is licensed under LGPL-2.1-or-later.
- See `LICENSE`.
- Third-party software notices are in `THIRD_PARTY_NOTICES.md`.
- Third-party model licensing and attribution notes are in `THIRD_PARTY_MODELS.md`.
- macOS with Xcode installed
- Xcode Command Line Tools (`xcode-select --install`)
- iOS Simulator runtime supported by your Xcode version
- Open `ComfyAudio/ComfyAudio.xcodeproj` in Xcode.
- Select scheme `ComfyAudio`.
- Select an iOS Simulator device.
- Build and run.
Command-line build:
```sh
xcodebuild \
  -project ComfyAudio/ComfyAudio.xcodeproj \
  -scheme ComfyAudio \
  -destination 'generic/platform=iOS Simulator' \
  CODE_SIGNING_ALLOWED=NO \
  build
```

The runtime expects a Core ML model compatible with the UVR/MDX contract.
- Expected model names: `UVRMDXNet`, `StemSeparator`, `AudioSeparator`, `DemucsSeparator`
- Expected input shape: `[1, 4, 2560, 256]` (multi-array)
- Expected output shape: `[1, 4, 2560, 256]` (multi-array)
- Pipeline reference: `AudioHelper/Sources/AudioHelper/StemSeperation/Core/CoreMLStemSeparatorEngine.swift`
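For a quick offline sanity check of this contract, a small helper along these lines can validate shapes before a model is wired into the app. The helper and its name are illustrative only, not part of the repo; the shapes come from the contract above.

```python
# Expected IO contract for the UVR/MDX Core ML model (from this README).
EXPECTED_SHAPE = (1, 4, 2560, 256)

def matches_uvr_mdx_contract(input_shapes, output_shapes):
    """Return True only if there is exactly one multi-array input and one
    multi-array output, each with shape [1, 4, 2560, 256].

    `input_shapes`/`output_shapes` are lists of shape sequences, as reported
    by whatever inspection tool you use. Hypothetical helper, for checks only.
    """
    return (
        len(input_shapes) == 1
        and len(output_shapes) == 1
        and tuple(input_shapes[0]) == EXPECTED_SHAPE
        and tuple(output_shapes[0]) == EXPECTED_SHAPE
    )

# A matching model passes; a shape mismatch or extra IO fails.
print(matches_uvr_mdx_contract([[1, 4, 2560, 256]], [[1, 4, 2560, 256]]))  # True
print(matches_uvr_mdx_contract([[1, 2, 2560, 256]], [[1, 4, 2560, 256]]))  # False
```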
Official UVR model downloads are published from:
https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models
Example source model: `UVR-MDX-NET-Inst_HQ_5.onnx`
Install conversion dependencies and convert:
```sh
python3 -m pip install -r scripts/requirements-model-conversion.txt
python3 scripts/convert_model_to_mlpackage.py \
  --source /path/to/UVR-MDX-NET-Inst_HQ_5.onnx \
  --output AudioHelper/Sources/AudioHelper/Resources/UVRMDXNet.mlpackage \
  --shape 1,4,2560,256 \
  --input-name input \
  --min-ios iOS17 \
  --precision float16
```

TorchScript input also works:

```sh
python3 scripts/convert_model_to_mlpackage.py \
  --source /path/to/model.ts \
  --output AudioHelper/Sources/AudioHelper/Resources/UVRMDXNet.mlpackage
```

Inspect the converted package:

```sh
xcrun coremlcompiler metadata AudioHelper/Sources/AudioHelper/Resources/UVRMDXNet.mlpackage
```

Confirm:
- One multi-array input with shape `[1, 4, 2560, 256]`
- One multi-array output with shape `[1, 4, 2560, 256]`
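This confirmation step can also be scripted by scanning the `coremlcompiler metadata` output for the expected shape. The exact text format of that output is not guaranteed across Xcode versions, so the whitespace-insensitive substring match below is an assumption, and `check_mlpackage` is a hypothetical helper (macOS only):

```python
import subprocess

def mentions_expected_shape(metadata_text, shape=(1, 4, 2560, 256)):
    """Whitespace-insensitive search for the shape in a metadata dump.

    Matches "[1, 4, 2560, 256]" as well as "[1,4,2560,256]". This assumes
    the tool prints shapes as comma-separated dimensions.
    """
    compact = "".join(metadata_text.split())
    return ",".join(str(d) for d in shape) in compact

def check_mlpackage(path):
    """Run `xcrun coremlcompiler metadata` on the package and scan its output."""
    result = subprocess.run(
        ["xcrun", "coremlcompiler", "metadata", path],
        capture_output=True, text=True, check=True,
    )
    return mentions_expected_shape(result.stdout)
```

A substring match cannot distinguish input shapes from output shapes, so treat a pass as a smoke test, not a full contract check.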
Xcode compiles model resources during build. If the model is missing or incompatible, the stem separation layer reports runtime errors.
- Model artifacts are ignored in `.gitignore` (`*.mlpackage`, `*.onnx`, `*.pth`, `*.ckpt`, etc.).
- Keep large model binaries out of git history unless you intentionally use Git LFS.
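If you do opt into Git LFS, the tracking rules in `.gitattributes` might look like the sketch below. Note that `.mlpackage` is a directory bundle, so its contents need a `/**` pattern; adjust the list to the artifacts you actually commit.

```gitattributes
# Hypothetical Git LFS rules for model artifacts; not part of this repo.
*.onnx filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
# .mlpackage is a directory, so track the files inside it.
*.mlpackage/** filter=lfs diff=lfs merge=lfs -text
```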
- App code is LGPL-2.1-or-later (`LICENSE`).
- Third-party software notices and package versions are listed in `THIRD_PARTY_NOTICES.md`.
- Model files are third-party assets and are not automatically covered by this repo's app code license.
- Verify model redistribution/commercial terms for the exact model you use.
- Include model attribution in public releases.
- See `THIRD_PARTY_MODELS.md` for a template and links.
This app depends on FFmpeg packages distributed under LGPL-2.1+: `FFmpeg-iOS-Lame` and `FFmpeg-iOS-Support`.
To rebuild with modified FFmpeg:
- Follow upstream build docs in `FFmpeg-iOS-Lame` / `FFmpeg-iOS-Support`.
- Produce replacement FFmpeg artifacts.
- Update package references/revisions.
- Rebuild this app from source.
See `THIRD_PARTY_NOTICES.md` for package versions and source links.
No bundled Core ML model was found:
- Ensure `UVRMDXNet.mlpackage` exists under `AudioHelper/Sources/AudioHelper/Resources/`.
- Rebuild clean (Product > Clean Build Folder).

Unsupported model input/output:
- Verify model input/output shapes are `[1, 4, 2560, 256]`.
- Verify model uses multi-array IO.