Related Projects: [RunAndRead-iOS] | [RunAndRead-Android]
RunAndRead-Audiobook is an open-source project aimed at generating high-quality text-to-speech (TTS) audiobooks using open-source models like Zyphra/Zonos.
The ultimate goal is to make Run & Read, the audiobook player app, sound more natural by using high-quality voices. Currently, it relies on the standard voices embedded in Apple and Android devices, which are still not perfect. Starting from Android v1.5 (6) and iOS v1.6 (18), Run & Read supports MP3 audiobooks generated using the RANDR pipeline in this repository. See instructions here.
Apps
App Store: Run & Read for Apple Devices
Google Play: Run & Read for Android
QR codes
Generate high-quality audiobooks at home using open-source AI models! We've built a pipeline using MLX-AUDIO to create audiobooks in the RANDR format, optimized for playback in the Run & Read app.
Dedicated document with step-by-step instructions
- Pipeline for generating audiobooks compatible with the Run & Read app.
- Convert EPUB to JSON for text extraction.
- Generate audio using Zonos TTS or Kokoro-TTS (AUDIO-MLX).
- Clone voices from an MP3 sample.
- Play audio clips sequentially while displaying text in the terminal.
- Merge audio clips into one file.
- Zyphra and Deepgram API support for cloud-based TTS.
- Wrap produced audio and JSON files into a ZIP readable by the Run & Read app.
- Transfer audio files to a mobile phone and play them in the Run & Read app.
Planned
- Estimate local vs. cloud generation cost.
- On-device TTS for Android/iOS.
Here are some audiobook samples generated using RunAndRead-Audiobook with Zonos TTS voice cloning:
[Sample 1 - Alice in Wonderland]
You can find examples under the audio/pg11/ folder, and generate your own samples using the steps outlined in the Usage section below.
- Python 3.9+
- Zyphra/Zonos (open-source TTS engine)
- ffmpeg (audio conversion)
- EbookLib (EPUB parsing)
- PyAudio / playsound (for playback)
- yt-dlp (to download MP3 files from YouTube for voice cloning)
pip install -r requirements.txt
Follow the official installation instructions from Zyphra/Zonos. Using a uv virtual environment is recommended for running RunAndRead scripts. After installing the Zonos project, run the sample.py script:
uv run sample.py
This will download the "Zyphra/Zonos-v0.1-transformer" base model from Hugging Face and store it in your environment.
- macOS: brew install ffmpeg
- Ubuntu: sudo apt install ffmpeg
- Windows: Download from ffmpeg.org and add to system PATH.
To clone a voice with Zonos, you'll need an MP3 sample of the speaker. A 10-20 minute video with a single speaker (e.g., a tutorial or audiobook) is recommended. You can download an MP3 track from YouTube using yt-dlp:
yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=MkLBNUMc26Y" -o "assets/exampleaudio.mp3"
This exampleaudio.mp3 file is the reference sample that the Zonos model uses to clone the voice during synthesis.
First, run this script with 0 as the third parameter:
python epub_to_json.py epub/pg11.epub library/pg11.json 0
Check the terminal output to find how many lines should be skipped, then rerun the script with the number of the first line to keep:
python epub_to_json.py epub/pg11.epub library/pg11.json 10
This ensures that the book starts from the correct position, e.g.:
10: CHAPTER I. Down the Rabbit-Hole
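The two-pass workflow above can be sketched in plain Python: list every extracted line with its index, then keep only the lines from the chosen index onward. The function names, the zero-based indexing, and the list-of-strings representation are assumptions made for illustration; the actual epub_to_json.py may number and store lines differently.

```python
from typing import List


def number_lines(lines: List[str]) -> List[str]:
    """Prefix each line with its index, like the first pass (third argument 0)."""
    return [f"{i}: {text}" for i, text in enumerate(lines)]


def keep_from(lines: List[str], first_line: int) -> List[str]:
    """Drop everything before `first_line`, like the second pass."""
    if first_line < 0 or first_line > len(lines):
        raise ValueError(f"first_line must be between 0 and {len(lines)}")
    return lines[first_line:]


if __name__ == "__main__":
    # Hypothetical extracted text: front matter followed by the first chapter.
    extracted = [
        "Title page",
        "Contents",
        "CHAPTER I. Down the Rabbit-Hole",
        "Alice was beginning to get very tired...",
    ]
    for row in number_lines(extracted):
        print(row)
    book = keep_from(extracted, 2)
    print(book[0])
```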
Note: Without an NVIDIA GPU, converting an entire book to audio takes a long time. A 30-second audio clip takes approximately 3 minutes to generate on a MacBook Pro with an M1 processor, so a full book can take dozens of hours. For example, Alice's Adventures in Wonderland is 3 hours long, which means about 18 hours of processing on that machine. However, the make_abook script can be interrupted at any time, and it will resume from the position where it was stopped.
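The estimate in the note is simple scaling: 3 minutes of generation per 30 seconds of audio is a 6x slowdown. A minimal sketch of that arithmetic follows; the function name and default values are illustrative, taken from the M1 figures quoted above, and will differ on other hardware.

```python
def estimate_processing_hours(
    audio_hours: float,
    clip_seconds: float = 30.0,
    generation_seconds_per_clip: float = 180.0,
) -> float:
    """Scale the audiobook length by the generation-to-playback ratio."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    slowdown = generation_seconds_per_clip / clip_seconds
    return audio_hours * slowdown


if __name__ == "__main__":
    # A 3-hour audiobook at the quoted M1 speed.
    print(estimate_processing_hours(3.0))
```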
uv run python make_abook.py library/pg21279.json assets/kurt_v.mp3
python play_audio.py audio/pg11 mp3
python merge_audio_clips.py library/pg11.json audio/pg11 mp3
# YouTube
ffmpeg -loop 1 -i assets/ic_launcher.png -i audio/pg11/merged_output.mp3 -c:v libx264 -c:a aac -b:a 192k -shortest output.mp4
# LinkedIn
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -shortest output.mp4
# X
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest output.mp4
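The ffmpeg commands above differ only in whether the cover image is scaled to a square. The sketch below builds the same argument lists from Python so they can be reused per platform. The helper name and its `square_size` parameter are hypothetical and not part of this repository; the flags are copied from the YouTube and LinkedIn commands above.

```python
import shlex
from typing import List, Optional


def still_image_video_args(
    image: str, audio: str, output: str, square_size: Optional[int] = None
) -> List[str]:
    """Build ffmpeg arguments that pair one still image with an audio track."""
    args = ["ffmpeg", "-loop", "1", "-i", image, "-i", audio]
    if square_size is not None:
        # Square output with a widely supported pixel format.
        args += ["-vf", f"scale={square_size}:{square_size},format=yuv420p"]
    args += ["-c:v", "libx264"]
    if square_size is not None:
        args += ["-tune", "stillimage"]
    args += ["-c:a", "aac", "-b:a", "192k", "-shortest", output]
    return args


if __name__ == "__main__":
    print(shlex.join(still_image_video_args(
        "assets/ic_launcher.png", "audio/pg11/merged_output.mp3", "output.mp4")))
    print(shlex.join(still_image_video_args(
        "appGoogle.png", "merged_output.mp3", "output.mp4", square_size=1080)))
```

Passing the resulting list to `subprocess.run(args, check=True)` would run the conversion, provided ffmpeg is on the PATH.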
# Zyphra
export ZYPHRA_API_KEY="your-zyphra-api-key"
python zyphra_api.py library/pg11.json
# Deepgram
export DEEPGRAM_API_KEY="your-deepgram-api-key"
python deepgram_api.py library/pg11.json
# OpenAI MINI TTS
export OPENAI_API_KEY="your-openai-api-key"
python make_abook_open_ai.py library/pg11.json
pip install -e ~/projects/voice/mlx-audio
Note: The Kokoro-82M TTS model relies on an external grapheme-to-phoneme (g2p) conversion tool, espeak-ng, to pronounce names and other out-of-dictionary (OOD) words. It skips these words when espeak-ng is not properly installed or detected by the system.
To prevent Kokoro-82M from skipping names and OOD words, install espeak-ng and make its data directory visible to the system:
echo 'export ESPEAK_DATA_PATH=/opt/homebrew/share/espeak-ng-data' >> ~/.zshrc
source ~/.zshrc
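Before starting a long generation run, it may help to confirm that espeak-ng is actually discoverable. The following is a minimal sketch, not part of this repository: the function name is hypothetical, and it only checks for an `espeak-ng` executable on the PATH and for the `ESPEAK_DATA_PATH` variable set above. Whether mlx-audio looks in these same places is an assumption.

```python
import os
import shutil
from typing import List, Mapping, Optional


def espeak_problems(env: Mapping[str, str], executable: Optional[str]) -> List[str]:
    """Return human-readable problems with the espeak-ng setup (empty if none)."""
    problems = []
    if executable is None:
        problems.append("espeak-ng executable not found on PATH")
    data_path = env.get("ESPEAK_DATA_PATH")
    if not data_path:
        problems.append("ESPEAK_DATA_PATH is not set")
    elif not os.path.isdir(data_path):
        problems.append(f"ESPEAK_DATA_PATH does not exist: {data_path}")
    return problems


if __name__ == "__main__":
    for problem in espeak_problems(os.environ, shutil.which("espeak-ng")):
        print("warning:", problem)
```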
# make audio book
python make_abook_mlx.py library/pg2680.json
python make_randr.py audio/pg20203/
flowchart LR
A[EPUB] --> B[epub_to_json.py]
B --> C[JSON book]
C --> D[make_abook.py / make_abook_mlx.py / make_abook_open_ai.py / zyphra_api.py / deepgram_api.py]
D --> E[Audio clips]
E --> F[play_audio.py]
E --> G[merge_audio_clips.py]
C --> H[make_randr.py]
E --> H
H --> I[RANDR zip]
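The main path through the flowchart (EPUB, JSON book, audio clips, RANDR zip) can be written down as an ordered list of commands. This is a sketch only: the helper name is hypothetical, the argument order is copied from the command examples in this README, and it assumes the book's files share one ID (such as pg11) across the epub/, library/, and audio/ folders. The optional play and merge branches are left out.

```python
from typing import List


def pipeline_commands(book_id: str, voice_sample: str, first_line: int = 0) -> List[List[str]]:
    """List the commands for one book, in the order the flowchart runs them."""
    json_book = f"library/{book_id}.json"
    audio_dir = f"audio/{book_id}"
    return [
        # EPUB -> JSON book, keeping lines from `first_line` onward.
        ["python", "epub_to_json.py", f"epub/{book_id}.epub", json_book, str(first_line)],
        # JSON book -> audio clips, using the Zonos voice-cloning script.
        ["python", "make_abook.py", json_book, voice_sample],
        # Audio clips + JSON -> RANDR zip.
        ["python", "make_randr.py", f"{audio_dir}/"],
    ]


if __name__ == "__main__":
    for command in pipeline_commands("pg11", "assets/kurt_v.mp3", first_line=10):
        print(" ".join(command))
```

Swapping the second command for make_abook_mlx.py or one of the cloud scripts would follow the other branches of node D.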
runandread-audiobook/
├── epub_to_json.py          # Extracts text from EPUB into JSON
├── make_abook.py            # Converts text into audio files with Zonos TTS
├── make_abook_mlx.py        # Converts text into audio files using the Kokoro-82M TTS model with mlx-audio (optimized for Apple M-series processors)
├── make_randr.py            # Wraps the produced audio and JSON files into a ZIP file readable by the Run & Read app
├── play_audio.py            # Plays audio clips sequentially while displaying text
├── merge_audio_clips.py     # Merges audio files into one and generates a timestamped JSON file
├── word_tokens_tools.py     # Utility to normalize the text before passing it to the TTS
├── test_scan_next.py        # Unit tests to make sure text normalization works as expected
├── zyphra_api.py            # Converts text into audio files with the Zyphra SDK/REST API
├── deepgram_api.py          # Converts text into audio files with the Deepgram SDK/REST API
├── make_abook_open_ai.py    # Converts text into audio files with OpenAI TTS
├── assets/                  # Stores MP3 files for voice cloning
├── epub/                    # EPUB books from the Gutenberg Project
├── audio/                   # Output audio files
├── audiobooks/              # RANDR audiobook samples
│   ├── pg2680.randr         # Meditations by Marcus Aurelius, Emperor of Rome
│   └── pg20203.randr        # Autobiography of Benjamin Franklin
├── library/                 # Output JSON book files
├── README.md                # Documentation
├── requirements.txt         # Dependencies
└── LICENSE                  # Open-source license
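As described in the listing, make_randr.py wraps the produced audio and JSON files into a ZIP file. The sketch below illustrates that idea with the standard library only. The function name, the flat archive layout, and the choice to include every .mp3 and .json file are assumptions; the real RANDR layout is whatever make_randr.py produces.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def bundle_directory(source_dir: Path, archive_path: Path) -> List[str]:
    """Zip every .mp3 and .json file in source_dir; return the archived names."""
    members = sorted(
        p for p in source_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".mp3", ".json"}
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # Store files at the archive root, without the source folder prefix.
            archive.write(path, arcname=path.name)
    return [p.name for p in members]


if __name__ == "__main__":
    # Demo on throwaway files so nothing in the repository is touched.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "clip_0001.mp3").write_bytes(b"placeholder audio")
        (folder / "book.json").write_text("{}")
        print(bundle_directory(folder, folder / "demo.randr"))
```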
Contributions are welcome! Feel free to open an issue or submit a pull request.
- Zonos - Open-source TTS model.
- AUDIO-MLX - A TTS and STS library built on Apple's MLX framework.
- Kokoro-TTS - An open-weight TTS model with 82 million parameters.
- Deepgram - Commercial cloud-based TTS.
- EbookLib - EPUB parsing in Python.
- yt-dlp - YouTube audio downloader for voice cloning.
- Gutenberg Project - A library of over 75,000 free eBooks.
- Python Simplified, MariyaSha - Python Simplified. Kudos to Mariya for her beautiful voice, which I cloned from one of her videos.
- Sergey N - Connect and follow me on LinkedIn.
This project is open-source and available under the MIT License.


