RunAndRead-Audiobook-Pipeline

License: MIT · Python 3.9+ · MLX · TTS

Related Projects: [RunAndRead-iOS] | [RunAndRead-Android]


Overview

RunAndRead-Audiobook is an open-source project aimed at generating high-quality text-to-speech (TTS) audiobooks using open-source models like Zyphra/Zonos.

The ultimate goal is to make Run & Read, the audiobook player app, sound more natural by using high-quality voices. Today the app relies on the standard voices embedded in Apple and Android devices, which are still far from perfect. Starting with Android v1.5 (6) and iOS v1.6 (18), Run & Read supports MP3 audiobooks generated with the RANDR pipeline in this repository. See instructions here.

Apps

App Store: Run & Read for Apple Devices
Google Play: Run & Read for Android

QR codes

---

Create Audiobooks with AI (RANDR format)

Generate high-quality audiobooks at home using open-source AI models! We’ve built a pipeline using MLX-AUDIO to create audiobooks in the RANDR format, optimized for playback in the Run & Read app.

Dedicated document with step-by-step instructions

Features

  • Pipeline for generating audiobooks compatible with the Run & Read app.
  • Convert EPUB to JSON for text extraction.
  • Generate audio using Zonos TTS or Kokoro-TTS (via MLX-AUDIO).
  • Clone voices from an MP3 sample.
  • Play audio clips sequentially while displaying text in the terminal.
  • Merge audio clips into one file.
  • Zyphra and Deepgram API support for cloud-based TTS.
  • Wrap produced audio and JSON files into a ZIP readable by the Run & Read app.
  • Transfer audio files to a mobile phone and play them in the Run & Read app.

Planned

  • Estimate local vs. cloud generation cost.
  • On-device TTS for Android/iOS.

Audio Samples

Here are some audiobook samples generated using RunAndRead-Audiobook with Zonos TTS voice cloning:

[Sample 1 - Alice in Wonderland]

You can find examples under the audio/pg11/ folder, and generate your own samples using the steps outlined in the Usage section below.


Dependencies & Technologies

  • Python 3.9+
  • Zyphra/Zonos (open-source TTS engine)
  • ffmpeg (audio conversion)
  • EbookLib (EPUB parsing)
  • PyAudio / playsound (for playback)
  • yt-dlp (to download MP3 files from YouTube for voice cloning)

Installation

1) Install Python Dependencies

pip install -r requirements.txt

2) Set Up Zyphra/Zonos

Follow the official installation instructions from Zyphra/Zonos. Using a uv virtual environment is recommended for running RunAndRead scripts. After installing the Zonos project, run the sample.py script:

uv run sample.py

This will download the "Zyphra/Zonos-v0.1-transformer" base model from Hugging Face and store it in your environment.

3) Set Up ffmpeg

Install ffmpeg if it is not already on your system (it is used for audio conversion). On macOS it can be installed with Homebrew: brew install ffmpeg.

4) Download a Voice Sample from YouTube

To train a Zonos voice clone, you'll need an MP3 sample of the speaker. A 10-20 minute video with a single speaker (e.g., a tutorial or audiobook) is recommended. You can download an MP3 track from YouTube using yt-dlp:

yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=MkLBNUMc26Y" -o "assets/exampleaudio.mp3"

Zonos uses this exampleaudio.mp3 file to build the cloned voice (a speaker embedding) before actual synthesis.


Usage

Step 1: Convert EPUB to JSON

First, run this script with 0 as the third parameter:

python epub_to_json.py epub/pg11.epub library/pg11.json 0

Check the terminal output to find how many lines should be skipped, then rerun the script with the number of the first line to keep:

python epub_to_json.py epub/pg11.epub library/pg11.json 10

This ensures that the book starts from the correct position, e.g.:

10: CHAPTER I. Down the Rabbit-Hole
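
Under the hood, epub_to_json.py extracts the raw text from the EPUB with EbookLib. A minimal sketch of that extraction step, assuming BeautifulSoup for HTML stripping (the real script also applies its own cleanup and writes the JSON book file):

import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

# Read the EPUB and collect the plain-text lines of every document item.
book = epub.read_epub("epub/pg11.epub")
lines = []
for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
    soup = BeautifulSoup(item.get_content(), "html.parser")
    text = soup.get_text(separator="\n")
    lines.extend(line.strip() for line in text.splitlines() if line.strip())

# Print numbered lines so you can pick the first line to keep (the third CLI argument).
for i, line in enumerate(lines):
    print(f"{i}: {line}")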

Note: Without an NVIDIA GPU, converting an entire book to audio takes a long time. A 30-second audio clip takes roughly 3 minutes to generate on a MacBook Pro (M1). A full book can take dozens of hours; for example, Alice’s Adventures in Wonderland runs about 3 hours, which translates to roughly 18 hours of processing on an M1 MacBook Pro. However, the make_abook script can be interrupted at any time and will resume from where it stopped.

Step 2: Generate TTS Audio Files

uv run python make_abook.py library/pg21279.json assets/kurt_v.mp3
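
Internally, each text chunk is synthesized with the Zonos Python API using the cloned voice. A rough sketch of one such call, following the upstream Zonos sample (the device, file names, and chunking here are illustrative; make_abook.py also handles resume and output naming):

import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# Load the base model (downloaded from Hugging Face on first use).
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device="cpu")  # device is illustrative

# Build a speaker embedding from the voice sample (the cloned voice).
wav, sr = torchaudio.load("assets/kurt_v.mp3")
speaker = model.make_speaker_embedding(wav, sr)

# Synthesize one chunk of text with the cloned voice and save it as a clip.
cond = make_cond_dict(text="Alice was beginning to get very tired.", speaker=speaker, language="en-us")
codes = model.generate(model.prepare_conditioning(cond))
audio = model.autoencoder.decode(codes).cpu()
torchaudio.save("audio/pg11/clip_0000.wav", audio[0], model.autoencoder.sampling_rate)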

Step 3: Play Audiobook in CLI

python play_audio.py audio/pg11 mp3

Step 4: Merge a set of audio clips into one audio file

python merge_audio_clips.py library/pg11.json audio/pg11 mp3
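
merge_audio_clips.py concatenates the per-chunk clips into one file and, per the project structure notes, writes a timestamped JSON file alongside it. As an illustration only (the actual script may use a different library, file names, and JSON layout), a pydub-based sketch looks like this:

import json
import pathlib
from pydub import AudioSegment  # pydub relies on ffmpeg for MP3 decoding/encoding

# Collect the clips in order, skipping any previously merged output.
clips = sorted(p for p in pathlib.Path("audio/pg11").glob("*.mp3")
               if p.name != "merged_output.mp3")

merged = AudioSegment.empty()
timestamps = []
for clip in clips:
    segment = AudioSegment.from_mp3(clip)
    timestamps.append({"file": clip.name, "start_ms": len(merged)})  # len() is in milliseconds
    merged += segment

merged.export("audio/pg11/merged_output.mp3", format="mp3")
with open("audio/pg11/timestamps.json", "w") as f:  # hypothetical file name
    json.dump(timestamps, f, indent=2)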

Step 5: Prepare audio clip for YouTube/LinkedIn

# YouTube
ffmpeg -loop 1 -i assets/ic_launcher.png -i audio/pg11/merged_output.mp3 -c:v libx264 -c:a aac -b:a 192k -shortest output.mp4 
# LinkedIn
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -shortest output.mp4

# X
ffmpeg -loop 1 -i appGoogle.png -i merged_output.mp3 -vf "scale=1080:1080,format=yuv420p" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest output.mp4

Step 6: Set up and run the Zyphra / Deepgram / OpenAI cloud TTS scripts

# Zyphra
export ZYPHRA_API_KEY="your-zyphra-api-key"
python zyphra_api.py library/pg11.json
# Deepgram
export DEEPGRAM_API_KEY="your-deepgram-api-key"
python deepgram_api.py library/pg11.json
# OpenAI MINI TTS
export OPENAI_API_KEY="your-openai-api-key"
python make_abook_open_ai.py library/pg11.json
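
Each of these scripts sends book chunks to the corresponding cloud TTS endpoint. For the OpenAI path, a single request with the official Python SDK looks roughly like this (the model and voice names are assumptions; make_abook_open_ai.py adds chunking, output naming, and resume logic on top):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream one synthesized chunk straight to an MP3 file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",  # assumed "mini" TTS model name
    voice="alloy",
    input="Alice was beginning to get very tired of sitting by her sister on the bank.",
) as response:
    response.stream_to_file("audio/pg11/clip_0000.mp3")  # hypothetical output path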

Step 7: Set up MLX-AUDIO (local clone)

Install MLX-AUDIO in editable mode from a local clone of the repository (adjust the path to your clone):

pip install -e ~/projects/voice/mlx-audio

Note: The Kokoro-82M TTS model skips names and other out-of-dictionary (OOD) words because it relies on an external grapheme-to-phoneme (g2p) conversion tool, espeak-ng. This happens when espeak-ng is not properly installed or detected on the system. To prevent Kokoro-82M from skipping names and OOD words, install espeak-ng and point it at its data directory:

echo 'export ESPEAK_DATA_PATH=/opt/homebrew/share/espeak-ng-data' >> ~/.zshrc
source ~/.zshrc

# make the audiobook
python make_abook_mlx.py library/pg2680.json
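
If names still get skipped, it usually means espeak-ng is not visible to the g2p layer. A quick sanity check (assuming a Homebrew install) before starting a long run:

import os
import shutil
import subprocess

# Verify the espeak-ng binary is on PATH and the data path is exported.
print("espeak-ng binary:", shutil.which("espeak-ng"))
print("ESPEAK_DATA_PATH:", os.environ.get("ESPEAK_DATA_PATH"))
subprocess.run(["espeak-ng", "--version"], check=True)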

Step 8: Make RANDR Audiobook

python make_randr.py audio/pg20203/

Pipeline Schema

flowchart LR
    A[EPUB] --> B[epub_to_json.py]
    B --> C[JSON book]
    C --> D[make_abook.py / make_abook_mlx.py / make_abook_open_ai.py / zyphra_api.py / deepgram_api.py]
    D --> E[Audio clips]
    E --> F[play_audio.py]
    E --> G[merge_audio_clips.py]
    C --> H[make_randr.py]
    E --> H
    H --> I[RANDR zip]

Project Structure

runandread-audiobook/
β”œβ”€β”€ epub_to_json.py      # Extracts text from an EPUB into JSON
β”œβ”€β”€ make_abook.py        # Converts text into audio files with Zonos TTS
β”œβ”€β”€ make_abook_mlx.py    # Converts text into audio files with the Kokoro-82M model via mlx-audio (optimized for Apple M-series processors)
β”œβ”€β”€ make_randr.py        # Wraps the produced audio and JSON files into a ZIP readable by the Run & Read app
β”œβ”€β”€ play_audio.py        # Plays audio clips sequentially while displaying text
β”œβ”€β”€ merge_audio_clips.py # Merges audio files into one and generates a timestamped JSON file
β”œβ”€β”€ word_tokens_tools.py # Utilities to normalize text before passing it to the TTS engine
β”œβ”€β”€ test_scan_next.py    # Unit tests that verify text normalization works as expected
β”œβ”€β”€ zyphra_api.py        # Converts text into audio files with the Zyphra SDK / REST API
β”œβ”€β”€ deepgram_api.py      # Converts text into audio files with the Deepgram SDK / REST API
β”œβ”€β”€ make_abook_open_ai.py # Converts text into audio files with OpenAI TTS
β”œβ”€β”€ assets/              # MP3 files for voice cloning
β”œβ”€β”€ epub/                # EPUB books from Project Gutenberg
β”œβ”€β”€ audio/               # Output audio files
β”œβ”€β”€ audiobooks/          # Sample RANDR audiobooks
β”‚   β”œβ”€β”€ pg2680.randr     # Meditations by Marcus Aurelius
β”‚   └── pg20203.randr    # Autobiography of Benjamin Franklin
β”œβ”€β”€ library/             # Output JSON book files
β”œβ”€β”€ README.md            # Documentation
β”œβ”€β”€ requirements.txt     # Dependencies
└── LICENSE              # Open-source license

Contributions

Contributions are welcome! Feel free to open an issue or submit a pull request.


References & Kudos

  • Zonos - Open-source TTS model.
  • MLX-AUDIO - A TTS and STS library built on Apple's MLX framework.
  • Kokoro-TTS - An open-weight TTS model with 82 million parameters.
  • Deepgram - Commercial cloud-based TTS.
  • EbookLib - EPUB parsing in Python.
  • yt-dlp - YouTube audio downloader for voice cloning.
  • Project Gutenberg - A library of over 75,000 free eBooks.
  • Python Simplified, MariyaSha - Kudos to Mariya for her beautiful voice, which I cloned from one of her videos.

Contact

  • Sergey N - Connect and follow me on LinkedIn.

License

This project is open-source and available under the MIT License.
