This repository combines three standalone StepFun skills:
- step-tts: text-to-speech and voice cloning via StepFun TTS
- step-asr: speech-to-text via StepFun ASR streaming API
- stepfun-step-audio-r1-1: non-streaming audio chat turns via StepFun Chat Completions (step-audio-r1.1)
The three skills share one repo layout, while their underlying implementations remain separate:
- TTS stays in shell: skills/step-tts/scripts/tts.sh
- ASR stays in Python: skills/step-asr/scripts/transcribe.py
- Audio chat stays in Python: skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py
Repository layout:
- skills/step-tts/SKILL.md: Agent-facing description, triggers, and usage examples for TTS / voice clone
- skills/step-tts/scripts/tts.sh: Main TTS CLI entrypoint
- skills/step-asr/SKILL.md: Agent-facing description, triggers, and usage examples for ASR
- skills/step-asr/scripts/transcribe.py: Main ASR CLI entrypoint
- skills/stepfun-step-audio-r1-1/SKILL.md: Agent-facing description, triggers, and usage examples for StepFun audio chat
- skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py: Main non-streaming StepFun audio chat CLI
- tests/test_step_tts_cli.sh: Smoke tests for the TTS CLI help commands
- tests/test_step_asr_cli.sh: Smoke tests for the ASR CLI help commands
- tests/test_stepfun_audio_r1_1_cli.sh: Smoke tests for the audio-chat CLI
Requirements:
- bash, curl, python3
- A valid StepFun API key
- Optional for stepfun-step-audio-r1-1 local audio normalization: ffmpeg or macOS afconvert
API key configuration:
- Preferred environment variable: STEPFUN_API_KEY
- Legacy alias still accepted for compatibility: STEP_API_KEY
- The step-tts config command stores the key in ~/.stepfun_api_key
- All three skills read ~/.stepfun_api_key
- All three skills also read the legacy file ~/.step_api_key if present
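
The key sources listed above can be sketched as a single lookup. This is an illustration only, not the code the skills actually run: the function name `resolve_api_key` is hypothetical, and the precedence shown (environment variables before files, current names before legacy ones) is an assumption that the scripts may not share.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Names taken from the list above; the ORDER is an assumption.
ENV_VARS = ("STEPFUN_API_KEY", "STEP_API_KEY")
KEY_FILES = (".stepfun_api_key", ".step_api_key")


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Return the first non-empty key found, or None if no source has one."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1. Environment variables, preferred name first.
    for name in ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value

    # 2. Key files in the home directory, shared file before the legacy one.
    for filename in KEY_FILES:
        path = home / filename
        if path.is_file():
            value = path.read_text(encoding="utf-8").strip()
            if value:
                return value

    return None


if __name__ == "__main__":
    print("key found" if resolve_api_key() else "no key configured")
```

Passing `env` and `home` explicitly keeps the lookup easy to exercise against a temporary directory without touching a real key file.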
List skills from this repo (local dev, from repo root):
npx skills add . --list --full-depth

Note for OpenClaw local installs:
- OpenClaw's project-level skill directory is also named skills/.
- If you run npx skills add ... --agent openclaw inside this source repository, the installer may write into the repo's own skills/ directory and overwrite the source layout.
- For OpenClaw verification, use a separate consumer project directory, or install globally.

Install just the TTS skill:
npx skills add . --full-depth --skill step-tts -y

Install just the ASR skill:
npx skills add . --full-depth --skill step-asr -y

Install just the audio-chat skill:
npx skills add . --full-depth --skill stepfun-step-audio-r1-1 -y

Install all three skills to OpenClaw from a separate consumer project:
cd /path/to/another/project
npx skills add /path/to/StepAudio-Skills --full-depth --agent openclaw -y

Configure your TTS API key (saved to ~/.stepfun_api_key):
bash skills/step-tts/scripts/tts.sh config --set-api-key YOUR_STEPFUN_API_KEY

Generate audio:
bash skills/step-tts/scripts/tts.sh speak \
-t "智能阶跃,十倍每一个人的可能" \
-o step.opus

Defaults for speak:
- --model: step-tts-2
- --voice: elegantgentle-female
- --response-format: opus

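
When driving the TTS CLI from another script, it can help to spell the defaults out rather than rely on them. The sketch below only builds the argument list; `build_speak_command` is a hypothetical helper, and it assumes that `speak` accepts `--model`, `--voice`, and `--response-format` as flags, as the defaults list suggests.

```python
import shlex
from typing import List

# Path as used elsewhere in this README, relative to the repo root.
TTS_SCRIPT = "skills/step-tts/scripts/tts.sh"


def build_speak_command(
    text: str,
    output: str,
    model: str = "step-tts-2",
    voice: str = "elegantgentle-female",
    response_format: str = "opus",
) -> List[str]:
    """Build the argv for `tts.sh speak` with the documented defaults made explicit."""
    return [
        "bash", TTS_SCRIPT, "speak",
        "-t", text,
        "-o", output,
        "--model", model,
        "--voice", voice,
        "--response-format", response_format,
    ]


if __name__ == "__main__":
    cmd = build_speak_command("智能阶跃,十倍每一个人的可能", "step.opus")
    # Print a shell-safe rendering; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting problems with text that contains spaces or punctuation.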
Clone a voice (using an existing file_id from StepFun Files API):
bash skills/step-tts/scripts/tts.sh clone-voice \
--model step-tts-mini \
--file-id file-XXXX \
--text "智能阶跃,十倍每一个人的可能" \
--sample-text "今天天气不错"

The file_id must come from the official StepFun Files API:
- Upload your reference audio (5–10 seconds of the voice you want to clone, mp3 or wav) using POST https://api.stepfun.com/v1/files
- Set purpose="storage" in the request body
- The response will contain a File object with an id like file-abc123; pass this value to --file-id
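
The upload steps above can be sketched with the Python standard library. Treat this as a sketch under stated assumptions rather than a reference client: the multipart field names `file` and `purpose` and the `Authorization: Bearer` header are assumptions not confirmed by this README, and `build_upload_request` and `extract_file_id` are hypothetical helpers.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request

FILES_URL = "https://api.stepfun.com/v1/files"  # endpoint named in the steps above


def build_upload_request(audio_path: str, api_key: str) -> Request:
    """Build (but do not send) a multipart POST carrying the reference audio."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    # Assumed field names: "purpose" and "file".
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="purpose"\r\n\r\n'
        "storage\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        FILES_URL,
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_file_id(response_body: bytes) -> str:
    """Pull the `id` field out of the returned File object."""
    return json.loads(response_body)["id"]
```

To send the request, pass it to `urllib.request.urlopen(...)`, read the response body, and hand the result of `extract_file_id` to `--file-id`.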
Set the ASR API key as an environment variable:
export STEPFUN_API_KEY=YOUR_STEPFUN_API_KEY

If you already ran the TTS config command, step-asr can also reuse the shared key saved in ~/.stepfun_api_key.

Transcribe an audio file:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.wav

Save the transcription to a file:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.mp3 --out /tmp/transcript.txt

Output as JSON:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.ogg --json

For the audio-chat skill, reuse the shared StepFun API key from ~/.stepfun_api_key, or export it directly:
export STEPFUN_API_KEY=YOUR_STEPFUN_API_KEY

Create a non-streaming text-in, audio-out turn:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
--prompt "用中文介绍一下苏州的春天,语气自然一点。" \
--voice wenrounansheng \
--format wav

Send text plus local audio input:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
--prompt "听完这段语音后,总结重点,并用更简洁的话复述。" \
--input-audio /path/to/input.wav \
--voice wenrounansheng \
--format wav

Inspect the generated payload without sending a network request:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
--prompt "测试 step-audio-r1.1 非流式 payload" \
--dry-run \
--print-json

Run all CLI and unit tests from the repo root:
npm test