@@ -208,27 +208,32 @@ Duration is determined by the source audio.
 
 **Repaint** (`--src-audio` + `repainting_start` / `repainting_end` in JSON):
 regenerates a time region of the source audio while preserving the rest.
+Requires the **SFT model** (the turbo model is less performant for this task).
 The DiT receives a binary mask: 1.0 inside the region (generate), 0.0 outside
 (keep original). Source latents outside the region provide context; silence
-fills the repaint zone. Both fields default to -1 (inactive). Set one or both
-to activate: -1 on start means 0s, -1 on end means source duration.
-`audio_cover_strength` is ignored in repaint mode (the mask handles everything).
+fills the repaint zone. Both fields default to -1
+(inactive). Set one or both to activate: -1 on start means 0s, -1 on end means
+source duration. `audio_cover_strength` is ignored in repaint mode (the mask
+handles everything).
 
 ```bash
 cat > /tmp/repaint.json << 'EOF'
 {
     "caption": "Smooth jazz guitar solo with reverb",
     "lyrics": "[Instrumental]",
     "repainting_start": 10.0,
-    "repainting_end": 25.0
+    "repainting_end": 25.0,
+    "inference_steps": 50,
+    "guidance_scale": 7.0,
+    "shift": 1.0
 }
 EOF
 
 ./build/dit-vae \
     --src-audio song.wav \
     --request /tmp/repaint.json \
     --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
-    --dit models/acestep-v15-turbo-Q8_0.gguf \
+    --dit models/acestep-v15-sft-Q8_0.gguf \
     --vae models/vae-BF16.gguf
 ```
 
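The `-1` defaults and the binary mask described in the hunk can be illustrated with a small shell sketch. This is not the `dit-vae` implementation: the function names `resolve_repaint` and `repaint_mask`, the explicit source-duration argument, and the coarse 5-second step are all assumptions made for the example. The real mask presumably operates on latent frames rather than whole seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dit-vae): resolve the repaint region.
# Args: repainting_start repainting_end source_duration (all in seconds).
resolve_repaint() {
  awk -v s="$1" -v e="$2" -v d="$3" 'BEGIN {
    # Both fields at -1: repaint mode is inactive.
    if (s == -1 && e == -1) { print "inactive"; exit }
    if (s == -1) s = 0   # -1 on start means 0s
    if (e == -1) e = d   # -1 on end means source duration
    printf "%.1f %.1f\n", s, e
  }'
}

# Hypothetical helper: print one mask digit per time step,
# 1 inside the region (generate), 0 outside (keep original).
# Args: start end source_duration step (all in seconds).
repaint_mask() {
  awk -v s="$1" -v e="$2" -v d="$3" -v step="$4" 'BEGIN {
    for (t = 0; t < d; t += step)
      printf "%s", (t >= s && t < e) ? "1" : "0"
    print ""
  }'
}

resolve_repaint 10.0 25.0 30   # prints: 10.0 25.0
resolve_repaint -1 25.0 30     # prints: 0.0 25.0
resolve_repaint 10.0 -1 30     # prints: 10.0 30.0
resolve_repaint -1 -1 30       # prints: inactive
repaint_mask 10 25 30 5        # prints: 001110
```

With the request from the diff (10.0 to 25.0) and a hypothetical 30-second source, only the middle of the clip is marked for generation. The first 10 seconds and the last 5 are kept from the source.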