Commit 05829ec

doc: repaint requires SFT model
1 parent 4090263 commit 05829ec

1 file changed

Lines changed: 10 additions & 5 deletions

README.md

@@ -208,27 +208,32 @@ Duration is determined by the source audio.
 
 **Repaint** (`--src-audio` + `repainting_start`/`repainting_end` in JSON):
 regenerates a time region of the source audio while preserving the rest.
+Requires the **SFT model** (the turbo model is less performant for this task).
 The DiT receives a binary mask: 1.0 inside the region (generate), 0.0 outside
 (keep original). Source latents outside the region provide context; silence
-fills the repaint zone. Both fields default to -1 (inactive). Set one or both
-to activate: -1 on start means 0s, -1 on end means source duration.
-`audio_cover_strength` is ignored in repaint mode (the mask handles everything).
+fills the repaint zone. Both fields default to -1
+(inactive). Set one or both to activate: -1 on start means 0s, -1 on end means
+source duration. `audio_cover_strength` is ignored in repaint mode (the mask
+handles everything).
 
 ```bash
 cat > /tmp/repaint.json << 'EOF'
 {
     "caption": "Smooth jazz guitar solo with reverb",
     "lyrics": "[Instrumental]",
     "repainting_start": 10.0,
-    "repainting_end": 25.0
+    "repainting_end": 25.0,
+    "inference_steps": 50,
+    "guidance_scale": 7.0,
+    "shift": 1.0
 }
 EOF
 
 ./build/dit-vae \
     --src-audio song.wav \
     --request /tmp/repaint.json \
     --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
-    --dit models/acestep-v15-turbo-Q8_0.gguf \
+    --dit models/acestep-v15-sft-Q8_0.gguf \
     --vae models/vae-BF16.gguf
 ```
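
The repaint rules described in the README text above (the -1 defaults and the 1.0/0.0 mask) can be illustrated with a small Python sketch. This is not the project's implementation. The function names, the `frames_per_second` parameter, the clamping to the source duration, and the frame rounding are all assumptions made for illustration only.

```python
import math


def resolve_repaint_region(start, end, duration):
    """Resolve the -1 sentinels; return None when repaint is inactive.

    Hypothetical helper: the clamping and the empty-region error are
    assumptions, not behavior confirmed by the README.
    """
    if start < 0 and end < 0:
        return None  # both fields left at the default -1: inactive
    s = 0.0 if start < 0 else float(start)      # -1 on start means 0 s
    e = float(duration) if end < 0 else float(end)  # -1 on end means source duration
    s = max(0.0, min(s, float(duration)))
    e = max(0.0, min(e, float(duration)))
    if e <= s:
        raise ValueError("repaint region is empty")
    return s, e


def build_repaint_mask(start, end, duration, frames_per_second):
    """Per-frame mask: 1.0 inside the region (generate), 0.0 outside (keep).

    frames_per_second is a hypothetical latent frame rate.
    """
    n_frames = round(duration * frames_per_second)
    region = resolve_repaint_region(start, end, duration)
    if region is None:
        return [0.0] * n_frames
    s, e = region
    first = int(s * frames_per_second)
    last = min(n_frames, math.ceil(e * frames_per_second))
    return [1.0 if first <= i < last else 0.0 for i in range(n_frames)]


# Values from the example request: repaint 10.0 s to 25.0 s of a 30 s source.
mask = build_repaint_mask(10.0, 25.0, duration=30.0, frames_per_second=1)
```

With one frame per second, the mask above has 30 entries, of which frames 10 through 24 are set to 1.0. A real latent frame rate would give a proportionally longer mask.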
