Commit ae9fbc1
fix: normalize i16 to f32 in segmentation and flush trailing speech
Two bugs in `get_segments`:
1. i16 samples are cast to f32 via `x as f32` without dividing by
32768. The ONNX segmentation model therefore receives values in
[-32768, 32767] when it expects [-1.0, 1.0], and it misclassifies
all frames as non-speech for typical microphone input.
2. When speech extends to the end of the audio, the final segment is
silently dropped. Nothing flushes the open segment when `is_speeching`
is still true after all windows have been processed.
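The two fixes can be illustrated with a minimal sketch. It is not the code from this commit, whose diff content is not reproduced here. The names `Segment`, `i16_to_f32` and `collect_segments` are hypothetical, and the ONNX model is replaced by a precomputed per-window speech flag. Only `is_speeching` comes from the message above.

```rust
/// Hypothetical half-open speech segment, in sample indices (`start..end`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub start: usize,
    pub end: usize,
}

/// Fix 1 (sketch): scale i16 PCM into [-1.0, 1.0) by dividing by 32768.
/// A bare `x as f32` would keep the raw range [-32768.0, 32767.0].
pub fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&x| x as f32 / 32768.0).collect()
}

/// Turn per-window speech decisions into segments.
/// Assumption: `is_speech[i]` covers samples `i * window .. (i + 1) * window`
/// (a stand-in for the model's per-frame output).
/// `total_samples` is the audio length, used to close a trailing segment.
pub fn collect_segments(is_speech: &[bool], window: usize, total_samples: usize) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut is_speeching = false;
    let mut start = 0;

    for (i, &speech) in is_speech.iter().enumerate() {
        let offset = i * window;
        if speech && !is_speeching {
            // Speech begins in this window.
            is_speeching = true;
            start = offset;
        } else if !speech && is_speeching {
            // Speech ended where this non-speech window starts.
            is_speeching = false;
            segments.push(Segment { start, end: offset });
        }
    }

    // Fix 2 (sketch): flush a segment that is still open when the windows
    // run out; without this, speech reaching end-of-audio is dropped.
    if is_speeching {
        segments.push(Segment { start, end: total_samples });
    }
    segments
}
```

With a 100-sample window, `collect_segments(&[false, true, true, false, true], 100, 480)` yields the segments 100..300 and 400..480. Without the final flush, the second segment would be lost. The flushed segment ends at `total_samples` rather than at a window boundary, so a short final window is still covered.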
Made-with: Cursor

Parent: e23bd29
1 file changed, 27 insertions(+), 2 deletions(-)
(Diff content not captured. Hunk 1: original lines 58-59 replaced by new lines 58-60. Hunk 2: new lines 130-153 inserted after original line 128.)