If I record some audio from my microphone into a WAV file (using arecord or Audacity) and give that to whisper-cli, the results are nearly perfect, regardless of utterance length. I would like to do this in real time, as whisper-stream is supposed to do.
But whisper-stream barely picks up one word here and there. It is not that it outputs the wrong word; it is more like it cannot hear me at all most of the time, and I get lots of [BLANK_AUDIO] messages. Perhaps the difference is in the use of the SDL package? Are there tuning parameters?