Tokenizer and punctuation fixes, better remote config handling #350

Merged
ZachNagengast merged 5 commits into main from tokenizer-ci-and-punctuation-fixes
Sep 17, 2025

@ZachNagengast
Contributor

This PR includes the following fixes and improvements:

  • CI now uses pinned OS versions for more devices, as well as a test plan that retries flaky tests
  • Default device support config has been updated to match the latest from remote
  • Remote config and model endpoints now accept remoteConfigName as well as endpoint parameters
  • Tokenizer loading improvements:
    • The tokenizer loader will now search in more locations locally, including the model path, before attempting to download from Hugging Face.
    • A bug in the previous release downloaded the tokenizer to the model path only. This fix still searches that location for backwards compatibility, but respects the tokenizerFolder if provided in the WhisperKitConfig and uses downloadBase by default when a download is needed.
  • Fixed a bug in prepended punctuation merging that occurred when there was no initial value, in cases where the input alignment words didn't start with a special token.
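
For readers who want a picture of the tokenizer lookup described above, here is a minimal sketch of one way such a search could be ordered. It is not the code from this PR: the function names, the candidate order, and the `tokenizer.json` file name are all assumptions made for illustration, and only `tokenizerFolder`, `downloadBase`, and the model path come from the description.

```swift
import Foundation

/// Hypothetical helper, not WhisperKit's actual API.
/// Returns the first local folder that already contains a tokenizer file,
/// or nil when nothing is found locally and a download would be needed.
func firstLocalTokenizerFolder(
    tokenizerFolder: URL?,
    downloadBase: URL?,
    modelFolder: URL,
    fileExists: (String) -> Bool = { FileManager.default.fileExists(atPath: $0) }
) -> URL? {
    // Assumed search order: explicit tokenizerFolder first, then downloadBase,
    // then the model path (where the previous release stored the tokenizer).
    let candidates = [tokenizerFolder, downloadBase, modelFolder].compactMap { $0 }
    return candidates.first { folder in
        // "tokenizer.json" is an assumed file name for this sketch.
        fileExists(folder.appendingPathComponent("tokenizer.json").path)
    }
}

/// Hypothetical helper: where a fresh download would go when no local copy exists.
/// An explicit tokenizerFolder wins; otherwise the download base is used.
func tokenizerDownloadDestination(tokenizerFolder: URL?, downloadBase: URL) -> URL {
    return tokenizerFolder ?? downloadBase
}
```

The `fileExists` parameter is only there so the lookup can be exercised without touching the disk. A caller would typically try `firstLocalTokenizerFolder` first and fall back to downloading into `tokenizerDownloadDestination` when it returns nil.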

@ZachNagengast ZachNagengast merged commit 0406fe7 into main Sep 17, 2025
13 checks passed