Start work on wav2vec2 tokenizer #90

Open
j4qfrost wants to merge 3 commits into guillaume-be:main from j4qfrost:master

@j4qfrost

I have only rudimentary knowledge of ML, but hopefully this is a sufficient start. The tests are adapted from those of the other tokenizers. The tokenizer makes two passes over the text: the first looks for special characters and masks them, and the second goes character by character, marking any symbol not defined in the vocabulary as unknown. Please give me feedback; I don't really know what I'm doing.
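
For readers following along, here is a minimal sketch of the two-pass idea described above. It is not the code in this PR: the `Piece` enum, `mask_specials`, and `tokenize` are hypothetical names made up for illustration, and the vocabulary is assumed to be a plain set of characters. The first pass splits the text into spans and flags the ones that match a special token; the second pass walks the remaining spans character by character and marks anything outside the vocabulary as unknown.

```rust
use std::collections::HashSet;

/// Hypothetical output type for this sketch (not a type from the PR).
#[derive(Debug, Clone, PartialEq)]
enum Piece {
    Special(String), // a masked special token, kept whole
    Known(char),     // a character present in the vocabulary
    Unknown(char),   // a character missing from the vocabulary
}

/// Pass 1: split `text` into spans and flag the ones that are special tokens.
/// Assumption: the earliest match wins, and ties go to the longest token.
fn mask_specials<'a>(text: &'a str, specials: &[&str]) -> Vec<(&'a str, bool)> {
    let mut spans = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        let hit = specials
            .iter()
            .filter(|s| !s.is_empty())
            .filter_map(|s| rest.find(*s).map(|pos| (pos, s.len())))
            .min_by_key(|&(pos, len)| (pos, std::cmp::Reverse(len)));
        match hit {
            Some((pos, len)) => {
                if pos > 0 {
                    spans.push((&rest[..pos], false));
                }
                spans.push((&rest[pos..pos + len], true));
                rest = &rest[pos + len..];
            }
            None => {
                spans.push((rest, false));
                rest = "";
            }
        }
    }
    spans
}

/// Pass 2: keep masked spans whole; check every other character
/// against the vocabulary and mark the missing ones as unknown.
fn tokenize(text: &str, vocab: &HashSet<char>, specials: &[&str]) -> Vec<Piece> {
    let mut out = Vec::new();
    for (span, is_special) in mask_specials(text, specials) {
        if is_special {
            out.push(Piece::Special(span.to_string()));
        } else {
            for c in span.chars() {
                out.push(if vocab.contains(&c) {
                    Piece::Known(c)
                } else {
                    Piece::Unknown(c)
                });
            }
        }
    }
    out
}
```

Masking first means the characters inside a special token are never compared against the vocabulary one by one, so a token such as `<s>` is not broken into three pieces even when `<` and `>` are not in the vocabulary.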
