Improving Accuracy for OpenAI’s Whisper

We can use prompts to improve our Whisper transcriptions; the prompt nudges the model toward the vocabulary, names, and spellings we expect to hear.

We can add "--initial_prompt" to our command, like the following.

--initial_prompt "Computer Historical etc"
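For example, a full command might look something like this (the audio file name, model size, and prompt terms are just placeholders, not a tested recipe):

whisper interview.mp3 --model medium --language en --initial_prompt "Computer History Museum, PDP-11, time-sharing, teletype"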

We can also look into suppressing tokens to eliminate words that we won't use. It looks like we need to find the token IDs for the words we want to exclude, and then tell Whisper to ignore those IDs. More links below, and a rough sketch after them.

https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L87-L88

https://platform.openai.com/docs/guides/speech-to-text/prompting

https://github.com/openai/whisper/discussions/355

https://github.com/openai/whisper/discussions/117

https://huggingface.co/blog/fine-tune-whisper

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=nbroad
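Going by the decoding code and discussions linked above, a rough, untested sketch of this in Python might look like the following (the model size, audio file name, and banned words are just placeholders):

import whisper
from whisper.tokenizer import get_tokenizer

model = whisper.load_model("base")
tokenizer = get_tokenizer(multilingual=model.is_multilingual)

# Hypothetical words we never want in the output. Note the leading space:
# Whisper's tokenizer treats a word-initial space as part of the token.
banned_words = [" umm", " uhh"]
suppress_ids = []
for word in banned_words:
    suppress_ids.extend(tokenizer.encode(word))

# -1 keeps Whisper's built-in suppression list; our extra IDs are added on top.
result = model.transcribe("audio.mp3", suppress_tokens=[-1] + suppress_ids)
print(result["text"])

The command-line tool appears to accept the same thing through its --suppress_tokens option as a comma-separated list of token IDs.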

Using faster-whisper on Ubuntu

faster-whisper is a reimplementation of OpenAI's Whisper on top of CTranslate2, which makes it noticeably faster.

https://github.com/guillaumekln/faster-whisper

Someone else has added a command-line "front end" to it, so we can use it as pretty much a drop-in replacement for the Whisper CLI.

https://github.com/jordimas/whisper-ctranslate2

We can install both easily with pip.

pip install -U faster-whisper
pip install -U whisper-ctranslate2

For some reason the quality was initially worse than vanilla Whisper. Adding the "--compute_type float32" option improved the quality to the point where there was no noticeable difference between them.
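For reference, the kind of command that ended up giving matching quality looks something like this (the audio file name and model size are just placeholders):

whisper-ctranslate2 interview.mp3 --model medium --compute_type float32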