{"id":5060,"date":"2023-05-01T23:09:00","date_gmt":"2023-05-02T04:09:00","guid":{"rendered":"https:\/\/www.incredigeek.com\/home\/?p=5060"},"modified":"2023-05-01T23:08:26","modified_gmt":"2023-05-02T04:08:26","slug":"improving-accuracy-for-openais-whisper","status":"publish","type":"post","link":"https:\/\/www.incredigeek.com\/home\/improving-accuracy-for-openais-whisper\/","title":{"rendered":"Improving Accuracy for OpenAI&#8217;s Whisper"},"content":{"rendered":"\n<p>We can use prompts to improve the accuracy of our Whisper transcriptions.<\/p>\n\n\n\n<p>We can add &#8220;--initial_prompt&#8221; to our command, like the following.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">--initial_prompt \"Computer Historical etc\"<\/pre>\n\n\n\n<p>We can also look into suppressing tokens to eliminate words that we won&#8217;t use. I believe we need to find the token IDs for those words, and then we can pass those IDs to Whisper&#8217;s suppress_tokens option to ignore those words. More links below.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/openai\/whisper\/blob\/15ab54826343c27cfaf44ce31e9c8fb63d0aa775\/whisper\/decoding.py#L87-L88\">https:\/\/github.com\/openai\/whisper\/blob\/15ab54826343c27cfaf44ce31e9c8fb63d0aa775\/whisper\/decoding.py#L87-L88<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/platform.openai.com\/docs\/guides\/speech-to-text\/prompting\">https:\/\/platform.openai.com\/docs\/guides\/speech-to-text\/prompting<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/openai\/whisper\/discussions\/355\">https:\/\/github.com\/openai\/whisper\/discussions\/355<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/openai\/whisper\/discussions\/117\">https:\/\/github.com\/openai\/whisper\/discussions\/117<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/huggingface.co\/blog\/fine-tune-whisper\">https:\/\/huggingface.co\/blog\/fine-tune-whisper<\/a><\/p>\n\n\n\n<p><a 
href=\"https:\/\/discuss.huggingface.co\/t\/adding-custom-vocabularies-on-whisper\/29311\/2?u=nbroad\">https:\/\/discuss.huggingface.co\/t\/adding-custom-vocabularies-on-whisper\/29311\/2?u=nbroad<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We can use prompts to improve our Whisper transcriptions. We can add &#8220;--initial_prompt&#8221; to our command like the following. --initial_prompt &#8220;Computer Historical etc&#8221; We can also look into suppressing tokens to eliminate words that we won&#8217;t use. I believe we need &hellip; <a href=\"https:\/\/www.incredigeek.com\/home\/improving-accuracy-for-openais-whisper\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1458],"tags":[1454,1023,1478,1466,1486,1467],"class_list":["post-5060","post","type-post","status-publish","format-standard","hentry","category-ai","tag-ai","tag-audio","tag-fasterwhisper","tag-openai","tag-transcription","tag-whisper"],"_links":{"self":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/comments?post=5060"}],"version-history":[{"count":3,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5060\/revisions"}],"predecessor-version":[{"id":5063,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5060\/revisions\/5063"}],"wp:attachment":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/media?paren
t=5060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/categories?post=5060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/tags?post=5060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}