{"id":5064,"date":"2023-05-06T19:57:00","date_gmt":"2023-05-07T00:57:00","guid":{"rendered":"https:\/\/www.incredigeek.com\/home\/?p=5064"},"modified":"2023-05-08T23:55:36","modified_gmt":"2023-05-09T04:55:36","slug":"a-very-basic-simple-whisper-web-interface","status":"publish","type":"post","link":"https:\/\/www.incredigeek.com\/home\/a-very-basic-simple-whisper-web-interface\/","title":{"rendered":"A Very Basic Simple Whisper Web Interface"},"content":{"rendered":"\n<p>I created a little web interface for Whisper.  Technically it uses whisper-ctranslate2, which is built on faster-whisper.  <\/p>\n\n\n\n<p>This is not ready to be run on the public web.  It doesn&#8217;t use TLS to encrypt communications between client and server, and all uploaded files are stored on the server.  Only use it in a trusted environment.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"767\" src=\"https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3-1024x767.png\" alt=\"\" class=\"wp-image-5075\" srcset=\"https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3-1024x767.png 1024w, https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3-300x225.png 300w, https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3-768x575.png 768w, https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3-401x300.png 401w, https:\/\/www.incredigeek.com\/home\/wp-content\/uploads\/2023\/05\/image-3.png 1094w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up Prerequisites<\/h2>\n\n\n\n<p><strong>Install whisper-ctranslate2<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">pip install -U whisper-ctranslate2<\/pre>\n\n\n\n<p><strong>Install 
NodeJS<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sudo apt install nodejs<\/pre>\n\n\n\n<p>or <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sudo dnf install nodejs<\/pre>\n\n\n\n<p><strong>Install Node Dependencies<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">npm install formidable@2<\/pre>\n\n\n\n<p>The http and fs modules are built into Node, so they don&#8217;t need to be installed.  The code below uses the formidable 2.x API (newer major versions return uploaded files as arrays), which is why the version is pinned.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up Simple Whisper Web Interface<\/h2>\n\n\n\n<p>First, we need a directory to hold the project.  Create one and change into it.<\/p>\n\n\n\n<p>Next, let&#8217;s make an uploads folder.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mkdir uploads<\/pre>\n\n\n\n<p>Now let&#8217;s create a main.js file.  Node is going to be our webserver.  Copy the following contents.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">var http = require('http')\nvar formidable = require('formidable')\nvar fs = require('fs')\n\nconst execSync = require('child_process').execSync\n\nlet newpath = ''\nlet modelSize = 'medium.en'\n\/\/ Models the form may request; anything else falls back to medium.en\nconst validModels = [\n  'medium.en',\n  'tiny',\n  'tiny.en',\n  'base',\n  'base.en',\n  'small',\n  'small.en',\n  'medium',\n  'medium.en',\n  'large-v1',\n  'large-v2'\n]\n\/\/ Read the page once at startup and serve it for every request\nfs.readFile('.\/index.html', function (err, html) {\n  if (err) throw err\n\n  http\n    .createServer(function (req, res) {\n      if (req.url == '\/fileupload') {\n        res.write(html)\n        var form = new formidable.IncomingForm()\n        form.parse(req, function (err, fields, files) {\n          console.log('Fields ' + fields.modeltouse)\n          console.log('File ' + files.filetoupload)\n          var oldpath = files.filetoupload.filepath\n          newpath = '.\/uploads\/' + files.filetoupload.originalFilename\n          modelSize = validModels.includes(fields.modeltouse)\n            ? 
fields.modeltouse\n            : 'medium.en'\n          console.log('modelSize::' + modelSize)\n          fs.rename(oldpath, newpath, function (err) {\n            if (err) {\n              console.log('No file selected!') \/\/ throw err\n              res.write(`&lt;div class=\"results\">No file selected&lt;\/div>`)\n              res.end() \/\/ finish the response so the browser is not left waiting\n            } else {\n              console.log(newpath)\n              \/\/ Pass arguments as an array (no shell) so file names with spaces work\n              const output = require('child_process').execFileSync(\n                'whisper-ctranslate2',\n                [newpath, '--model', modelSize],\n                { encoding: 'utf-8' }\n              )\n\n              res.write(\n                `&lt;div class=\"results\">&lt;h2>Results:&lt;\/h2> &lt;p>${output}&lt;\/p>&lt;\/div>`\n              )\n              res.end()\n            }\n          })\n        })\n      } else {\n        res.writeHead(200, { 'Content-Type': 'text\/html' })\n        res.write(html)\n        return res.end()\n      }\n    })\n    .listen(8080)\n})<\/pre>\n\n\n\n<p>Now create an index.html file and paste in the following.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&lt;!DOCTYPE html>\n&lt;html lang=\"en\">\n  &lt;head>\n    &lt;meta charset=\"UTF-8\" \/>\n    &lt;meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\" \/>\n    &lt;meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" \/>\n    &lt;title>Voice Transcribing Using Whisper&lt;\/title>\n    &lt;link type=\"text\/css\" rel=\"stylesheet\" href=\"style.css\" \/>\n  &lt;\/head>\n  &lt;style>\n    body {\n      background-color: #b9dbe7;\n      align-items: center;\n    }\n\n    .box {\n      border-radius: 25px;\n      padding: 25px;\n      width: 80%;\n      background-color: azure;\n      margin: auto;\n      border-bottom: 25px;\n      margin-bottom: 25px;\n    }\n\n    .button {\n      border-radius: 25px;\n      margin: auto;\n      width: 50%;\n      height: 50px;\n      display: flex;\n      justify-content: center;\n      border-style: solid;\n\n      background-color: #e8d2ba;\n    }\n\n    h1 {\n      
text-align: center;\n      padding: 0%;\n      margin: 0%;\n    }\n\n    p {\n      font-size: larger;\n    }\n    .headings {\n      font-size: large;\n      font-weight: bold;\n    }\n    input {\n      font-size: medium;\n    }\n    select {\n      font-size: medium;\n    }\n    .results {\n      white-space: pre-wrap;\n      border-radius: 25px;\n      padding: 25px;\n      width: 80%;\n      align-self: center;\n      background-color: azure;\n      margin: auto;\n    }\n    .note {\n      font-style: italic;\n      font-size: small;\n      font-weight: normal;\n    }\n  &lt;\/style>\n  &lt;body>\n    &lt;script>&lt;\/script>\n    &lt;div class=\"box\">\n      &lt;h1>Simple Whisper Web Interface&lt;\/h1>\n      &lt;br \/>\n      &lt;p>\n        Welcome to the very Simple Whisper Web Interface!&lt;br \/>&lt;br \/>\n        This is a very basic, easy to use, web interface for OpenAI's Whisper\n        tool. It has not been extensively tested, so you may encounter bugs or\n        other problems.\n        &lt;br \/>&lt;br \/>\n        Instructions for use. &lt;br \/>1. Select audio file &lt;br \/>2. Select the\n        Model you want to use &lt;br \/>\n        3. Click Transcribe! &lt;br \/>4. 
Copy your transcription\n      &lt;\/p>\n      &lt;br \/>\n      &lt;br \/>\n      &lt;div class=\"headings\">\n        &lt;form action=\"fileupload\" method=\"post\" enctype=\"multipart\/form-data\">\n          Audio File: &lt;input type=\"file\" name=\"filetoupload\" \/>&lt;br \/>\n\n          &lt;br \/>\n          Model:\n          &lt;select name=\"modeltouse\" id=\"modeltouse\">\n            &lt;option value=\"medium.en\">medium.en&lt;\/option>\n            &lt;option value=\"tiny\">tiny&lt;\/option>\n            &lt;option value=\"tiny.en\">tiny.en&lt;\/option>\n            &lt;option value=\"base\">base&lt;\/option>\n            &lt;option value=\"base.en\">base.en&lt;\/option>\n            &lt;option value=\"small\">small&lt;\/option>\n            &lt;option value=\"small.en\">small.en&lt;\/option>\n            &lt;option value=\"medium\">medium&lt;\/option>\n            &lt;option value=\"medium.en\">medium.en&lt;\/option>\n            &lt;option value=\"large-v1\">large-v1&lt;\/option>\n            &lt;option value=\"large-v2\">large-v2&lt;\/option>\n          &lt;\/select>\n          &lt;p class=\"note\">\n            Large-v2 and medium.en seem to produce the most accurate results.\n          &lt;\/p>\n          &lt;br \/>\n          &lt;br \/>\n          &lt;br \/>\n          &lt;input class=\"button\" type=\"submit\" value=\"Transcribe!\" \/>\n        &lt;\/form>\n      &lt;\/div>\n    &lt;\/div>\n  &lt;\/body>\n&lt;\/html>\n<\/pre>\n\n\n\n<p>Now we should be set to go.<\/p>\n\n\n\n<p>Fire the web server up with<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">node .\/main.js<\/pre>\n\n\n\n<p>If we want to start it in the background, run<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">node .\/main.js &amp;<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Known Limitations or Bugs<\/h2>\n\n\n\n<p>If you hit Transcribe with no file selected, the server crashes.<\/p>\n\n\n\n<p>We are calling whisper-ctranslate2 directly, if it is not in the 
path, then it won&#8217;t work.<\/p>\n\n\n\n<p><s>We are currently using the medium.en model, if the model is not downloaded, then the first transcription may take awhile while it downloads.  Would like to add a menu for selecting which model to use.<\/s> We fixed this by adding a drop-down that lets you select a model.  <\/p>\n\n\n\n<p>It would be nice to have an option for getting rid of the timestamps.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Created a little web interface to use Whisper, technically using whisper-ctranslate2 which is built on faster-whisper. This is not currently ready to be run on the public web. It doesn&#8217;t have any sort of TLS for encrypting communications from client &hellip; <a href=\"https:\/\/www.incredigeek.com\/home\/a-very-basic-simple-whisper-web-interface\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1458],"tags":[899,1489,1471,1436,1466,1487,1416,756,1467,1488],"class_list":["post-5064","post","type-post","status-publish","format-standard","hentry","category-ai","tag-app","tag-faster-whisper","tag-node","tag-nodejs","tag-openai","tag-simple-whisper-web-interface","tag-web","tag-website","tag-whisper","tag-whisper-ctranslate2"],"_links":{"self":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5064","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/comments?post=5064"}],"version-history":[{"count":7,"href":"https:\/\/www.incredigeek.com\/ho
me\/wp-json\/wp\/v2\/posts\/5064\/revisions"}],"predecessor-version":[{"id":5078,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5064\/revisions\/5078"}],"wp:attachment":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/media?parent=5064"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/categories?post=5064"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/tags?post=5064"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}