Install Guide for Pangolin Whisper Web Interface on Linux

Pangolin is a simple web interface for OpenAI’s Whisper system. It’s easy to set up and use.

https://github.com/incredig33k/pangolin/

Add User

We’ll set up a new unprivileged user called pangolin.

sudo useradd -m pangolin
sudo passwd pangolin
su pangolin
cd
pip3 install whisper-ctranslate2
or
pip3.9 install whisper-ctranslate2
npm install https formidable@2.1.1 fs path url
wget https://github.com/incredig33k/pangolin/releases/download/pre-release/pangolin-web.zip
unzip ./pangolin-web.zip
cd pangolin_web
mkdir uploads

Change the default port to 8443. It is possible to use 443, but then we would need to run the server as a privileged user.

sed -i "s/443/8443/g" ./pangolin_server.js
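We can verify the change with grep:

grep -n 8443 ./pangolin_server.js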

Setup SSL Certificate

This assumes you already have Let’s Encrypt set up. We’ll create a certificate directory for Pangolin to use and then copy the certs there.

mkdir /home/pangolin/ssl
sudo cp /etc/letsencrypt/live/DOMAINNAME.COM/fullchain.pem /home/pangolin/ssl/
sudo cp /etc/letsencrypt/live/DOMAINNAME.COM/privkey.pem /home/pangolin/ssl/
sudo chown pangolin:pangolin /home/pangolin/ssl/fullchain.pem
sudo chown pangolin:pangolin /home/pangolin/ssl/privkey.pem
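Note that Let’s Encrypt certificates renew every few months, so these copies will eventually go stale. Assuming certbot handles your renewals, a deploy hook along these lines (a sketch, not something the Pangolin project ships) can re-copy the certs automatically after each renewal:

sudo tee /etc/letsencrypt/renewal-hooks/deploy/pangolin-certs.sh << 'EOF'
#!/bin/bash
# Re-copy the renewed certs into Pangolin's ssl directory
cp /etc/letsencrypt/live/DOMAINNAME.COM/fullchain.pem /home/pangolin/ssl/
cp /etc/letsencrypt/live/DOMAINNAME.COM/privkey.pem /home/pangolin/ssl/
chown pangolin:pangolin /home/pangolin/ssl/*.pem
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/pangolin-certs.sh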

Now, back in our web directory, we can update the vars.js file like the following.
Note that we need the full file paths here; ~/ won’t work.

module.exports = {
  key: '/home/pangolin/ssl/privkey.pem',
  cert: '/home/pangolin/ssl/fullchain.pem'
}
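For reference, the server presumably consumes these values along these lines (a sketch, not the actual pangolin_server.js code):

const https = require('https')
const fs = require('fs')
const vars = require('./vars.js')

https
  .createServer(
    {
      // Read the key and cert from the paths exported by vars.js
      key: fs.readFileSync(vars.key),
      cert: fs.readFileSync(vars.cert)
    },
    (req, res) => res.end('ok')
  )
  .listen(8443)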

Firewall rules

We already changed the listen port (at the bottom of the pangolin_server.js file) to 8443, so that is the port we need to open.

sudo firewall-cmd --add-port=8443/tcp
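To make the rule persist across reboots, also add it permanently and reload:

sudo firewall-cmd --permanent --add-port=8443/tcp
sudo firewall-cmd --reload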

Setting up systemd Service

Now we need to copy our service file and enable the Pangolin service.

sudo cp /home/pangolin/pangolin_web/pangolin.service /usr/lib/systemd/system/pangolin.service
sudo systemctl enable pangolin.service
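The zip ships with a pangolin.service file. In case you need to write or adjust one, a minimal unit along these lines should work (the paths assume the layout from the steps above; adjust the node path for your distro):

[Unit]
Description=Pangolin Whisper Web Interface
After=network.target

[Service]
Type=simple
User=pangolin
WorkingDirectory=/home/pangolin/pangolin_web
ExecStart=/usr/bin/node /home/pangolin/pangolin_web/pangolin_server.js
Restart=on-failure

[Install]
WantedBy=multi-user.target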

Start the service

sudo systemctl start pangolin
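We can confirm it is running with

sudo systemctl status pangolin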

A Very Basic Simple Whisper Web Interface

I created a little web interface for Whisper; technically it uses whisper-ctranslate2, which is built on faster-whisper.

This is not currently ready to be run on the public web. It doesn’t have any sort of TLS for encrypting communications from client to server, and all uploaded files are stored on the server. Only use it in a trusted environment.

Setting up Prerequisites

Installing whisper-ctranslate2

pip install -U whisper-ctranslate2

Install NodeJS

sudo apt install nodejs

or

sudo dnf install nodejs

Install Node Dependencies

formidable is the only third-party module we need (http and fs are built into Node). We pin version 2 because main.js below uses the formidable v2 API.

npm install formidable@2.1.1

Setting up Simple Whisper Web Interface

First we need a web directory to use.
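For example (the directory name is arbitrary):

mkdir whisper-web
cd whisper-web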

Next, let’s make an uploads folder

mkdir uploads

Now let’s create a main.js file. Node is going to be our web server. Copy in the following contents.

// http and fs are Node core modules; formidable parses the file upload
var http = require('http')
var formidable = require('formidable')
var fs = require('fs')

// execSync runs whisper-ctranslate2 and captures its output
const execSync = require('child_process').execSync

let newpath = ''
let modelSize = 'medium.en'
// Allow list of model names accepted from the form
const validModels = [
  'tiny',
  'tiny.en',
  'base',
  'base.en',
  'small',
  'small.en',
  'medium',
  'medium.en',
  'large-v1',
  'large-v2'
]
// Read the page once at startup and serve it for every request
fs.readFile('./index.html', function (err, html) {
  if (err) throw err

  http
    .createServer(function (req, res) {
      if (req.url == '/fileupload') {
        res.write(html)
        var form = new formidable.IncomingForm()
        form.parse(req, function (err, fields, files) {
          console.log('Fields ' + fields.modeltouse)
          console.log('File ' + files.filetoupload)
          var oldpath = files.filetoupload.filepath
          newpath = './uploads/' + files.filetoupload.originalFilename
          // Fall back to medium.en if the submitted model is not in the allow list
          modelSize = validModels.includes(fields.modeltouse)
            ? fields.modeltouse
            : 'medium.en'
          console.log('modelSize::' + modelSize)
          fs.rename(oldpath, newpath, function (err) {
            if (err) {
              console.log('No file selected!') // throw err
              res.write(`<div class="results">No file selected</div>`)
              res.end()
            } else {
              console.log(newpath)
              // Run the transcription synchronously and capture its stdout
              const output = execSync(
                `whisper-ctranslate2 ${newpath} --model ${modelSize}`,
                { encoding: 'utf-8' }
              )

              res.write(
                `<div class="results"><h2>Results:</h2> <p>${output}</p></div>`
              )
              res.end()
            }
          })
        })
      } else {
        res.writeHead(200, { 'Content-Type': 'text/html' })
        res.write(html)
        return res.end()
      }
    })
    .listen(8080)
})

Now create an index.html file and paste in the following.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Voice Transcribing Using Whisper</title>
    <link type="text/css" rel="stylesheet" href="style.css" />
  </head>
  <style>
    body {
      background-color: #b9dbe7;
      align-items: center;
    }

    .box {
      border-radius: 25px;
      padding: 25px;
      width: 80%;
      background-color: azure;
      margin: auto;
      border-bottom: 25px;
      margin-bottom: 25px;
    }

    .button {
      border-radius: 25px;
      margin: auto;
      width: 50%;
      height: 50px;
      display: flex;
      justify-content: center;
      border-style: solid;

      background-color: #e8d2ba;
    }

    h1 {
      text-align: center;
      padding: 0%;
      margin: 0%;
    }

    p {
      font-size: larger;
    }
    .headings {
      font-size: large;
      font-weight: bold;
    }
    input {
      font-size: medium;
    }
    select {
      font-size: medium;
    }
    .results {
      white-space: pre-wrap;
      border-radius: 25px;
      padding: 25px;
      width: 80%;
      align-self: center;
      background-color: azure;
      margin: auto;
    }
    .note {
      font-style: italic;
      font-size: small;
      font-weight: normal;
    }
    </style>
  </head>
  <body>
    <div class="box">
      <h1>Simple Whisper Web Interface</h1>
      <br />
      <p>
        Welcome to the very Simple Whisper Web Interface!<br /><br />
        This is a very basic, easy-to-use web interface for OpenAI's Whisper
        tool. It has not been extensively tested, so you may encounter bugs or
        other problems.
        <br /><br />
        Instructions for use. <br />1. Select audio file <br />2. Select the
        Model you want to use <br />
        3. Click Transcribe! <br />4. Copy your transcription
      </p>
      <br />
      <br />
      <div class="headings">
        <form action="fileupload" method="post" enctype="multipart/form-data">
          Audio File: <input type="file" name="filetoupload" /><br />

          <br />
          Model:
          <select name="modeltouse" id="modeltouse">
            <option value="medium.en">medium.en</option>
            <option value="tiny">tiny</option>
            <option value="tiny.en">tiny.en</option>
            <option value="base">base</option>
            <option value="base.en">base.en</option>
            <option value="small">small</option>
            <option value="small.en">small.en</option>
            <option value="medium">medium</option>
            <option value="medium.en">medium.en</option>
            <option value="large-v1">large-v1</option>
            <option value="large-v2">large-v2</option>
          </select>
          <p class="note">
            Large-v2 and medium.en seem to produce the most accurate results.
          </p>
          <br />
          <br />
          <br />
          <input class="button" type="submit" value="Transcribe!" />
        </form>
      </div>
    </div>
  </body>
</html>

Now we should be set to go.

Fire the web server up with

node ./main.js

If we want to start it in the background, run

node ./main.js &
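If we want it to keep running after we log out, nohup works too:

nohup node ./main.js &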

Known Limitations or Bugs

If you hit Transcribe with no file selected, the server crashes.

We call whisper-ctranslate2 directly; if it is not in your PATH, it won’t work.

We default to the medium.en model. If the model you select has not been downloaded yet, the first transcription may take a while, since the model has to download first. We originally wanted to add a menu for selecting which model to use; this has since been addressed by adding a drop-down that lets you select a model.

It would be nice to have an option for getting rid of the timestamps.

Improving Accuracy for OpenAI’s Whisper

We can use prompts to improve our Whisper transcriptions.

We can add the --initial_prompt option to our command, like the following.

--initial_prompt "Computer Historical etc"

We can also look into suppressing tokens to eliminate words that we won’t use. I believe we need to find the tokens for those words, and then we can use the token IDs to ignore them. More links below.

https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L87-L88

https://platform.openai.com/docs/guides/speech-to-text/prompting

https://github.com/openai/whisper/discussions/355

https://github.com/openai/whisper/discussions/117

https://huggingface.co/blog/fine-tune-whisper

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=nbroad

Using FasterWhisper on Ubuntu

faster-whisper is a faster implementation of OpenAI’s Whisper.

https://github.com/guillaumekln/faster-whisper

Someone else has added a “front end” to it, so we can just about use it as a drop-in replacement for Whisper.

https://github.com/jordimas/whisper-ctranslate2

We can easily install it with pip.

pip install -U faster-whisper
pip install -U whisper-ctranslate2

For some reason the quality was initially worse than vanilla Whisper. Adding the --compute_type float32 option improved the quality to the point where there was no noticeable difference between them.
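For example (again, the file name is a placeholder):

whisper-ctranslate2 audio.mp3 --model medium.en --compute_type float32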

Install and Use OpenAI Whisper

These commands work for Ubuntu. They should be simple to adapt for other Linux distros.

Install Nvidia and CUDA drivers

sudo apt install nvidia-driver-530 nvidia-cuda-toolkit

Reboot so the system uses the driver.

Install pip and ffmpeg

sudo apt install python3-pip
sudo apt install ffmpeg

Now we can install whisper with

pip install -U openai-whisper

Run Whisper

After it is installed, you should be able to run it like

whisper audio.mp3 --model medium

Change out medium for the model you would like to use. Whisper will then download the model and get to work transcribing. The .en models (e.g., medium.en) seem to perform better than the multilingual ones, at least if you are transcribing English.

If you receive a “Command ‘whisper’ not found” error, you may not have ~/.local/bin in your PATH. Either add ~/.local/bin to your PATH, or run whisper with the full path:

~/.local/bin/whisper audio.mp3 --model medium
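To add ~/.local/bin to your PATH for the current session (append the same line to ~/.bashrc to make it permanent):

export PATH="$HOME/.local/bin:$PATH"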

OpenAI Whisper GitHub link.
https://github.com/openai/whisper