Install Guide for Pangolin Whisper Web Interface on Linux

Pangolin is a simple web interface for OpenAI’s Whisper system. Very easy and simple to use.

https://github.com/incredig33k/pangolin/

Add User

We’ll setup a new unprivileged user called pangolin.

sudo useradd -m pangolin
sudo passwd pangolin
su pangolin
cd
pip3 install whisper-ctranslate2
  or 
pip3.9 install whisper-ctranslate2
npm install https formidable@2.1.1 fs path url
wget https://github.com/incredig33k/pangolin/releases/download/pre-release/pangolin-web.zip
unzip ./pangolin_web.zip
cd pangolin_web
mkdir uploads

Change default port to 8443. It is possible to use 443, but we would need to run privileged

sed -i "s/443/8443/g" ./pangolin_server.js

Setup SSL Certificate

This assumes you already have Let’s Encrypt setup. We’ll create a certificate directory for Pangolin to use and then copy the certs there.

mkdir /home/pangolin/ssl
sudo cp /etc/letsencrypt/live/DOMAINNAME.COM/fullchain.pem /home/pangolin/ssl/
sudo cp /etc/letsencrypt/live/DOMAINNAME.COM/privkey.pem /home/pangolin/ssl/
sudo chown pangolin:pangolin /home/pangolin/ssl/fullchain.pem
sudo chown pangolin:pangolin /home/pangolin/ssl/privkey.pem

Now back in our web directory we can update the vars.js file like the following.
Note that we do need the full file path. Can’t use ~/

module.exports = {
key: '/home/pangolin/ssl/privkey.pem',
cert: '/home/pangolin/ssl/fullchain.pem'
}

Firewall rules

We can change the port Pangolin runs on by editing the listen port at the bottom of the pangolin_server.js file.

sudo firewall-cmd --add-port=443/tcp

Setting up systemd Service

Now we need to copy our service file and enable the Pangolin service.

sudo cp /home/pangolin/pangolin_web/pangolin.service /usr/lib/systemd/system/pangolin.service
sudo systemctl enable pangolin.service

Start the service

sudo systemctl start pangolin

How To Play an Audio file in JavaScript

Here is a quick and simple way to play an audio clip in JavaScript

const audio = new Audio('path/to/audio.mp3')
audio.play()

That is literally it.

You can set “audio.play()” to where ever you need in your code so it gets triggered when needed.

https://stackoverflow.com/questions/9419263/how-to-play-audio

JavaScript – The media resource indicated by the src attribute or assigned media provider object was not suitable.

If you receive the following error,

The media resource indicated by the src attribute or assigned media provider object was not suitable.

It could be because your media file is not supported. Try converting your audio file to a different format.

https://stackoverflow.com/questions/57246199/domexception-the-media-resource-indicated-by-the-src-attribute-or-assigned-med

Improving Accuracy for OpenAI’s Whisper

We can use prompts to improve our Whisper transcriptions.

We can add “–initial_prompt” to our command like the following.

--initial_prompt "Computer Historical etc"

We can also look into suppressing Tokens to eliminate words that we won’t use. Believe we need to find the tokens for words, and then we can use the token ID to ignore those words. More links below.

https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L87-L88

https://platform.openai.com/docs/guides/speech-to-text/prompting

https://github.com/openai/whisper/discussions/355

https://github.com/openai/whisper/discussions/117

https://huggingface.co/blog/fine-tune-whisper

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=nbroad

Install and Use OpanAI Whisper

These commands work for Ubuntu. Should be simple to change for other Linux distros.

Install Nvidia and CUDA drivers

sudo apt install nvidia-driver-530 nvidia-cuda-toolkit

Reboot so the system uses the driver.

Install pip and ffmpeg

sudo apt install python3-pip
sudo apt install ffmpeg

Now we can install whisper with

pip install -U openai-whisper

Run Whisper

After it is installed, it should be able to run it like

whisper audio.mp3 --model medium

Change out medium to the model you would like to use. It will then download the model and then work get to work on transcribing it. The .en models i.e. medium.en, seem to perform better then the other ones. If you are using English that is.

If you receive a “Command ‘whisper’ not found” error, you may not have ~/.local/bin in your user PATH. Either add ~/.local/bin to your PATH, or run whisper with the full path

~/.local/bin/whisper audio.mp3 --model medium

OpenAI Whisper GitHub link.
https://github.com/openai/whisper

Turn 3.5mm Jack on Raspberry Pi Running LineageOS 16

You will need an Android Terminal. You can turn on the default one in the developer settings. Need to turn on developer mode?

You will also need to enable root which can also be done in the Developer settings

Open up the terminal app and run

su
rpi3-audio-jack.sh

More info here

https://konstakang.com/devices/rpi3/LineageOS16.0/