Setting up Databricks Dolly on Windows with GPU

The whole process of setting up Dolly can take a while. You'll need a good internet connection and around 50GB of hard drive space.

Install Nvidia CUDA Toolkit

You'll need to install the CUDA Toolkit to take advantage of the GPU, which is much faster than running inference on the CPU alone.

https://developer.nvidia.com/cuda-downloads

Install Git

Install git from the following site.

https://git-scm.com/downloads

Download Dolly

Download Dolly with git.

git lfs install 
git clone https://huggingface.co/databricks/dolly-v2-12b

Install Python

We’ll also need Python installed if it is not already.
https://www.python.org/downloads/release/

Next, we'll need the following packages installed.

py.exe -m pip install numpy
py.exe -m pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
py.exe -m pip install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117 --user

The last command installs a CUDA-enabled PyTorch build, which is needed for Dolly to utilize the GPU.
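
Once those packages are installed, a quick way to confirm that PyTorch can actually see the GPU is to run the following in a Python console.

import torch

# Should print True plus your GPU name if the CUDA-enabled build installed correctly.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))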

Run Dolly

Run a Python console. If you run it as administrator, it should be faster.

py.exe

Run the following commands to set up Dolly.

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Or to use the full model run

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

Note: if you have issues, you may need to specify an offload folder with offload_folder=".\offloadfolder". An SSD is preferable.
Also, if you have plenty of RAM, you can leave out torch_dtype=torch.bfloat16.
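
For reference, here is a sketch of what that might look like. This is just an illustration: the offload folder is passed through model_kwargs, which transformers forwards to the model's from_pretrained call, and ".\offloadfolder" is only an example path.

import torch
from transformers import pipeline

# Illustrative sketch: offload weights that do not fit in GPU/CPU memory to a folder on disk.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"offload_folder": r".\offloadfolder"},  # example path, ideally on an SSD
)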

Alternatively, if we don't want to use trust_remote_code, we can run the following instead. This requires the instruct_pipeline.py script from the Dolly GitHub repository linked below.

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto")

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)

Now we can ask Dolly a question.

generate_text("Your question?")

Example:

>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'

Further information is available at the following two links.

https://github.com/databrickslabs/dolly
https://huggingface.co/databricks/dolly-v2-3b

Fix Peertube youtube-dl not Downloading

The issue was that videos could not be imported into Peertube from a URL.

Peertube was set up to use youtube-dl, which lives in /var/www/peertube/storage/bin/youtube-dl. Further investigation showed that Peertube invokes it with python.

For example

python youtube-dl video-to-download

Usually python refers to Python 2, whereas python3 refers to Python 3.

We can create a symlink so that python points to python3.

sudo ln -s /usr/bin/python3 /usr/bin/python

This way, when Peertube runs python, it will actually be running python3.

Note that you may run into issues if you have Python 2 installed and still need it. In my case, python was not installed and didn't reference anything.
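
As an optional sanity check, a short Python 3 snippet like this (purely illustrative) shows what the python command now resolves to.

import shutil
import subprocess

# After creating the symlink, "python" should resolve to /usr/bin/python and report Python 3.x.
print(shutil.which("python"))
subprocess.run(["python", "--version"])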

LibreNMS – Package not found: The ‘command_runner>=’

The Problem

Running the ./validate.php script returns the following error

[FAIL]  Python3 module issue found: 'Required packages: ['PyMySQL!=1.0.0', 'python-dotenv', 'redis>=3.0', 'setuptools', 'psutil>=5.6.0', 'command_runner>=1.3.0']
Package not found: The 'command_runner>=1.3.0' distribution was not found and is required by the application
'
        [FIX]:
        pip3 install -r /opt/librenms/requirements.txt

Running the [FIX] throws an error saying gcc failed with exit status 1.

The Solution

Fortunately this issue is easy to resolve.

First we need to install python3-devel

sudo yum install python3-devel

Next, as the librenms user, run the pip command to install the requirements.

pip3 install --user -U -r /opt/librenms/requirements.txt

Run ./validate.php to verify that everything is working.

[Screenshot: librenms validate.php results]
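
As an extra check outside of validate.php, a small Python snippet (illustrative only) can confirm that the required modules now import under Python 3. The module names below correspond to the packages listed in the requirements file.

import importlib

# Import each module LibreNMS needs and report whether it is available.
for name in ("pymysql", "dotenv", "redis", "psutil", "command_runner"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")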

LibreNMS update to Python 3

You may get the following alert in LibreNMS. Basically, you need to install Python 3 to keep receiving updates.

Python 3 is required to run LibreNMS as of May, 2020. You need to install Python 3 to continue to receive updates. If you do not install Python 3 and required packages, LibreNMS will continue to function but stop receiving bug fixes and updates.

Install Python 3

Install Python 3 with yum, or apt if you are on a Debian-based distro.

sudo yum install python3
sudo pip3 install -r /opt/librenms/requirements.txt

Verify LibreNMS is updated and working

Run the following commands to make sure that LibreNMS is working correctly and is up to date.

cd /opt/librenms
sudo ./validate.php
sudo ./daily.sh

Convert Kismet NetXML capture to CSV

First, download the following Python script, which we'll use to convert the Kismet NetXML file to CSV.

https://github.com/MichaelCaraccio/NetXML-to-CSV

wget https://raw.githubusercontent.com/MichaelCaraccio/NetXML-to-CSV/master/main.py

You should now be able to run the script with the following command.

python main.py 

Help output for NetXML to CSV

bob@localhost:~$ python main.py  
Usage: main.py <NetXML File> <Output File Name> <Filter> (Filter is optional)
bob@localhost:~$

Usage

python main.py Kismet-file-input.netxml Kismet-csv-output.csv

Example of converting a file.

bob@localhost:~$ python main.py Kismet-20191023-12-50-42.netxml Kismet-20191023-12-50-42.csv

You can now import the CSV into Google Earth.
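
If you prefer not to rely on the external script, a minimal converter can be sketched in a few lines of Python. The element names used below (wireless-network, SSID/essid, BSSID, channel, gps-info/avg-lat and avg-lon) are assumptions based on typical Kismet NetXML captures, so adjust them to match your file.

import csv
import sys
import xml.etree.ElementTree as ET

# Minimal sketch: pull a few common fields out of a Kismet NetXML capture and write them to CSV.
def netxml_to_csv(netxml_path, csv_path):
    tree = ET.parse(netxml_path)
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ESSID", "BSSID", "Channel", "Latitude", "Longitude"])
        for net in tree.getroot().iter("wireless-network"):
            writer.writerow([
                net.findtext("SSID/essid", default=""),
                net.findtext("BSSID", default=""),
                net.findtext("channel", default=""),
                net.findtext("gps-info/avg-lat", default=""),
                net.findtext("gps-info/avg-lon", default=""),
            ])

if __name__ == "__main__":
    netxml_to_csv(sys.argv[1], sys.argv[2])

Run it as, for example, python netxml_sketch.py Kismet-file-input.netxml Kismet-csv-output.csv (the script name here is just a placeholder).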