How to Bypass NVIDIA NVENC Limits on RTX Cards on Linux

It appears that NVIDIA has limited the number of NVEncoding streams on consumer GPUs. Guess it is so people have to buy the more expensive professional cards.

Fortunately, the limit is only applied to the driver, and there is a patch available that let’s us bypass the limiter.

https://github.com/keylase/nvidia-patch

Install Patch

This assumes you already have the driver installed. If you do not, or run into issues with the commands below, refer to the above link.

Download the tool

https://github.com/keylase/nvidia-patch/archive/refs/heads/master.zip

wget https://github.com/keylase/nvidia-patch/archive/refs/heads/master.zip

Unzip the file

unzip nvidia-patch-master.zip

Run the patch script

cd nvidia-patch-master
sudo bash ./patch.sh

And we are finished!

Further reading

NVIDIA has a matrix of which cards support how many streams etc.

https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

And while we are on the topic of artificial limits, check out the vGPU license bypass

https://github.com/KrutavShah/vGPU_LicenseBypass

Setting up Databricks Dolly on Ubuntu with GPU

This is a quick guide for getting Dolly running on an Ubuntu machine with Nvidia GPUs.

You’ll need a good internet connection and around 35GB of hard drive space for the Nvidia driver, Dolly (12b model) and extras. You can use the smaller models to take up less space. The 8 billion parameter model uses about ~14GB of space while the 3 billion parameter one is around 6GB

Install Nvidia Drivers and CUDA

sudo apt install nvidia-driver nvidia-cuda-toolkit

Reboot to activate the Nvidia driver

reboot

Install Python

Python should already be installed, but we do need to install pip.

Once pip is installed, then we need to install numpy, accelerate, and transformers

sudo apt install python3-pip
pip install numpy
pip install accelerate>=0.12.0 transformers[torch]==4.25.1

Run Dolly

Run a python console. If you run it as administrator, it should be faster.

python3

Run the following commands to set up Dolly.

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Alternatively, If you want to use a smaller model run

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

Notes:

  1. If you have issues, you may want/need to specify an offload folder with offload_folder=”.\offloadfolder”. An SSD is preferable.
  2. If you have lots of RAM, you can take out the “torch_dtype=torch.bfloat16”
  3. If you do NOT have lots of ram (>32GB), then you may only be able to run the smallest model

Alternatively, if we don’t want to trust_remote_code, we can download this file, and run the following

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto")

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)

Now we can ask Dolly a question.

generate_text("Your question?")

Example:

>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'

Further information is available at the following two links.

https://github.com/databrickslabs/dolly
https://huggingface.co/databricks/dolly-v2-3b
https://huggingface.co/databricks/dolly-v2-12b

Setting up Databricks Dolly on Windows with GPU

The total process can take awhile to setup Dolly. You’ll need a good internet connection and around 50GB of hard drive space.

Install Nvidia CUDA Toolkit

You’ll need to install the CUDA Toolkit to take advantage of the GPU. The GPU is much faster than just using the CPU.

https://developer.nvidia.com/cuda-downloads

Install Git

Install git from the following site.

https://git-scm.com/downloads

Download Dolly

Download Dolly with git.

git lfs install 
git clone https://huggingface.co/databricks/dolly-v2-12b

Install Python

We’ll also need Python installed if it is not already.
https://www.python.org/downloads/release/

Next we’ll need the following installed

py.exe -m pip install numpy
py.exe -m pip install accelerate>=0.12.0 transformers[torch]==4.25.1
py.exe -m pip install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117 --user

The last one is needed to get Dolly to utilize a GPU.

Run Dolly

Run a python console. If you run it as administrator, it should be faster.

py.exe

Run the following commands to set up Dolly.

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Or to use the full model run

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

Note: if you have issues, you may want/need to specify an offload folder with offload_folder=”.\offloadfolder”. An SSD is preferable.
Also if you have lots of RAM, you can take out the “torch_dtype=torch.bfloat16”

Alternatively, if we don’t want to trust_remote_code, we can do run the following

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto")

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)

Now can ask Dolly a question.

generate_text("Your question?")

Example:

>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'

Further information is available at the following two links.

https://github.com/databrickslabs/dolly
https://huggingface.co/databricks/dolly-v2-3b

Install NVIDIA 510 driver (LHR Bypass) on CentOS/Fedora for Mining

Download NVIDIA 510 Driver

Download driver from here (Official Link) or with wget.

wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.68.02.run

If the above link doesn’t work, you can download a copy from this site with the following.

wget https://www.incredigeek.com/home/downloads/NVIDIA/NVIDIA-Linux-x86_64-510.68.02.run.tgz

Extract with

tar zxf ./NVIDIA-Linux-x86_64-510.68.02.run.tgz

Verify Driver (Optional)

Not a bad idea to check if you downloaded from an untrusted source.

sha256sum NVIDIA-Linux-x86_64-510.68.02.run

The Hash should equal

bd2c344ac92b2fc12b06043590a4fe8d4eb0ccb74d0c49352f004cf2d299f4c5

Install NVIDIA Driver

We can now install the NVIDIA driver with the following command.

sudo ./NVIDIA-Linux-x86_64-510.68.02.run

It will have a couple of prompts that are easy to walk through.

While installing the driver, it can try to blacklist Nouveau, if it runs into issues, try running the following, reboot, and run the install again.

sudo gruby --update-kernel=ALL --args="nouveau.modeset=0"

After driver is installed, reboot your machine.

Now download a copy of your favorite mining software and enjoy the extra Mhs…

Set coolbits value on Fedora Linux

You sometimes need to set the coolbits value to overclock your GPU on Linux

You’ll need to install nvidia-xconfig

sudo dnf install nvidia-xconfig

Then you can set the cool bits value with the following command. Change 24 to the appropriate cool bits value. Refer to the link below.

sudo nvidia-xconfig --cool-bits=28

It’ll create a new xorg config file. Reboot to take advantage of cool bits being enabled.

Multiple GPUs

If you have multiple GPUs, you will need to do the following to enable cool bits on each GPU.

sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration

https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks#Enabling_overclocking

GTX 1070 – ETH Hashrate and Power Consumption

Nvidia GTX 1070 ETH Hashrate and Power Consumption

The following is the estimated Hashrate and power consumption of a GTX 1070 based on web sources.

GTX 1070

Cost

Cost Used (eBay)
February 2021: $300 – $500

Hasrate for Ethash and Power Consumption

Hashrate: 26/Mhs

Power Consumption
Average: 130 W


Average Mhs Per Watt : 0.2/Mhs
Average Watts Per Mhs: 5 W

Profitability is about $2.97/day as of February 9, 2021

https://whattomine.com/coins/151-eth-ethash?hr=26&p=130&fee=0.0&cost=0.1&hcost=0.0&commit=Calculate

GTX 3090 – ETH Hashrate and Power Consumption

Nvidia GTX 1070 ETH Hashrate and Power Consumption

The following is the estimated Hashrate and power consumption of a GTX 1070 based on web sources.

GTX 1070

Cost

Cost Used (eBay)
February 2021: $300 – $500

Hasrate for Ethash and Power Consumption

Hashrate: 26/Mhs

Power Consumption
Average: 130 W


Average Mhs Per Watt : 0.2/Mhs
Average Watts Per Mhs: 5 W

Profitability is about $2.97/day as of February 9, 2021

https://whattomine.com/coins/151-eth-ethash?hr=26&p=130&fee=0.0&cost=0.1&hcost=0.0&commit=Calculate

Nvidia RTX 3080 – Hashrate and Power Consumption

Nvidia RTX 3080 ETH Hashrate and Power Consumption

The following is the estimated Hashrate and power consumption of a RTX 3080 based on web sources.

RTX 3080

Cost

Cost Used (eBay)
February 2021: $1600 – $2000

Hasrate for Ethash and Power Consumption

Hashrate: 97/Mhs

Power Consumption
Average: 250 W


Average Mhs Per Watt : 0.39/Mhs
Average Watts Per Mhs: 2.58 W

Profitability is about $12.45/day as of February 9, 2021

https://whattomine.com/coins/151-eth-ethash?hr=97.0&p=250.0&fee=0.0&cost=0.1&hcost=0.0&commit=Calculate

Radeon RX 570 – Hashrate and Power Consumption

These are the average RX 570 hashrates I have been getting while mining Ethereum.

https://www.msi.com/Graphics-Card/Radeon-RX-570-ARMOR-8G-OC

RX 570 brand MSI Armour

Cost

Cost Used (eBay)
February 2021: $225 – $330

Hasrate and Power Consumption

Ethereum Ethash
Hashrate: 28.5/Mhs

Power Consumption
Range: 111 – 119 W
Average = ~115 W

Settings
Core: 1100
V: 800
Mem : 2000


Average Mhs Per Watt : 0.25/Mhs
Average Watts Per Mhs: 4.04 W

Profitability is about $3.36/day as of February 9, 2021

https://whattomine.com/coins/151-eth-ethash?hr=28.0&p=115.0&fee=0.0&cost=0.1&hcost=0.0&commit=Calculate