Setting up Databricks Dolly on Ubuntu with GPU

This is a quick guide for getting Dolly running on an Ubuntu machine with Nvidia GPUs.

You’ll need a good internet connection and around 35GB of disk space for the Nvidia driver, Dolly (the 12b model) and extras. You can use a smaller model to save space: the 7 billion parameter model takes about 14GB, while the 3 billion parameter model is around 6GB.
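These figures follow from a rule of thumb: a bfloat16 checkpoint stores about two bytes per parameter, so 7 billion parameters come to roughly 14GB. The sketch below uses the system Python 3 to estimate each model's weight size and compare it with free disk space. It assumes rounded parameter counts taken from the model names, and the helper names are hypothetical. It checks your home directory by default because Hugging Face caches downloads under ~/.cache/huggingface unless configured otherwise.

```python
import os
import shutil

# Hypothetical table: parameter counts in billions, rounded from the model names.
DOLLY_PARAMS_BILLIONS = {
    "databricks/dolly-v2-12b": 12,
    "databricks/dolly-v2-7b": 7,
    "databricks/dolly-v2-3b": 3,
}


def estimate_weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB: parameters x bytes per parameter (2 for bfloat16)."""
    return params_billions * bytes_per_param


def has_room_for(model: str, path: str = "~", extra_gb: float = 0.0) -> bool:
    """True if the filesystem holding `path` has room for the weights plus extra_gb."""
    needed_gb = estimate_weights_gb(DOLLY_PARAMS_BILLIONS[model]) + extra_gb
    free_gb = shutil.disk_usage(os.path.expanduser(path)).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    for name, params in DOLLY_PARAMS_BILLIONS.items():
        print(f"{name}: ~{estimate_weights_gb(params):.0f}GB, enough space: {has_room_for(name)}")
```

This is only an estimate of the weights themselves; the driver, CUDA toolkit and Python packages need additional space on top.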

Install Nvidia Drivers and CUDA

sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit

ubuntu-drivers installs the recommended driver for your card. Reboot to activate the Nvidia driver.
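After the reboot, running nvidia-smi in a terminal should list your GPUs. The following sketch wraps that check in Python so it can be reused in a setup script. The helper names are hypothetical; it only relies on nvidia-smi's --query-gpu and --format=csv,noheader options and on the standard library (Python 3.9 or newer for the type hints).

```python
import shutil
import subprocess


def parse_gpu_csv(output: str) -> list[tuple[str, int]]:
    """Parse nvidia-smi CSV lines such as 'NVIDIA GeForce RTX 3090, 24576 MiB'
    into (name, memory in MiB) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        # Split on the last comma so the GPU name is kept whole.
        name, memory = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.split()[0])))
    return gpus


def list_gpus() -> list[tuple[str, int]]:
    """Return the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU found: check the driver installation and reboot.")
    for name, mib in gpus:
        print(f"{name}: {mib} MiB")
```

An empty result usually means the driver is not loaded yet, so rebooting (or reinstalling the driver) is the first thing to try.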


Install Python

Python 3 should already be installed on Ubuntu, but we do need to install pip.

Once pip is installed, we can install numpy, accelerate, and transformers:

sudo apt install python3-pip
pip install numpy
# Quote the version specifiers so the shell does not treat >= as an output redirect
pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
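To confirm that the packages landed, this sketch reads installed versions with the standard library's importlib.metadata and reports None for anything missing. The function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # transformers[torch] pulls in torch, so it should show up here as well.
    for name, ver in installed_versions(["numpy", "accelerate", "transformers", "torch"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Once torch is present, python3 -c "import torch; print(torch.cuda.is_available())" should print True if PyTorch can see the GPU.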

Run Dolly

Start a Python console with python3. There is no need to run it as administrator: pip installed the packages for your user account, so sudo python3 may not find them.


Run the following commands to set up Dolly.

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Alternatively, if you want to use the smaller 3b model, run:

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")


  1. If loading fails because some weights have to be offloaded to disk, specify an offload folder with offload_folder="./offloadfolder". Placing it on an SSD is preferable.
  2. If you have lots of RAM, you can remove torch_dtype=torch.bfloat16. The model then loads in 32-bit precision, which roughly doubles its memory use.
  3. If you have 32GB of RAM or less, you may only be able to run the smallest (3b) model.
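The RAM rule of thumb in point 3 can be turned into a small helper that reads total physical memory on Linux and picks a model name. This is a sketch: the 32GB threshold comes from the list above, the function names are hypothetical, and GPU memory also matters when device_map="auto" spreads the model across devices.

```python
import os

# Rule of thumb from the list above: more than 32GB of RAM for the 12b model.
RAM_THRESHOLD_GB = 32


def total_ram_gib() -> float:
    """Total physical memory in GiB (how RAM sizes are usually quoted), via sysconf on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def pick_dolly_model(ram_gb: float) -> str:
    """Return the 12b model above the threshold, otherwise the smallest (3b) one."""
    if ram_gb > RAM_THRESHOLD_GB:
        return "databricks/dolly-v2-12b"
    return "databricks/dolly-v2-3b"


if __name__ == "__main__":
    ram = total_ram_gib()
    print(f"{ram:.1f}GB RAM: {pick_dolly_model(ram)}")
```

For example, pass model=pick_dolly_model(total_ram_gib()) to the pipeline call above instead of a hard-coded model name.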

Alternatively, if we don’t want to use trust_remote_code, we can download instruct_pipeline.py from the model’s Hugging Face repository into the working directory and run the following:

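import torch
# instruct_pipeline.py must be in the current working directory
from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left padding, as in the Dolly model card, for generation
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b", padding_side="left")
# Load in bfloat16, as in the pipeline() call above, to halve memory use compared with float32
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto", torch_dtype=torch.bfloat16)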

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)

Now we can ask Dolly a question.

generate_text("Your question?")


>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'
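Depending on the version of the Dolly pipeline code you end up with, the result may be a plain string, as above, or a list of dictionaries with a "generated_text" key (an assumption based on later versions of the model card, which print res[0]["generated_text"]). A small hypothetical wrapper can handle both:

```python
def response_text(result) -> str:
    """Extract the generated text from a Dolly pipeline result.

    Handles a plain string and a list of {"generated_text": ...} dicts
    (the latter is an assumed format of later pipeline versions)."""
    if isinstance(result, str):
        return result
    if isinstance(result, list) and result and isinstance(result[0], dict):
        return result[0]["generated_text"]
    raise TypeError(f"Unexpected pipeline output: {result!r}")


def ask(generate_text, question: str) -> str:
    """Send one question to the pipeline and return the answer without surrounding whitespace."""
    return response_text(generate_text(question)).strip()
```

With the pipeline loaded, print(ask(generate_text, "Tell me about Databricks dolly-v2-3b?")) prints just the answer text either way.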

Further information is available at the following two links.