Setting up Databricks Dolly on Windows with GPU

Setting up Dolly can take a while. You’ll need a good internet connection and around 50GB of free hard drive space.
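Once Python is installed (covered below), a short script can check whether the drive you plan to use has room. This is a minimal sketch using only the standard library's shutil.disk_usage. The 50 GB threshold is the estimate above, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path

# The guide's rough estimate; lower it if you only plan to use the 3B model.
REQUIRED_GB = 50


def free_space_gb(path: str = ".") -> float:
    """Free space, in decimal gigabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the drive holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    target = Path.cwd()
    print(f"Free space on the drive holding {target}: {free_space_gb(str(target)):.1f} GB")
    print("Enough for Dolly:", has_enough_space(str(target)))
```

Run it from the folder you intend to clone Dolly into, since that is where the model files will be written.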

Install Nvidia CUDA Toolkit

You’ll need to install the CUDA Toolkit so Dolly can run on your NVIDIA GPU. Running the model on the GPU is much faster than running it on the CPU alone.

https://developer.nvidia.com/cuda-downloads

Install Git

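Install Git from the following site. The Git for Windows installer can also install Git LFS (Large File Storage), which the next step needs to download the model weights.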

https://git-scm.com/downloads
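To confirm that both Git and Git LFS are available before cloning, you can ask them from Python. This is a minimal sketch assuming `git` is on your PATH; it relies only on `git lfs version` exiting successfully when Git LFS is installed. The function name is hypothetical.

```python
import shutil
import subprocess


def git_lfs_status() -> dict:
    """Report whether `git` is on PATH and whether `git lfs` responds."""
    status = {"git": shutil.which("git") is not None, "git_lfs": False}
    if status["git"]:
        try:
            # `git lfs version` exits with code 0 only when Git LFS is installed.
            result = subprocess.run(
                ["git", "lfs", "version"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            status["git_lfs"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            status["git_lfs"] = False
    return status


if __name__ == "__main__":
    print(git_lfs_status())
```

If `git_lfs` comes back False, rerun the Git installer with the Git LFS option selected before moving on.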

Download Dolly

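Download Dolly with Git. Run git lfs install first so the clone downloads the large model weight files instead of small placeholder files.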

git lfs install 
git clone https://huggingface.co/databricks/dolly-v2-12b
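If Git LFS was not set up correctly, the clone can finish quickly but leave small text "pointer" files where the weights should be. Under the Git LFS pointer format, such files begin with the line `version https://git-lfs.github.com/spec/v1`. This minimal sketch scans the cloned folder for them; the folder name matches the clone command above, and the function name is hypothetical.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Pointer files are a few hundred bytes at most; real weight files are gigabytes.
MAX_POINTER_SIZE = 1024


def find_lfs_pointers(repo_dir: str) -> list:
    """Return files under repo_dir that are still LFS pointers instead of real content."""
    root = Path(repo_dir)
    stubs = []
    for path in root.rglob("*"):
        # Skip Git's own metadata folder and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return sorted(stubs)


if __name__ == "__main__":
    repo = Path("dolly-v2-12b")  # folder created by the git clone above
    if not repo.is_dir():
        print(f"{repo} not found; run this from the folder you cloned into")
    else:
        stubs = find_lfs_pointers(str(repo))
        if stubs:
            print("Still LFS pointers (run `git lfs pull` inside the repo):")
            for stub in stubs:
                print("  ", stub)
        else:
            print("All files appear to be fully downloaded.")
```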

Install Python

We’ll also need Python installed if it is not already.
https://www.python.org/downloads/release/
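PyTorch only publishes Windows builds for 64-bit Python, so it is worth confirming which interpreter `py.exe` starts. This minimal sketch uses only the standard library; save it under any name (for example a hypothetical check_python.py) and run it with `py.exe check_python.py`.

```python
import platform
import struct
import sys


def pointer_bits() -> int:
    """Width of a C pointer in this interpreter: 64 on 64-bit Python, 32 on 32-bit."""
    return struct.calcsize("P") * 8


def python_report() -> dict:
    return {
        "version": platform.python_version(),
        "bits": pointer_bits(),
        "executable": sys.executable,
        # PyTorch on Windows requires a 64-bit interpreter.
        "ok_for_pytorch": pointer_bits() == 64,
    }


if __name__ == "__main__":
    for key, value in python_report().items():
        print(f"{key}: {value}")
```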

Next, install the following Python packages:

py.exe -m pip install numpy
py.exe -m pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
py.exe -m pip install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117 --user

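The last command is needed to get Dolly to use the GPU: it replaces the CPU-only PyTorch build that pip installs by default on Windows with a CUDA 11.7 build from PyTorch’s nightly (pre-release) channel.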
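To confirm the last command worked, you can check from Python whether PyTorch can see the GPU. This is a minimal sketch: it imports torch lazily so it still runs (and simply reports False) where PyTorch is missing, and it uses only torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name() and torch.cuda.is_bf16_supported(). The function name is hypothetical.

```python
import importlib.util


def torch_gpu_report() -> dict:
    """Say whether PyTorch is installed and whether it can see a CUDA GPU."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_build": None,       # CUDA version PyTorch was built with; None on CPU-only builds
        "cuda_available": False,  # True only if a usable NVIDIA GPU and driver are found
        "device_name": None,
        "bf16_supported": False,  # as reported by torch.cuda.is_bf16_supported()
    }
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so this script also runs where PyTorch is missing

    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["bf16_supported"] = torch.cuda.is_bf16_supported()
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` prints None, pip installed the CPU-only build and the last command should be rerun.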

Run Dolly

Open a Python console. If you run it as administrator, it may be faster.

py.exe

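Run the following commands to set up Dolly. The first time, pipeline() downloads the model from Hugging Face into its local cache, so this step can take a while.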

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Or, to use the full 12B model:

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

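Note: if you run out of memory, you may want or need to let the model offload weights to disk by adding model_kwargs={"offload_folder": r".\offloadfolder"} to the pipeline(...) call. An SSD is preferable, since offloaded weights are read back from disk.
Also, if you have lots of RAM, you can take out torch_dtype=torch.bfloat16. The model then loads in 32-bit precision, which needs roughly twice as much memory.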
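The memory trade-off behind torch_dtype=torch.bfloat16 is simple arithmetic: bfloat16 stores each parameter in 2 bytes, while the default float32 uses 4. This sketch estimates the size of the weights alone. The parameter counts are assumptions, rounded from the Pythia 2.8B and 12B models that Dolly v2 is derived from, and the estimate ignores activations and other runtime overhead.

```python
# Approximate parameter counts (assumption: rounded Pythia 2.8B / 12B sizes, not exact counts).
PARAMS = {
    "databricks/dolly-v2-3b": 2.8e9,
    "databricks/dolly-v2-12b": 12e9,
}

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weights_gb(model: str, dtype: str) -> float:
    """Rough size of the weights alone, in decimal GB (no activations or overhead)."""
    return PARAMS[model] * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for model in PARAMS:
        for dtype in BYTES_PER_PARAM:
            print(f"{model:26s} {dtype:9s} ~{weights_gb(model, dtype):5.1f} GB")
```

For example, the 12B model needs about 24 GB for its weights in bfloat16 but about 48 GB in float32, which is why dropping torch_dtype only makes sense with plenty of RAM.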

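Alternatively, if we don’t want to use trust_remote_code=True, we can load the model ourselves with the InstructionTextGenerationPipeline class from instruct_pipeline.py. That file ships in the model repository, so start Python from inside the dolly-v2-12b folder cloned earlier for the import to work: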

import torch
from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bfloat16, as in the pipeline() calls above, to halve memory use compared to float32
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto", torch_dtype=torch.bfloat16)

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
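If Python was started somewhere other than the folder that contains instruct_pipeline.py, the import above fails. One workaround is to put that folder on sys.path first. This is a minimal sketch assuming the clone from the Download Dolly step sits in the current folder; add_instruct_pipeline_dir is a hypothetical helper, not part of Dolly.

```python
import sys
from pathlib import Path


def add_instruct_pipeline_dir(model_dir: str) -> bool:
    """Put model_dir on sys.path if it contains instruct_pipeline.py; return True if found."""
    folder = Path(model_dir).resolve()
    if not (folder / "instruct_pipeline.py").is_file():
        return False
    # Insert at the front so this copy wins over any other module of the same name.
    if str(folder) not in sys.path:
        sys.path.insert(0, str(folder))
    return True


if __name__ == "__main__":
    if add_instruct_pipeline_dir("dolly-v2-12b"):
        from instruct_pipeline import InstructionTextGenerationPipeline  # noqa: F401
        print("instruct_pipeline is importable")
    else:
        print("instruct_pipeline.py not found; check the clone path")
```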

Now you can ask Dolly a question.

generate_text("Your question?")

Example:

>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'
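Depending on the version of instruct_pipeline.py, generate_text may return a plain string, as in the example above, or a list of dictionaries with a "generated_text" key. This minimal sketch of a hypothetical answer_text helper accepts either shape so the rest of your script does not need to care.

```python
def answer_text(result) -> str:
    """Return the generated text from either a str or a [{"generated_text": ...}] result."""
    if isinstance(result, str):
        return result
    if isinstance(result, list) and result and isinstance(result[0], dict):
        return result[0].get("generated_text", "")
    raise TypeError(f"Unexpected pipeline output: {type(result).__name__}")
```

Usage: `print(answer_text(generate_text("Your question?")))`.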

Further information is available at the following two links.

https://github.com/databrickslabs/dolly
https://huggingface.co/databricks/dolly-v2-3b