{"id":5021,"date":"2023-04-20T16:21:00","date_gmt":"2023-04-20T21:21:00","guid":{"rendered":"https:\/\/www.incredigeek.com\/home\/?p=5021"},"modified":"2023-04-20T14:22:23","modified_gmt":"2023-04-20T19:22:23","slug":"setting-up-databricks-dolly-on-ubuntu-with-gpu","status":"publish","type":"post","link":"https:\/\/www.incredigeek.com\/home\/setting-up-databricks-dolly-on-ubuntu-with-gpu\/","title":{"rendered":"Setting up Databricks Dolly on Ubuntu with GPU"},"content":{"rendered":"\n<p>This is a quick guide for getting Dolly running on an Ubuntu machine with Nvidia GPUs.<\/p>\n\n\n\n<p>You\u2019ll need a good internet connection and around 35GB of hard drive space for the Nvidia driver, Dolly (12b model), and extras. The smaller models take up less space: the 7 billion parameter model uses roughly 14GB, while the 3 billion parameter model uses around 6GB.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install Nvidia Drivers and CUDA<\/h2>\n\n\n\n<p>On Ubuntu the driver metapackage is versioned (for example nvidia-driver-525); run ubuntu-drivers devices to see the recommended version for your card.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sudo apt install nvidia-driver-525 nvidia-cuda-toolkit<\/pre>\n\n\n\n<p>Reboot to activate the Nvidia driver.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">reboot<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Install Python<\/h2>\n\n\n\n<p>Python should already be installed, but we do need to install pip.<\/p>\n\n\n\n<p>Once pip is installed, install numpy, accelerate, and transformers. Quote the version specifiers so the shell does not treat >= as a redirect.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sudo apt install python3-pip<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">pip install numpy<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">pip install \"accelerate>=0.12.0\" \"transformers[torch]==4.25.1\"<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Run Dolly<\/h2>\n\n\n\n<p>Run a Python console. 
<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">python3<\/pre>\n\n\n\n<p>Run the following commands to set up Dolly.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import torch\nfrom transformers import pipeline\n\ngenerate_text = pipeline(model=\"databricks\/dolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n\n# Alternatively, to use a smaller model run\n\ngenerate_text = pipeline(model=\"databricks\/dolly-v2-3b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")<\/pre>\n\n\n\n<p>Notes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>If you have issues, you may need to specify an offload folder with offload_folder=\".\/offloadfolder\". An SSD is preferable.<\/li>\n\n\n\n<li>If you have lots of RAM, you can take out the torch_dtype=torch.bfloat16 argument.<\/li>\n\n\n\n<li>If you do NOT have lots of RAM (less than about 32GB), you may only be able to run the smallest model.<\/li>\n<\/ol>\n\n\n\n<p>Alternatively, if we don\u2019t want to set trust_remote_code=True, we can download this <a href=\"https:\/\/huggingface.co\/databricks\/dolly-v2-12b\/blob\/main\/instruct_pipeline.py\">file<\/a> and run the following.<\/p>\n\n\n\n<pre id=\"block-62446712-7d69-4264-9f93-a2f0da8ea6ee\" class=\"wp-block-preformatted\">from instruct_pipeline import InstructionTextGenerationPipeline\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"databricks\/dolly-v2-12b\", padding_side=\"left\")\nmodel = AutoModelForCausalLM.from_pretrained(\"databricks\/dolly-v2-12b\", device_map=\"auto\")\n\ngenerate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)<\/pre>\n\n\n\n<p>Now we can ask Dolly a question.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">generate_text(\"Your question?\")<\/pre>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&gt;&gt;&gt; 
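# Standard transformers generation arguments can also be passed through the call (illustrative example):\n&gt;&gt;&gt; generate_text(\"Your question?\", max_new_tokens=256)\n&gt;&gt;&gt; 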
generate_text(\"Tell me about Databricks dolly-v2-3b?\")\n'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'<\/pre>\n\n\n\n<p>As this example shows, Dolly\u2019s answers are not always factually accurate, so verify anything important.<\/p>\n\n\n\n<p>Further information is available at the following links.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/databrickslabs\/dolly\">https:\/\/github.com\/databrickslabs\/dolly<\/a><br><a href=\"https:\/\/huggingface.co\/databricks\/dolly-v2-3b\">https:\/\/huggingface.co\/databricks\/dolly-v2-3b<\/a><br><a href=\"https:\/\/huggingface.co\/databricks\/dolly-v2-12b\">https:\/\/huggingface.co\/databricks\/dolly-v2-12b<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a quick guide for getting Dolly running on an Ubuntu machine with Nvidia GPUs. You\u2019ll need a good internet connection and around 35GB of hard drive space for the Nvidia driver, Dolly (12b model) and extras. You can &hellip; <a href=\"https:\/\/www.incredigeek.com\/home\/setting-up-databricks-dolly-on-ubuntu-with-gpu\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1458],"tags":[1454,1465,1456,412,269,49],"class_list":["post-5021","post","type-post","status-publish","format-standard","hentry","category-ai","tag-ai","tag-databrick","tag-dolly","tag-gpu","tag-nvidia","tag-ubuntu-2"],"_links":{"self":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5021","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/comments?post=5021"}],"version-history":[{"count":7,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5021\/revisions"}],"predecessor-version":[{"id":5029,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/5021\/revisions\/5029"}],"wp:attachment":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/media?parent=5021"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/categories?post=5021"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/tags?post=5021"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}