{"id":4992,"date":"2023-04-14T21:58:00","date_gmt":"2023-04-15T02:58:00","guid":{"rendered":"https:\/\/www.incredigeek.com\/home\/?p=4992"},"modified":"2023-04-20T13:37:33","modified_gmt":"2023-04-20T18:37:33","slug":"setting-up-databricks-dolly-on-windows-with-gpu","status":"publish","type":"post","link":"https:\/\/www.incredigeek.com\/home\/setting-up-databricks-dolly-on-windows-with-gpu\/","title":{"rendered":"Setting up Databricks Dolly on Windows with GPU"},"content":{"rendered":"\n<p>Setting up Dolly can take a while.  You&#8217;ll need a good internet connection and around 50GB of free disk space.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install Nvidia CUDA Toolkit<\/h2>\n\n\n\n<p>You&#8217;ll need to install the CUDA Toolkit to take advantage of the GPU.  Running Dolly on a GPU is much faster than running it on the CPU alone.<\/p>\n\n\n\n<p><a href=\"https:\/\/developer.nvidia.com\/cuda-downloads\">https:\/\/developer.nvidia.com\/cuda-downloads<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install Git<\/h2>\n\n\n\n<p>Install Git from the following site.<\/p>\n\n\n\n<p><a href=\"https:\/\/git-scm.com\/downloads\">https:\/\/git-scm.com\/downloads<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Download Dolly<\/h2>\n\n\n\n<p>Download Dolly with Git.  The model weights are large, so the clone can take a while.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">git lfs install \ngit clone https:\/\/huggingface.co\/databricks\/dolly-v2-12b<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install Python<\/h2>\n\n\n\n<p>We&#8217;ll also need Python installed if it isn&#8217;t already.<br><a href=\"https:\/\/www.python.org\/downloads\/release\/\">https:\/\/www.python.org\/downloads\/release\/<\/a><\/p>\n\n\n\n<p>Next, we&#8217;ll need the following packages installed.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">py.exe -m pip install numpy<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">py.exe -m pip install &quot;accelerate&gt;=0.12.0&quot; &quot;transformers[torch]==4.25.1&quot;<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">py.exe -m pip install 
numpy --pre torch --force-reinstall --index-url https:\/\/download.pytorch.org\/whl\/nightly\/cu117 --user<\/pre>\n\n\n\n<p>The last command reinstalls PyTorch from the CUDA 11.7 nightly index, which is needed for Dolly to utilize the GPU.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Run Dolly<\/h2>\n\n\n\n<p>Open a Python console.  If you run it as administrator, it should be faster.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">py.exe<\/pre>\n\n\n\n<p>Run the following commands to set up Dolly.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import torch\nfrom transformers import pipeline\n\ngenerate_text = pipeline(model=\"databricks\/dolly-v2-3b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n\n# Or, to use the full 12B model, run\n\ngenerate_text = pipeline(model=\"databricks\/dolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>Note: if you run into memory issues, you may need to specify an offload folder with offload_folder=&quot;.\\offloadfolder&quot;.  
An SSD is preferable.<br>Also, if you have plenty of RAM, you can omit the torch_dtype=torch.bfloat16 argument.<\/p>\n\n\n\n<p>Alternatively, if we don&#8217;t want to set trust_remote_code, we can run the following.  This uses the instruct_pipeline.py file from the Dolly GitHub repository linked below.<\/p>\n\n\n\n<pre id=\"block-62446712-7d69-4264-9f93-a2f0da8ea6ee\" class=\"wp-block-preformatted\">from instruct_pipeline import InstructionTextGenerationPipeline\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"databricks\/dolly-v2-3b\", padding_side=\"left\")\nmodel = AutoModelForCausalLM.from_pretrained(\"databricks\/dolly-v2-3b\", device_map=\"auto\")\n\ngenerate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)<\/pre>\n\n\n\n<p>Now we can ask Dolly a question.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">generate_text(\"Your question?\")<\/pre>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&gt;&gt;&gt; generate_text(\"Tell me about Databricks dolly-v2-3b?\")\n'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>Further information is available at the following two links.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/databrickslabs\/dolly\">https:\/\/github.com\/databrickslabs\/dolly<\/a><br><a href=\"https:\/\/huggingface.co\/databricks\/dolly-v2-3b\">https:\/\/huggingface.co\/databricks\/dolly-v2-3b<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The total process can take a while to set up Dolly. You&#8217;ll need a good internet connection and around 50GB of hard drive space. Install Nvidia CUDA Toolkit You&#8217;ll need to install the CUDA Toolkit to take advantage of the GPU. 
The &hellip; <a href=\"https:\/\/www.incredigeek.com\/home\/setting-up-databricks-dolly-on-windows-with-gpu\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1458],"tags":[1454,1459,1457,1456,412,1455,269,293,1171,273],"class_list":["post-4992","post","type-post","status-publish","format-standard","hentry","category-ai","tag-ai","tag-cuda","tag-databricks","tag-dolly","tag-gpu","tag-machine-learning","tag-nvidia","tag-python","tag-rtx","tag-windows"],"_links":{"self":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/4992","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/comments?post=4992"}],"version-history":[{"count":3,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/4992\/revisions"}],"predecessor-version":[{"id":5025,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/4992\/revisions\/5025"}],"wp:attachment":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/media?parent=4992"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/categories?post=4992"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/tags?post=4992"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
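The pip command that reinstalls PyTorch from the CUDA nightly index is what lets Dolly use the GPU, so it is worth verifying before loading a multi-gigabyte model. The sketch below is a minimal sanity check, not part of the original post; the helper name check_gpu is our own.

```python
# Sketch: confirm the CUDA-enabled PyTorch build can actually see the GPU
# before loading Dolly. check_gpu is a hypothetical helper name.

def check_gpu():
    """Return a short status string describing GPU availability."""
    try:
        import torch  # the nightly cu117 build installed via pip above
    except ImportError:
        return "PyTorch is not installed; rerun the pip commands above."
    if torch.cuda.is_available():
        # Reports the detected card, e.g. an RTX-series GPU.
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available; Dolly will fall back to the (much slower) CPU."

print(check_gpu())
```

If this reports that CUDA is not available, the CPU-only PyTorch wheel is likely still installed; rerun the --force-reinstall pip command before starting Dolly.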