{"id":15672,"date":"2023-07-17T16:44:37","date_gmt":"2023-07-17T23:44:37","guid":{"rendered":"https:\/\/www.pugetsystems.com\/?post_type=hpc_post&#038;p=15672"},"modified":"2023-07-18T11:01:28","modified_gmt":"2023-07-18T18:01:28","slug":"can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost","status":"publish","type":"hpc_post","link":"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/","title":{"rendered":"Can You Run A State-Of-The-Art LLM On-Prem For A Reasonable Cost?"},"content":{"rendered":"<h2 class=\"wp-block-heading\" id=\"h-introduction\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Since the release of OpenAI&#8217;s ChatGPT, and the ensuing mania\/hype, a question has been on everyone&#8217;s mind: <strong>can you run a state-of-the-art Large Language Model on-prem?<\/strong> With <em>your<\/em> data and <em>your<\/em> hardware? At a reasonable cost? The reasons behind these questions include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The cost of running a private-access large language model in the cloud is prohibitively expensive<\/li>\n\n\n\n<li>Paying for LLM service access by the token can be expensive for large-scale use<\/li>\n\n\n\n<li>High-end GPU computing hardware is expensive and may not be justifiable for in-house exploratory research<\/li>\n\n\n\n<li><strong>Many companies do not want, or cannot have, their internal data posted to a cloud service!<\/strong><\/li>\n<\/ul>\n\n\n\n<p>That last point is a serious barrier. 
LLMs&#8217; most interesting usage possibilities involve integration with your private data stores and internal communication\/content.<\/p>\n\n\n\n<p><strong>So, can you run a large language model on-prem? Yes, you can!<\/strong><\/p>\n\n\n\n<p>I&#8217;ve been learning about and experimenting with LLM usage on a nicely configured quad-GPU system here at Puget Systems for several weeks. My goal was to find out how much you can do on a system whose cost is within reach of many organizations and research groups (and some individuals). Let&#8217;s see how it went!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-hardware\"><span class=\"ez-toc-section\" id=\"The_Hardware\"><\/span>The Hardware<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The system I&#8217;ve been experimenting with is configured as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CPU: Intel Xeon w9-3475X 36-core Sapphire Rapids<\/li>\n\n\n\n<li>RAM: 512GB DDR5 4800MHz Reg ECC<\/li>\n\n\n\n<li>Motherboard: ASUS PRO WS W790E-SAGE SE<\/li>\n\n\n\n<li>Storage: 2 x Sabrent Rocket 4 Plus 2TB PCIe Gen 4 M.2 SSD<\/li>\n\n\n\n<li>GPUs: 4 x NVIDIA RTX 6000 Ada Generation 48GB<\/li>\n\n\n\n<li>OS: Ubuntu 22.04 LTS<\/li>\n<\/ul>\n\n\n\n<p>This is a nice system, and it&#8217;s not cheap! But it&#8217;s also not out of reach for many organizations and research groups. From my experiments, I think you could get away with 2 x RTX 6000 Ada (or 2 x A6000) GPUs for research and development work and internal application testing. 2 x 48GB GPUs may be enough for some internal production use cases too.<\/p>\n\n\n\n<p>I&#8217;m using a variation of the <a href=\"https:\/\/www.pugetsystems.com\/solutions\/scientific-computing-workstations\/machine-learning-ai\/\">Puget Systems &#8220;Workstations for Machine Learning \/ AI&#8221;<\/a>. 
We are currently qualifying the 4 x RTX 6000 Ada version of these systems with higher-capacity power supplies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-state-of-the-art-open-source-llms\"><span class=\"ez-toc-section\" id=\"State_of_the_art_open_source_LLMs\"><\/span>State of the art open source LLMs<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The open-source and research community quickly accelerated progress on open LLMs and developer tools after the release of ChatGPT. The infamous leaked Google memo titled &#8220;We have no moat, and neither does OpenAI&#8221; (search for that) was an insightful reflection on how rapidly open research and development would make progress on LLMs. For the past several months, new models and tools have been released almost every week. It&#8217;s been a challenge trying to keep up! Better data curation, more efficient training methods, and community efforts at instruct tuning have been increasing the quality and reducing the size of LLMs.<\/p>\n\n\n\n<p>An interesting resource for keeping track of open LLM development is the <a href=\"https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard\">HuggingFace Open LLM Leaderboard<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-falcon-40b\"><span class=\"ez-toc-section\" id=\"Falcon-40b\"><\/span>Falcon-40b<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Falcon-40b (instruct) is the current top LLM on the HuggingFace leaderboard. This model was developed at the <a href=\"https:\/\/www.tii.ae\/\">Technology Innovation Institute<\/a> in Abu Dhabi, United Arab Emirates. I find it delightful that this model comes from an institution that may not be well-known to many people. Their HuggingFace organization card is a good starting point for information on their LLM work.<\/p>\n\n\n\n<p>Falcon-40b is a 40-billion-parameter LLM that performs better than many larger models. It&#8217;s a good example of open-source innovation and improvement in this domain. 
It is licensed under the Apache 2.0 license.<\/p>\n\n\n\n<p>There is also an instruct-tuned version, falcon-40b-instruct, useful as a chat-style LLM. Others have fine-tuned this model with additional datasets. It&#8217;s worth searching HuggingFace for &#8220;falcon-40b&#8221; to see what&#8217;s available.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-inference\"><span class=\"ez-toc-section\" id=\"Inference\"><\/span>Inference<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>The inference performance of falcon-40b-instruct on two RTX 6000 Ada GPUs is good. A typical response to a prompt takes just a few seconds. The model is sharded across the two GPUs, using nearly 90GB of GPU memory at its native bfloat16 training precision. (This model will not load on a single 80GB A100!)<\/p>\n\n\n\n<p>The quality of the responses is very good, approaching that of ChatGPT for many prompts. I have tested responses using prompts from the <a href=\"https:\/\/www.deeplearning.ai\/short-courses\/\">Deeplearning.ai short course ChatGPT Prompt Engineering for Developers<\/a> (recommended!). Note that the model is trained on a different corpus than ChatGPT, and it is a much smaller model, so the responses are different. However, a little prompt engineering can produce very good results.<\/p>\n\n\n\n<p>I have also used this model and some other fine-tuned variants with the <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\">HuggingFace Text-Generation-Inference server<\/a>. With the addition of a front-end web GUI, this is suitable for production use!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-fine-tuning\"><span class=\"ez-toc-section\" id=\"Fine-tuning\"><\/span>Fine-tuning<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Falcon-40b is available on HuggingFace in base and instruct versions. The base version is a good starting point for fine-tuning. 
<a href=\"https:\/\/huggingface.co\/models?pipeline_tag=text-generation&amp;sort=trending&amp;search=falcon\">There are over 200 Falcon 40b and 7b models on HuggingFace<\/a> fine-tuned with various methods and datasets, including several tuned with the <a href=\"https:\/\/github.com\/LAION-AI\/Open-Assistant\">Open-Assistant<\/a> datasets.<\/p>\n\n\n\n<p>I have done fine-tuning testing with <a href=\"https:\/\/huggingface.co\/datasets\/timdettmers\/openassistant-guanaco\">Tim Dettmers&#8217; openassistant-guanaco dataset<\/a>.<\/p>\n\n\n\n<p><strong>Note: 4 x RTX 6000 Ada does not provide enough GPU memory for LoRA fine-tuning of falcon-40b at the model&#8217;s native bfloat16 precision. I have been able to fine-tune using 4-bit QLoRA with <a href=\"https:\/\/github.com\/TimDettmers\/bitsandbytes\">Tim Dettmers&#8217; bitsandbytes<\/a>. That will train on 2 x RTX 6000 Ada GPUs.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion-and-recommendations\"><span class=\"ez-toc-section\" id=\"Conclusion_and_Recommendations\"><\/span>Conclusion and Recommendations<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>My primary recommendation is to <strong>start!<\/strong> The frenzy of development activity around open LLMs and associated tools is not going to slow down. The quality and diversity of the models and tools are improving rapidly.<\/p>\n\n\n\n<p>I feel that a system with 2-4 RTX 6000 Ada GPUs is an acceptable platform for useful work. Having a system on-prem for development work and experimentation is a good entry point for the new AI era. It may be all you need! It is possible to do research using smaller models on a more modest system configuration. I would say a system with an RTX 4090 or 3090 is minimal. If that&#8217;s what you have, then use it! 
You could evaluate the feasibility of in-house application development and research, which would help you decide whether you can justify a larger investment.<\/p>\n\n\n\n<p>I am certainly biased toward owning your hardware. I use the cloud too. But I prefer to have a system I can completely control, customize, and use with in-house proprietary data.<\/p>\n\n\n\n<p>I am really enjoying learning and working with this stuff! Expect to see related posts soon.<\/p>\n\n\n\n<p><strong>Happy Computing! &#8211;dbk @dbkinghorn<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post I address the question that&#8217;s been on everyone&#8217;s mind: Can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? At a reasonable cost?<\/p>\n","protected":false},"author":145,"featured_media":15681,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"_acf_changed":false,"content-type":"","classic-editor-remember":"","legacy_id":"","redirect_url":[],"expire_date":"","alert_message":"","alert_link":[],"configure_ids":"","system_grid_title":"","system_grid_ids":"","footnotes":""},"hpc_categories":[8880,8883],"hpc_tags":[9093,8765,8770],"coauthors":[9057],"class_list":["post-15672","hpc_post","type-hpc_post","status-publish","has-post-thumbnail","hentry","hpc_category-hardware-recommendations","hpc_category-machine-learning","hpc_tag-llms","hpc_tag-machine-learning","hpc_tag-ml-ai"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.7 (Yoast SEO v26.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Can You Run A State-Of-The-Art LLM On-Prem For A Reasonable Cost? | Puget Systems<\/title>\n<meta name=\"description\" content=\"In this post address the question that&#039;s been on everyone&#039;s mind; Can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? 
At a reasonable cost?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Can You Run A State-Of-The-Art LLM On-Prem For A Reasonable Cost?\" \/>\n<meta property=\"og:description\" content=\"In this post address the question that&#039;s been on everyone&#039;s mind; Can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? At a reasonable cost?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/\" \/>\n<meta property=\"og:site_name\" content=\"Puget Systems\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/PugetSystems\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-18T18:01:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2023\/07\/sd-falcon2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"768\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PugetSystems\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n\t<meta name=\"twitter:label2\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data2\" content=\"Dr. 
Donald Kinghorn\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/\",\"url\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/\",\"name\":\"Can You Run A State-Of-The-Art LLM On-Prem For A Reasonable Cost? | Puget Systems\",\"isPartOf\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2023\/07\/sd-falcon2.png\",\"datePublished\":\"2023-07-17T23:44:37+00:00\",\"dateModified\":\"2023-07-18T18:01:28+00:00\",\"description\":\"In this post address the question that's been on everyone's mind; Can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? 
At a reasonable cost?\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/#primaryimage\",\"url\":\"https:\/\/wp-cdn.pugetsystems.com\/2023\/07\/sd-falcon2.png\",\"contentUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2023\/07\/sd-falcon2.png\",\"width\":768,\"height\":768},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/can-you-run-a-state-of-the-art-llm-on-prem-for-a-reasonable-cost\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pugetsystems.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"HPC Posts\",\"item\":\"https:\/\/www.pugetsystems.com\/all-hpc\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Can You Run A State-Of-The-Art LLM On-Prem For A Reasonable Cost?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pugetsystems.com\/#website\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"name\":\"Puget Systems\",\"description\":\"Workstations for creators.\",\"publisher\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pugetsystems.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\",\"name\":\"Puget 
Systems\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"contentUrl\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"width\":2560,\"height\":363,\"caption\":\"Puget Systems\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/PugetSystems\",\"https:\/\/x.com\/PugetSystems\",\"https:\/\/www.instagram.com\/pugetsystems\/\",\"https:\/\/www.linkedin.com\/company\/puget-systems\",\"https:\/\/www.youtube.com\/user\/pugetsys\",\"https:\/\/en.wikipedia.org\/wiki\/Puget_Systems\"],\"telephone\":\"(425) 458-0273\",\"legalName\":\"Puget Sound Systems, Inc.\",\"foundingDate\":\"2000-12-01\",\"duns\":\"128267585\",\"naics\":\"334111\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","_links":{"self":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/15672","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts"}],"about":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/types\/hpc_post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/users\/145"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/comments?post=15672"}],"version-history":[{"count":0,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/15672\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media\/15681"}],"wp:attachment":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media?parent=15672"}],"wp:term":[{"taxonomy":"hpc_category","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_categories?post=15672"},{"taxonomy":"hpc_tag","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-jso
n\/wp\/v2\/hpc_tags?post=15672"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/coauthors?post=15672"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}