{"id":42432,"date":"2025-12-12T14:02:39","date_gmt":"2025-12-12T22:02:39","guid":{"rendered":"https:\/\/www.pugetsystems.com\/?post_type=hpc_post&#038;p=42432"},"modified":"2025-12-12T14:02:45","modified_gmt":"2025-12-12T22:02:45","slug":"standing-up-ai-development-quickly-for-supercomputing-2025","status":"publish","type":"hpc_post","link":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/","title":{"rendered":"Standing Up AI Development Quickly for Supercomputing 2025"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_80 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#The_Challenge\" >The Challenge<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#Methodology_Vibe_Coding_and_Rapid_Prototypes\" >Methodology: Vibe Coding and Rapid Prototypes<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#Phase_1_Baseline_Validation_on_Standardized_Hardware\" >Phase 1: Baseline Validation on Standardized Hardware<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" 
href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#Phase_2_Integration_Challenges_in_High-Density_Environments\" >Phase 2: Integration Challenges in High-Density Environments<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#Engineering_Resolutions\" >Engineering Resolutions<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#The_Architecture_More_Than_Just_a_Script\" >The Architecture: More Than Just a Script<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n\n<p><strong><em>How I used &#8220;Vibe Coding&#8221; and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.<\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-challenge\"><span class=\"ez-toc-section\" id=\"The_Challenge\"><\/span>The Challenge<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Two weeks before Supercomputing 2025, the marketing team came to me with a challenge: <em>&#8220;We\u2019re headed to Supercomputing for the first time. Can we run a multi-GPU AI simulation on the Comino Grando?&#8221;<\/em><\/p>\n\n\n\n<p>The timeline was tight. The hardware was complex. 
My goal was to prove that we don&#8217;t just build hardware &#8211; we also deeply understand the workloads that run on it.<\/p>\n\n\n\n<p><strong>The Hardware: Comino Grando Server<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Processors:<\/strong> Dual AMD EPYC\u2122 9000 Series (Liquid Cooled)<\/li>\n\n\n\n<li><strong>Accelerators:<\/strong> 8x NVIDIA L40S GPUs (Liquid Cooled)<\/li>\n\n\n\n<li><strong>Cooling:<\/strong> Full system direct-to-chip liquid cooling<\/li>\n\n\n\n<li><strong>OS:<\/strong> Ubuntu Desktop (initially), migrated to Server\/Headless for production<\/li>\n<\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-methodology-vibe-coding-and-rapid-prototypes\"><span class=\"ez-toc-section\" id=\"Methodology_Vibe_Coding_and_Rapid_Prototypes\"><\/span>Methodology: Vibe Coding and Rapid Prototypes<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>I don&#8217;t love the term &#8220;vibe coding&#8221;&#8230; but I understand why it exists. It describes a workflow where I focus on <em>intent<\/em> and <em>architecture<\/em> while an AI handles the implementation details. It\u2019s not about letting the AI think for me &#8211; it\u2019s about how I prompt the LLM to generate exactly what I need.<\/p>\n\n\n\n<p>I\u2019ve spent 25 years in the trenches of software development, from reverse engineering assembly code to launching SaaS products that have run for decades. 
In that time, I\u2019ve learned that true speed comes from clarity.<\/p>\n\n\n\n<p><strong>My Toolkit:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AI Acceleration:<\/strong> Robust prompting strategies (RooCode\/Gemini)<\/li>\n\n\n\n<li><strong>Architecture:<\/strong> Containerized microservices (Docker)<\/li>\n\n\n\n<li><strong>Orchestration:<\/strong> Python-based control layers<\/li>\n\n\n\n<li><strong>Timeframe:<\/strong> Initial POC ready in about 1 week (~12 hours of actual coding)<\/li>\n<\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-phase-1-baseline-validation-on-standardized-hardware\"><span class=\"ez-toc-section\" id=\"Phase_1_Baseline_Validation_on_Standardized_Hardware\"><\/span>Phase 1: Baseline Validation on Standardized Hardware<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Our target machine, a <a href=\"https:\/\/www.pugetsystems.com\/products\/rackmount-servers\/comino-grando-servers\/\" rel=\"noreferrer noopener\" target=\"_blank\">Comino Grando<\/a>, was still being provisioned. I couldn&#8217;t wait.<\/p>\n\n\n\n<p>Coincidentally, an <strong>NVIDIA DGX Spark<\/strong> had just arrived in our lab. I jumped on it immediately to start the proof of concept.<\/p>\n\n\n\n<p>I chose <strong>NVIDIA PhysicsNeMo<\/strong> to simulate aerodynamic airflow over an object (the &#8220;Ahmed body&#8221;). 
This was the perfect test case: it leverages established libraries and allows us to visually compare AI-predicted results against traditional software simulations.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"415\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/Standing-Up-AI-Development-Screenshot-1.png\" alt=\"Screenshot of Real-time Visualization of Pressure and Shear Stress\" class=\"wp-image-42439\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>The goal: Real-time visualization of pressure and shear stress on the Ahmed body, rendered via PyVista.<\/em><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>The DGX Spark Developer Experience:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Seamless Setup:<\/strong> Graphical Ubuntu and CUDA worked out of the box.<\/li>\n\n\n\n<li><strong>Easy Tooling:<\/strong> NVIDIA container tools installed without errors.<\/li>\n\n\n\n<li><strong>Speed:<\/strong> With AI assistance, I reviewed demos and had a training notebook running in <strong>less than 2 hours<\/strong>.<\/li>\n\n\n\n<li><strong>Result:<\/strong> The entire project, from unboxing to live demo, took less than 30 hours of active work spread over a two-week sprint.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-phase-2-integration-challenges-in-high-density-environments\"><span class=\"ez-toc-section\" id=\"Phase_2_Integration_Challenges_in_High-Density_Environments\"><\/span>Phase 2: Integration Challenges in High-Density Environments<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The Spark was fantastic, but it was a single-GPU environment. 
My recent focus had been on marketing data modeling, so I hadn&#8217;t spent time researching distributed training requirements.<\/p>\n\n\n\n<p>When I migrated the code to the <strong>Comino Grando<\/strong>, a liquid-cooled server with multi-GPU architecture, the initial deployment failed to function correctly.<\/p>\n\n\n\n<p>I hit the wall that separates coding from systems engineering:<\/p>\n\n\n\n<p><strong>The Silent GPUs (CPU Fallback):<\/strong> The containerized environment deployed successfully, but <code>nvidia-smi<\/code> revealed the truth: <strong>0% GPU utilization<\/strong>. The application had silently fallen back to CPU inference because the container runtime couldn&#8217;t access the GPU interconnects.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"649\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/Standing-Up-AI-Development-Screenshot-2.png\" alt=\"Screenshot of Puget Systems AI Training Demo GPU Usage Monitor Showing No Load\" class=\"wp-image-42440\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>The initial deployment on the Comino server. Docker containers were running, but we were seeing 0% GPU use in our Grafana dashboards.<\/em><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Desktop vs. Docker:<\/strong> I hit a hard stop with the Operating System. The Ubuntu Desktop OS simply would not share the GPUs with our Docker containers. 
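<\/p>

<p>A quick aside on GPU access from containers: a container only sees the GPUs it explicitly requests from the runtime. In Docker Compose terms, that reservation looks roughly like the fragment below (the service name and image tag are illustrative, not taken from the project):<\/p>

```yaml
services:
  trainer:                                    # hypothetical service name
    image: nvcr.io/nvidia/pytorch:24.05-py3   # illustrative image tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all                      # request every visible GPU
              capabilities: [gpu]
```

<p>Requesting the devices this way is necessary but, as we found, not sufficient when the host OS itself is holding a card.<\/p>

<p>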
The graphical interface layers were monopolizing resources or permissions in a way that prevented the container runtime from accessing the cards.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"956\" height=\"457\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/Standing-Up-AI-Development-Screenshot-3.png\" alt=\"Screenshot Showing Conflict Between Ubuntu Desktop OS and Docker for GPU Access\" class=\"wp-image-42441\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>The Desktop OS conflict in action. While we requested all resources, the OS held onto specific cards for display output (GPU 0 in this case), preventing the training loop from accessing the full cluster.<\/em><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Refactoring for Scale (Torchrun):<\/strong> Our initial single-GPU script used a simple execution flow. To leverage all 8 GPUs, I had to refactor the codebase to use <code>torchrun<\/code>. Standard Python execution lacks the distributed communication backend (NCCL) required to synchronize gradients across multiple cards in real time.<\/li>\n\n\n\n<li><strong>Physics Shift:<\/strong> I pivoted the simulation target from simple airflow to <strong>Shear and Pressure<\/strong> (based on an existing NVIDIA demo). 
While this model was better suited for <code>torchrun<\/code>, the original code wasn&#8217;t set up for our specific distributed environment.<\/li>\n<\/ol>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-engineering-resolutions\"><span class=\"ez-toc-section\" id=\"Engineering_Resolutions\"><\/span>Engineering Resolutions<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Speed doesn&#8217;t mean skipping steps &#8211; it means iterating faster.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Headless Pivot:<\/strong> Using the server&#8217;s onboard iKVM for triage, I discovered that the Ubuntu Desktop environment was aggressively claiming the NVIDIA cards for display output instead of defaulting to the onboard VGA. By switching to headless Ubuntu (removing the GUI entirely), I stopped the OS from monopolizing the GPUs, instantly freeing them for our Dockerized simulation.<\/li>\n\n\n\n<li><strong>Distributed Training Implementation:<\/strong> I rewrote the entry points to support <code>torchrun<\/code>, enabling the model to parallelize the &#8220;Shear and Pressure&#8221; calculations across the entire 8-GPU cluster.<\/li>\n\n\n\n<li><strong>Visualization Pipeline:<\/strong> I modified the <code>visualizer.py<\/code> scripts to ingest our custom <code>.vtp<\/code> 3D files, mapping the inference outputs to Puget Systems&#8217; brand color palette for the live display.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-architecture-more-than-just-a-script\"><span class=\"ez-toc-section\" id=\"The_Architecture_More_Than_Just_a_Script\"><\/span>The Architecture: More Than Just a Script<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To be clear, I didn&#8217;t just run a Python script in a terminal. 
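<\/p>

<p>As a concrete footnote to the <code>torchrun<\/code> refactor described above: the launcher&#8217;s visible contract is a handful of environment variables it injects into each worker, which the entry point reads before initializing the NCCL process group. A minimal, torch-free sketch of reading that contract (the helper name is mine, not from the project):<\/p>

```python
import os

def distributed_context():
    # torchrun injects RANK, WORLD_SIZE, and LOCAL_RANK into every
    # worker it spawns; a plain `python train.py` run sets none of
    # them, so we default to single-process values.
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    local_rank = int(os.environ.get('LOCAL_RANK', 0))
    return rank, world_size, local_rank

# Under `torchrun --nproc_per_node=8 train.py`, each of the 8 workers
# sees a distinct RANK/LOCAL_RANK; those values feed the per-worker
# device selection and the NCCL process-group initialization.
```

<p>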
I needed a robust, trade-show-ready &#8220;kiosk&#8221; that could run autonomously on the show floor.<\/p>\n\n\n\n<p>I built a full-stack application to wrap the training loop:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Backend (FastAPI):<\/strong> A custom API that orchestrates the <code>torchrun<\/code> distributed training jobs and manages the simulation state.<\/li>\n\n\n\n<li><strong>The Visualization (PyVista):<\/strong> A dedicated pipeline that ingests the raw <code>.vtu<\/code> inference files and renders 3D streamlines in real-time.<\/li>\n\n\n\n<li><strong>The Frontend (HTML\/JS\/Tailwind):<\/strong> A web-based dashboard that allows users to start, stop, and loop simulations with a click, while displaying live inference results side-by-side with system metrics.<\/li>\n\n\n\n<li><strong>The Infrastructure (Docker Compose):<\/strong> The entire stack &#8211; training, API, frontend, and a Prometheus\/Grafana monitoring suite &#8211; is containerized. This allowed me to develop on a single workstation and deploy to the supercomputer with a single <code>docker compose up<\/code> command.<\/li>\n<\/ul>\n\n\n\n<p>This architecture turned a research experiment into a product demonstration.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>In less than two weeks, I went from a marketing question to a live, multi-GPU AI simulation running on one of the most powerful liquid-cooled servers on the market.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/Standing-Up-AI-Development-Screenshot-4.png\" alt=\"Screenshot of Puget Systems AI Training Demo GPU Usage Monitor Showing Full Load\" 
class=\"wp-image-42442\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Success. After switching to Headless Ubuntu and implementing<\/em> <code><em>torchrun<\/em><\/code><em>, we achieved full, sustained saturation across the entire GPU cluster.<\/em><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>More importantly, I hit these walls &#8211; specifically driver conflicts, distributed training backends, and OS resource contention &#8211; so you don&#8217;t have to. (And for the admins who love a GUI: we are actively working on a configuration that supports both Ubuntu Desktop and full-scale training on the Comino.)<\/p>\n\n\n\n<p><strong>Technical Resources<\/strong>: For those interested in the code, I\u2019ve made<a href=\"http:\/\/github.com\/Puget-Systems\/ps-supercompute-demo\"> the project public on GitHub<\/a>. <\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n<div class=\"wp-bootstrap-blocks-row row puget-icon-section\">\n\t\n\n<div class=\"col-12 col-md-6\">\n\t\t\t\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-thumbnail is-resized text-center\"><a href=\"https:\/\/www.pugetsystems.com\/solutions\/ai\/\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/12\/computer-icon.png\" alt=\"Tower Computer Icon in Puget Systems Colors\" class=\"wp-image-12659\" style=\"width:113px;height:113px\" title=\"\"\/><\/a><\/figure>\n\n\n\n<h4 class=\"wp-block-heading has-text-align-center\" id=\"h-looking-for-an-ai-workstation-or-server\">Looking for an AI workstation or server?<\/h4>\n\n\n\n<p class=\"has-text-align-center\">We build computers tailor-made 
for your workflow.&nbsp;<\/p>\n\n\n<div class=\"wp-bootstrap-blocks-button text-center\">\n\t<a\n\t\thref=\"https:\/\/www.pugetsystems.com\/solutions\/ai\/\"\n\t\t\t\t\t\tclass=\"btn btn-primary\"\n\t>\n\t\tConfigure a System\t<\/a>\n<\/div>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\t<\/div>\n\n\n\n<div class=\"col-12 col-md-6\">\n\t\t\t\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-thumbnail is-resized text-center\"><a href=\"\/contact-expert\/\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/talking-icon.png\" alt=\"Talking Head Icon in Puget Systems Colors\" class=\"wp-image-12657\" style=\"width:113px;height:113px\" title=\"\"\/><\/a><\/figure>\n\n\n\n<h4 class=\"wp-block-heading has-text-align-center\" id=\"h-don-t-know-where-to-start-we-can-help\">Don&#8217;t know where to start?<br>We can help!<\/h4>\n\n\n\n<p class=\"has-text-align-center\">Get in touch with one of our technical consultants today.<\/p>\n\n\n<div class=\"wp-bootstrap-blocks-button text-center\">\n\t<a\n\t\thref=\"\/contact-expert\/\"\n\t\t\t\t\t\tclass=\"btn btn-primary\"\n\t>\n\t\tTalk to an Expert\t<\/a>\n<\/div>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\t<\/div>\n\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"h-related-content\">Related Content<\/h3>\n\n\n \n<div class=\"related-content\">\n\t<ul class=\"related-content-list\">\n\t\t\t\t\t\t<li class=\"related-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\" 
title=\"Standing Up AI Development Quickly for Supercomputing 2025\">Standing Up AI Development Quickly for Supercomputing 2025<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li class=\"related-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/exploring-hybrid-cpu-gpu-llm-inference\/\" title=\"Exploring Hybrid CPU\/GPU LLM Inference\">Exploring Hybrid CPU\/GPU LLM Inference<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li class=\"related-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/whats-the-deal-with-npus\/\" title=\"What&#8217;s the deal with NPUs?\">What&#8217;s the deal with NPUs?<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li class=\"related-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/local-alternatives-to-cloud-ai-services\/\" title=\"Local alternatives to Cloud AI services\">Local alternatives to Cloud AI services<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t<\/ul>\n\t \n\t<a class=\"view-term-link\" href=\"\/all_articles?filter=machine-learning\">View\n\t\tAll Related Content<\/a>\n\t<\/div><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"h-latest-content\">Latest Content<\/h3>\n\n\n \n<div class=\"latest-content\">\n\t<ul class=\"latest-content-list\">\n\t\t\t\t\t\t<li class=\"latest-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\" title=\"Standing Up AI Development Quickly for Supercomputing 2025\">Standing Up AI Development Quickly for Supercomputing 2025<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li class=\"latest-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/exploring-hybrid-cpu-gpu-llm-inference\/\" title=\"Exploring Hybrid CPU\/GPU LLM Inference\">Exploring Hybrid CPU\/GPU LLM Inference<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li 
class=\"latest-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/whats-the-deal-with-npus\/\" title=\"What&#8217;s the deal with NPUs?\">What&#8217;s the deal with NPUs?<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<li class=\"latest-content-list-item\">\n\t\t\t\t\t<a href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/local-alternatives-to-cloud-ai-services\/\" title=\"Local alternatives to Cloud AI services\">Local alternatives to Cloud AI services<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t<\/ul>\n\t \n\t\t<a href=\"https:\/\/www.pugetsystems.com\/all-hpc\/\" class=\"view-posts-link\">View All<\/a>\n\t<\/div><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>How I used &#8220;Vibe Coding&#8221; and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.<\/p>\n","protected":false},"author":255,"featured_media":42455,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"_acf_changed":false,"content-type":"","classic-editor-remember":"","legacy_id":"","redirect_url":[],"expire_date":"","alert_message":"","alert_link":[],"configure_ids":"","system_grid_title":"","system_grid_ids":"","footnotes":""},"hpc_categories":[8879,8883,8885],"hpc_tags":[9693,9694,8709,8765,8770,8779,8812,8855],"coauthors":[9664],"class_list":["post-42432","hpc_post","type-hpc_post","status-publish","has-post-thumbnail","hentry","hpc_category-hardware","hpc_category-machine-learning","hpc_category-software","hpc_tag-comino","hpc_tag-development","hpc_tag-docker","hpc_tag-machine-learning","hpc_tag-ml-ai","hpc_tag-nvidia","hpc_tag-python","hpc_tag-ubuntu"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.7 (Yoast SEO v26.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Standing Up AI Development Quickly for Supercomputing 2025 | Puget Systems<\/title>\n<meta name=\"description\" content=\"How I used &quot;Vibe Coding&quot; and 25 years of experience to tame a 
liquid-cooled supercomputer in two weeks.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Standing Up AI Development Quickly for Supercomputing 2025\" \/>\n<meta property=\"og:description\" content=\"How I used &quot;Vibe Coding&quot; and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\" \/>\n<meta property=\"og:site_name\" content=\"Puget Systems\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/PugetSystems\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-12T22:02:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PugetSystems\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"7 minutes\" \/>\n\t<meta name=\"twitter:label2\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data2\" content=\"Dustin Moore\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\",\"url\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\",\"name\":\"Standing Up AI Development Quickly for Supercomputing 2025 | Puget Systems\",\"isPartOf\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg\",\"datePublished\":\"2025-12-12T22:02:39+00:00\",\"dateModified\":\"2025-12-12T22:02:45+00:00\",\"description\":\"How I used \\\"Vibe Coding\\\" and 25 years of experience to tame a liquid-cooled supercomputer in two 
weeks.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage\",\"url\":\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg\",\"contentUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg\",\"width\":1280,\"height\":720,\"caption\":\"Featured Image for Blog Post about Standing Up an AI Demo for Supercomputing 2025 with a Comino Grando GPU Server in the Background\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pugetsystems.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"HPC Posts\",\"item\":\"https:\/\/www.pugetsystems.com\/all-hpc\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Standing Up AI Development Quickly for Supercomputing 2025\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pugetsystems.com\/#website\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"name\":\"Puget Systems\",\"description\":\"Workstations for 
creators.\",\"publisher\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pugetsystems.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\",\"name\":\"Puget Systems\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"contentUrl\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"width\":2560,\"height\":363,\"caption\":\"Puget Systems\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/PugetSystems\",\"https:\/\/x.com\/PugetSystems\",\"https:\/\/www.instagram.com\/pugetsystems\/\",\"https:\/\/www.linkedin.com\/company\/puget-systems\",\"https:\/\/www.youtube.com\/user\/pugetsys\",\"https:\/\/en.wikipedia.org\/wiki\/Puget_Systems\"],\"telephone\":\"(425) 458-0273\",\"legalName\":\"Puget Sound Systems, Inc.\",\"foundingDate\":\"2000-12-01\",\"duns\":\"128267585\",\"naics\":\"334111\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Standing Up AI Development Quickly for Supercomputing 2025 | Puget Systems","description":"How I used \"Vibe Coding\" and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/","og_locale":"en_US","og_type":"article","og_title":"Standing Up AI Development Quickly for Supercomputing 2025","og_description":"How I used \"Vibe Coding\" and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.","og_url":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/","og_site_name":"Puget Systems","article_publisher":"https:\/\/www.facebook.com\/PugetSystems","article_modified_time":"2025-12-12T22:02:45+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_site":"@PugetSystems","twitter_misc":{"Est. 
reading time":"7 minutes","Written by":"Dustin Moore"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/","url":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/","name":"Standing Up AI Development Quickly for Supercomputing 2025 | Puget Systems","isPartOf":{"@id":"https:\/\/www.pugetsystems.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage"},"image":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage"},"thumbnailUrl":"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg","datePublished":"2025-12-12T22:02:39+00:00","dateModified":"2025-12-12T22:02:45+00:00","description":"How I used \"Vibe Coding\" and 25 years of experience to tame a liquid-cooled supercomputer in two weeks.","breadcrumb":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#primaryimage","url":"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg","contentUrl":"https:\/\/wp-cdn.pugetsystems.com\/2025\/12\/SC25-AI-Demo-Blog-Post.jpg","width":1280,"height":720,"caption":"Featured Image for Blog Post about Standing Up an AI Demo for Supercomputing 2025 with a Comino Grando GPU Server in the 
Background"},{"@type":"BreadcrumbList","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/standing-up-ai-development-quickly-for-supercomputing-2025\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pugetsystems.com\/"},{"@type":"ListItem","position":2,"name":"HPC Posts","item":"https:\/\/www.pugetsystems.com\/all-hpc\/"},{"@type":"ListItem","position":3,"name":"Standing Up AI Development Quickly for Supercomputing 2025"}]},{"@type":"WebSite","@id":"https:\/\/www.pugetsystems.com\/#website","url":"https:\/\/www.pugetsystems.com\/","name":"Puget Systems","description":"Workstations for creators.","publisher":{"@id":"https:\/\/www.pugetsystems.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pugetsystems.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.pugetsystems.com\/#organization","name":"Puget Systems","url":"https:\/\/www.pugetsystems.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png","contentUrl":"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png","width":2560,"height":363,"caption":"Puget Systems"},"image":{"@id":"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/PugetSystems","https:\/\/x.com\/PugetSystems","https:\/\/www.instagram.com\/pugetsystems\/","https:\/\/www.linkedin.com\/company\/puget-systems","https:\/\/www.youtube.com\/user\/pugetsys","https:\/\/en.wikipedia.org\/wiki\/Puget_Systems"],"telephone":"(425) 458-0273","legalName":"Puget Sound Systems, 
Inc.","foundingDate":"2000-12-01","duns":"128267585","naics":"334111"}]}},"_links":{"self":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/42432","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts"}],"about":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/types\/hpc_post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/users\/255"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/comments?post=42432"}],"version-history":[{"count":2,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/42432\/revisions"}],"predecessor-version":[{"id":42456,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/42432\/revisions\/42456"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media\/42455"}],"wp:attachment":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media?parent=42432"}],"wp:term":[{"taxonomy":"hpc_category","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_categories?post=42432"},{"taxonomy":"hpc_tag","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_tags?post=42432"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/coauthors?post=42432"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}