{"id":22714,"date":"2024-01-29T15:22:19","date_gmt":"2024-01-29T23:22:19","guid":{"rendered":"https:\/\/www.pugetsystems.com\/?post_type=hpc_post&#038;p=22714"},"modified":"2024-05-28T12:03:23","modified_gmt":"2024-05-28T19:03:23","slug":"multi-gpu-sd-training","status":"publish","type":"hpc_post","link":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/","title":{"rendered":"Experiences with Multi-GPU Stable Diffusion Training"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_80 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#Test_Setup\" >Test Setup<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#Caveats\" >Caveats<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#Results\" >Results<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#Closing_Thoughts\" >Closing Thoughts<\/a><\/li><\/ul><\/nav><\/div>\n\n<h2 class=\"wp-block-heading\" id=\"h-introduction\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Not long after my recent experience <a href=\"https:\/\/www.pugetsystems.com\/labs\/articles\/stable-diffusion-lora-training-consumer-gpu-analysis\">training LoRAs using kohya_ss scripts for Stable Diffusion<\/a>, I noticed that a new version was released that claimed, \u201cThe issues in multi-GPU training are fixed.\u201d This made me interested in giving multi-GPU training a shot to see what challenges I might encounter and to determine what kind of performance benefits could be found.<\/p>\n\n\n\n<p>For Stable Diffusion training, I like to use <a href=\"https:\/\/github.com\/kohya-ss\/sd-scripts\">kohya-ss\/sd-scripts<\/a>, a collection of scripts that streamline the process and support an array of training methods, including native fine-tuning, Dreambooth, and LoRA. <a href=\"https:\/\/github.com\/bmaltais\/kohya_ss\">bmaltais\/kohya_ss<\/a> adds a Gradio GUI to the scripts, which I find very helpful for navigating the myriad of training options instead of a more manual process of discovering, choosing, and inputting training arguments.<\/p>\n\n\n\n<p>With these tools, I want to investigate whether using multiple GPUs is now a viable option for training. And if it does work, how much time could it save compared to a single GPU?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-test-setup\"><span class=\"ez-toc-section\" id=\"Test_Setup\"><\/span>Test Setup<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>For the most part, I used the same hardware and software configuration <a href=\"https:\/\/www.pugetsystems.com\/labs\/articles\/stable-diffusion-lora-training-professional-gpu-analysis\/#Test_Setup\">we used for the LoRA training analyses<\/a>. However, I updated the kohya_ss UI to v22.4.1 and PyTorch to 2.1.2. For the GPUs, I used two NVIDIA GeForce RTX 4090 Founders Edition cards. 
I also used the same dataset of thirteen 1024&#215;1024 photos, configured for 40 repeats apiece, for a total of 520 steps in a training run. Also, based on our previous LoRA testing results, I used SDPA cross-attention in all tests.<\/p>\n\n\n\n<p>Additionally, per the scripts\u2019 <a href=\"https:\/\/github.com\/bmaltais\/kohya_ss\/releases\/tag\/v22.4.0\">release notes for 22.4.0<\/a>, two new arguments are recommended for multi-GPU training:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u201c--ddp_gradient_as_bucket_view and --ddp_bucket_view options are added to sdxl_train.py. Please specify these options for multi-GPU training.\u201d<\/code><\/pre>\n\n\n\n<p>However, I found that --ddp_bucket_view is not recognized as a valid argument and doesn\u2019t appear anywhere in the code, so I\u2019m not certain that statement is accurate. On top of that, these arguments were actually added to train_util.py, not sdxl_train.py, with a nearby comment that states, \u201cTODO move to SDXL training, because it is not supported by SD1\/2\u201d. So, it\u2019s unclear to me whether these arguments are only necessary for distributed SDXL training. Ultimately, I chose to include the --ddp_gradient_as_bucket_view argument.<\/p>\n\n\n\n<p>Also note that all of the training results provided are with full bf16 training enabled, as it was required to complete Dreambooth and Finetuning. Without it, they would run out of memory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-caveats\"><span class=\"ez-toc-section\" id=\"Caveats\"><\/span>Caveats<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/kohya-ss\/sd-scripts\/issues\/838\">According to kohya-ss<\/a>, \u201cIn multiple GPU training, the number of the images multiplied by GPU count is trained with single step. 
So it is recommended to use --max_train_epochs for training same amount as the single GPU training.\u201d<\/p>\n\n\n\n<p>This means that reusing a single-GPU configuration with two GPUs results in twice as many epochs as we configured. For example, if we expect a single epoch of 1000 steps with a single GPU, then with two GPUs, we would instead get two epochs of 500 steps. This doubles the total work because each GPU processes an image during each step: the first case is 1000 steps &#215; 1 epoch &#215; 1 GPU (1000 images trained), while the second is 500 steps &#215; 2 epochs &#215; 2 GPUs (2000 images trained).<\/p>\n\n\n\n<p>The simplest way around this is to set max_train_epochs to one, as kohya-ss suggests. Using the example above, this change results in a single epoch with 500 steps. Since each step consists of one image trained per GPU, we can consider these 500 steps equivalent to the 1000 steps of the single GPU training run. In the charts below, I&#8217;ve included results from tests performed with max_train_epochs left unconfigured and with it set to one.<\/p>\n\n\n\n<p>However, having the ability to train over multiple epochs and compare them against each other is incredibly helpful for dialing in the resulting output. Therefore, to use multiple epochs without inflating the step count, the training data could instead be prepared with half as many repeats per training image, resulting in the same number of total steps as a dataset prepared for training with a single GPU.<\/p>\n\n\n\n<p>I tried training with distributed training optimizations like DeepSpeed and FSDP. 
However, I could not complete any training runs despite the variety of training configurations I tested, so there may very well be some performance optimizations left on the table if these options are indeed available for these scripts with the correct configuration.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-results\"><span class=\"ez-toc-section\" id=\"Results\"><\/span>Results<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<script type=\"text\/javascript\">\n    jQuery(document).ready(function(){\n        var slid_to = 0;\n        jQuery('#image-gallery-33231').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n        });\n        jQuery('#image-gallery-33231LargeCarousel').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n            jQuery('#image-gallery-33231').carousel(slid_to);\n        });\n\n        jQuery('#image-gallery-33231 .carousel-item img').click(function(){\n            jQuery('#image-gallery-33231LargeCarousel').carousel(slid_to);\n        });\n    });\n<\/script>\n\n<div id=\"image-gallery-33231\" class=\"carousel carousel-dark slide gallery\" data-interval=\"false\">\n\t<div class=\"carousel-indicators\">\n\t\t            <div data-target=\"#image-gallery-33231\" data-slide-to=\"0\" class=\"active\">\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/Lora-Performance-iterations-128dim-150x150.png\" class=\"carousel-thumbnail\" alt=\"\" \/>            <\/div>\n                        <div data-target=\"#image-gallery-33231\" data-slide-to=\"1\" >\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Lora-Performance-seconds-128dim-150x150.png\" class=\"carousel-thumbnail\" alt=\"Lora Performance in seconds\" \/>            <\/div>\n            \t<\/div><!-- .carousel-indicators -->\n\t\t<div 
class=\"carousel-inner\">\n\t\t\t\t\t<div class=\"carousel-item active\">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/Lora-Performance-iterations-128dim.png\"\n\t\t\t\t     alt=\"\" class=\"d-block mx-auto h-100\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-33231Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"carousel-item \">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Lora-Performance-seconds-128dim.png\"\n\t\t\t\t     alt=\"Lora Performance in seconds\" class=\"d-block mx-auto h-100\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-33231Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t<\/div>\n\t<a class=\"carousel-control-prev\" href=\"#image-gallery-33231\" role=\"button\" data-slide=\"prev\">\n\t\t<span class=\"carousel-control-prev-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Previous<\/span>\n\t<\/a>\n\t<a class=\"carousel-control-next\" href=\"#image-gallery-33231\" role=\"button\" data-slide=\"next\">\n\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Next<\/span>\n\t<\/a>\n<\/div>\n<div class=\"gallery-caption\"><\/div>\n\n\n\n\n\n<div class=\"modal fade\" id=\"image-gallery-33231Modal\" tabindex=\"-1\" role=\"dialog\">\n\t<div class=\"modal-dialog modal-xl\" role=\"document\">\n\t\t<div class=\"modal-content\">\n\t\t\t<div class=\"modal-header\">\n\t\t\t\t<h5 class=\"modal-title\">System Image<\/h5>\n\t\t\t\t<button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\">\n\t\t\t\t\t<span aria-hidden=\"true\">&times;<\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/div>\n\t\t\t<div class=\"modal-body\">\n\t\t\t\t<div id=\"image-gallery-33231LargeCarousel\" class=\"carousel carousel-dark slide modal-gallery\" data-interval=\"false\">\n\t\t\t\t\t\t\t\t\t<ol 
class=\"carousel-indicators\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li data-target=\"#image-gallery-33231LargeCarousel\" data-slide-to=\"0\" class=\"active\"><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li data-target=\"#image-gallery-33231LargeCarousel\" data-slide-to=\"1\" ><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/ol>\n\t\t\t\t\t\t\t\t\t<div class=\"carousel-inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item active\">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1041\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/Lora-Performance-iterations-128dim.png\" class=\"d-block mx-auto h-100\" alt=\"\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-33231Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/Lora-Performance-iterations-128dim.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item \">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1042\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Lora-Performance-seconds-128dim.png\" class=\"d-block mx-auto h-100\" alt=\"Lora Performance in seconds\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-33231Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Lora-Performance-seconds-128dim.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<a class=\"carousel-control-prev\" 
href=\"#image-gallery-33231LargeCarousel\" role=\"button\" data-slide=\"prev\">\n\t\t\t\t\t\t<span class=\"carousel-control-prev-icon\" aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Previous<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t\t<a class=\"carousel-control-next\" href=\"#image-gallery-33231LargeCarousel\" role=\"button\" data-slide=\"next\">\n\t\t\t\t\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Next<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t<\/div><!-- .modal-body -->\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>To start off with, we looked at how a second GPU affects performance for training a LoRA with 128 dimensions. Note that we have two charts above, one for iterations per second and another for the total training time. In addition to the single-GPU results, the training time chart has two multi-GPU results to show the difference in total training time between training runs with the maximum epochs uncapped and the maximum epochs limited to one.<\/p>\n\n\n\n<p>It looks like a performance decrease if we only consider the raw iterations per second or the duration of a training run without epochs capped. However, once we set the maximum epochs to one, we can see the performance benefit of training with two GPUs. 
Although each iteration takes ~23% longer, because we are processing two training images simultaneously, we ultimately complete the training run ~36% faster than with a single GPU.<\/p>\n\n\n\n<script type=\"text\/javascript\">\n    jQuery(document).ready(function(){\n        var slid_to = 0;\n        jQuery('#image-gallery-35327').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n        });\n        jQuery('#image-gallery-35327LargeCarousel').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n            jQuery('#image-gallery-35327').carousel(slid_to);\n        });\n\n        jQuery('#image-gallery-35327 .carousel-item img').click(function(){\n            jQuery('#image-gallery-35327LargeCarousel').carousel(slid_to);\n        });\n    });\n<\/script>\n\n<div id=\"image-gallery-35327\" class=\"carousel carousel-dark slide gallery\" data-interval=\"false\">\n\t<div class=\"carousel-indicators\">\n\t\t            <div data-target=\"#image-gallery-35327\" data-slide-to=\"0\" class=\"active\">\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-iterations-150x150.png\" class=\"carousel-thumbnail\" alt=\"Dreambooth Performance in iterations per second\" \/>            <\/div>\n                        <div data-target=\"#image-gallery-35327\" data-slide-to=\"1\" >\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-seconds-150x150.png\" class=\"carousel-thumbnail\" alt=\"Dreambooth Performance in seconds\" \/>            <\/div>\n            \t<\/div><!-- .carousel-indicators -->\n\t\t<div class=\"carousel-inner\">\n\t\t\t\t\t<div class=\"carousel-item active\">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-iterations.png\"\n\t\t\t\t     
alt=\"Dreambooth Performance in iterations per second\" class=\"d-block mx-auto h-100\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-35327Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"carousel-item \">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-seconds.png\"\n\t\t\t\t     alt=\"Dreambooth Performance in seconds\" class=\"d-block mx-auto h-100\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-35327Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t<\/div>\n\t<a class=\"carousel-control-prev\" href=\"#image-gallery-35327\" role=\"button\" data-slide=\"prev\">\n\t\t<span class=\"carousel-control-prev-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Previous<\/span>\n\t<\/a>\n\t<a class=\"carousel-control-next\" href=\"#image-gallery-35327\" role=\"button\" data-slide=\"next\">\n\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Next<\/span>\n\t<\/a>\n<\/div>\n<div class=\"gallery-caption\"><\/div>\n\n\n\n\n\n<div class=\"modal fade\" id=\"image-gallery-35327Modal\" tabindex=\"-1\" role=\"dialog\">\n\t<div class=\"modal-dialog modal-xl\" role=\"document\">\n\t\t<div class=\"modal-content\">\n\t\t\t<div class=\"modal-header\">\n\t\t\t\t<h5 class=\"modal-title\">System Image<\/h5>\n\t\t\t\t<button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\">\n\t\t\t\t\t<span aria-hidden=\"true\">&times;<\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/div>\n\t\t\t<div class=\"modal-body\">\n\t\t\t\t<div id=\"image-gallery-35327LargeCarousel\" class=\"carousel carousel-dark slide modal-gallery\" data-interval=\"false\">\n\t\t\t\t\t\t\t\t\t<ol class=\"carousel-indicators\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li data-target=\"#image-gallery-35327LargeCarousel\" data-slide-to=\"0\" class=\"active\"><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li 
data-target=\"#image-gallery-35327LargeCarousel\" data-slide-to=\"1\" ><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/ol>\n\t\t\t\t\t\t\t\t\t<div class=\"carousel-inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item active\">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1042\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-iterations.png\" class=\"d-block mx-auto h-100\" alt=\"Dreambooth Performance in iterations per second\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-35327Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-iterations.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item \">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1042\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-seconds.png\" class=\"d-block mx-auto h-100\" alt=\"Dreambooth Performance in seconds\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-35327Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Dreambooth-Performance-seconds.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<a class=\"carousel-control-prev\" href=\"#image-gallery-35327LargeCarousel\" role=\"button\" data-slide=\"prev\">\n\t\t\t\t\t\t<span class=\"carousel-control-prev-icon\" 
aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Previous<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t\t<a class=\"carousel-control-next\" href=\"#image-gallery-35327LargeCarousel\" role=\"button\" data-slide=\"next\">\n\t\t\t\t\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Next<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t<\/div><!-- .modal-body -->\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Unlike the LoRA results, where we saw a moderate performance drop when introducing another GPU, the Dreambooth results show a stark decrease in performance, with the distributed training running at only about one-third of the speed of a single card. Therefore, even after limiting the multi-GPU training run to a single epoch, it&#8217;s still insufficient to outperform a single GPU.<\/p>\n\n\n\n<script type=\"text\/javascript\">\n    jQuery(document).ready(function(){\n        var slid_to = 0;\n        jQuery('#image-gallery-16558').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n        });\n        jQuery('#image-gallery-16558LargeCarousel').on('slid.bs.carousel', function(e){\n            slid_to = e.to;\n            jQuery('#image-gallery-16558').carousel(slid_to);\n        });\n\n        jQuery('#image-gallery-16558 .carousel-item img').click(function(){\n            jQuery('#image-gallery-16558LargeCarousel').carousel(slid_to);\n        });\n    });\n<\/script>\n\n<div id=\"image-gallery-16558\" class=\"carousel carousel-dark slide gallery\" data-interval=\"false\">\n\t<div class=\"carousel-indicators\">\n\t\t            <div data-target=\"#image-gallery-16558\" data-slide-to=\"0\" class=\"active\">\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-iterations-150x150.png\" 
class=\"carousel-thumbnail\" alt=\"Finetune Performance in iterations per second\" \/>            <\/div>\n                        <div data-target=\"#image-gallery-16558\" data-slide-to=\"1\" >\n                <img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-seconds-150x150.png\" class=\"carousel-thumbnail\" alt=\"Finetune Performance in seconds\" \/>            <\/div>\n            \t<\/div><!-- .carousel-indicators -->\n\t\t<div class=\"carousel-inner\">\n\t\t\t\t\t<div class=\"carousel-item active\">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-iterations.png\"\n\t\t\t\t     alt=\"Finetune Performance in iterations per second\" class=\"d-block mx-auto h-100\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-16558Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"carousel-item \">\n\n                \t\t\t\t<img decoding=\"async\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-seconds.png\"\n\t\t\t\t     alt=\"Finetune Performance in seconds\" class=\"d-block mx-auto h-100\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-16558Modal\" \/>\n                \t\t\t<\/div>\n\t\t\t\t<\/div>\n\t<a class=\"carousel-control-prev\" href=\"#image-gallery-16558\" role=\"button\" data-slide=\"prev\">\n\t\t<span class=\"carousel-control-prev-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Previous<\/span>\n\t<\/a>\n\t<a class=\"carousel-control-next\" href=\"#image-gallery-16558\" role=\"button\" data-slide=\"next\">\n\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t<span class=\"sr-only\">Next<\/span>\n\t<\/a>\n<\/div>\n<div class=\"gallery-caption\"><\/div>\n\n\n\n\n\n<div class=\"modal fade\" id=\"image-gallery-16558Modal\" tabindex=\"-1\" role=\"dialog\">\n\t<div 
class=\"modal-dialog modal-xl\" role=\"document\">\n\t\t<div class=\"modal-content\">\n\t\t\t<div class=\"modal-header\">\n\t\t\t\t<h5 class=\"modal-title\">System Image<\/h5>\n\t\t\t\t<button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\">\n\t\t\t\t\t<span aria-hidden=\"true\">&times;<\/span>\n\t\t\t\t<\/button>\n\t\t\t<\/div>\n\t\t\t<div class=\"modal-body\">\n\t\t\t\t<div id=\"image-gallery-16558LargeCarousel\" class=\"carousel carousel-dark slide modal-gallery\" data-interval=\"false\">\n\t\t\t\t\t\t\t\t\t<ol class=\"carousel-indicators\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li data-target=\"#image-gallery-16558LargeCarousel\" data-slide-to=\"0\" class=\"active\"><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li data-target=\"#image-gallery-16558LargeCarousel\" data-slide-to=\"1\" ><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/ol>\n\t\t\t\t\t\t\t\t\t<div class=\"carousel-inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item active\">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1042\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-iterations.png\" class=\"d-block mx-auto h-100\" alt=\"Finetune Performance in iterations per second\" data-id=\"0\" data-toggle=\"modal\" data-target=\"#image-gallery-16558Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-iterations.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"carousel-item \">\n                                <img loading=\"lazy\" decoding=\"async\" width=\"1228\" height=\"1042\" src=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-seconds.png\" class=\"d-block mx-auto h-100\" alt=\"Finetune 
Performance in seconds\" data-id=\"1\" data-toggle=\"modal\" data-target=\"#image-gallery-16558Modal\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"full-res-image-wrapper text-center\">\n\t\t\t\t\t\t\t\t\t<a class=\"btn btn-light btn-lg\" href=\"https:\/\/wp-cdn.pugetsystems.com\/2022\/08\/Finetune-Performance-seconds.png\" target=\"_blank\">Open Full Resolution <i class=\"fas fa-external-link-alt\"><\/i><\/a>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<a class=\"carousel-control-prev\" href=\"#image-gallery-16558LargeCarousel\" role=\"button\" data-slide=\"prev\">\n\t\t\t\t\t\t<span class=\"carousel-control-prev-icon\" aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Previous<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t\t<a class=\"carousel-control-next\" href=\"#image-gallery-16558LargeCarousel\" role=\"button\" data-slide=\"next\">\n\t\t\t\t\t\t<span class=\"carousel-control-next-icon\" aria-hidden=\"true\"><\/span>\n\t\t\t\t\t\t<span class=\"sr-only\">Next<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t<\/div><!-- .modal-body -->\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>The finetuning results essentially mirror what I found with Dreambooth. Due to the severe reduction in iterations per second, the simultaneous completion of training steps does not lead to a training time reduction compared to the single GPU training run.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-closing-thoughts\"><span class=\"ez-toc-section\" id=\"Closing_Thoughts\"><\/span>Closing Thoughts<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Performance benefits can be achieved when training Stable Diffusion with kohya\u2019s scripts and multiple GPUs, but it isn\u2019t as simple as dropping in a second GPU and kicking off a training run. 
Beyond configuring Accelerate to use multiple GPUs, we also need to consider how to account for the multiplication of epochs, either by limiting the max epochs to 1 or preparing our dataset with fewer repeats per image.<\/p>\n\n\n\n<p>Furthermore, the only performance benefits I could achieve were with LoRA training, and both Dreambooth and Finetuning had significantly reduced performance. At this point, I\u2019m unsure if this is due to my configuration lacking multi-GPU optimizations such as DeepSpeed or if it is possibly a result of issues with the scripts themselves.<\/p>\n\n\n\n<p>If any readers have successfully utilized DeepSpeed or other distributed training optimizations with kohya\u2019s scripts, I\u2019m eager to hear from you, so please let me know in the comments if you have any advice!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.<\/p>\n","protected":false},"author":166,"featured_media":22914,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"_acf_changed":false,"content-type":"","classic-editor-remember":"","legacy_id":"","redirect_url":[],"expire_date":"","alert_message":"","alert_link":[],"configure_ids":"","system_grid_title":"","system_grid_ids":"","footnotes":""},"hpc_categories":[8879,8883],"hpc_tags":[8765,8779,9243],"coauthors":[9063],"class_list":["post-22714","hpc_post","type-hpc_post","status-publish","has-post-thumbnail","hentry","hpc_category-hardware","hpc_category-machine-learning","hpc_tag-machine-learning","hpc_tag-nvidia","hpc_tag-stable-diffusion"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.7 (Yoast SEO v26.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Experiences with Multi-GPU Stable Diffusion Training | Puget Systems<\/title>\n<meta name=\"description\" content=\"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"Experiences with Multi-GPU Stable Diffusion Training\" \/>\n<meta property=\"og:description\" content=\"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/\" \/>\n<meta property=\"og:site_name\" content=\"Puget Systems\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/PugetSystems\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-28T19:03:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2024\/01\/SDXL-Dual-GPU-1024x576.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"576\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PugetSystems\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n\t<meta name=\"twitter:label2\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data2\" content=\"Jon Allman\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/\",\"url\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/\",\"name\":\"Experiences with Multi-GPU Stable Diffusion Training | Puget Systems\",\"isPartOf\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png\",\"datePublished\":\"2024-01-29T23:22:19+00:00\",\"dateModified\":\"2024-05-28T19:03:23+00:00\",\"description\":\"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple 
GPUs.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage\",\"url\":\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png\",\"contentUrl\":\"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pugetsystems.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"HPC Posts\",\"item\":\"https:\/\/www.pugetsystems.com\/all-hpc\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Experiences with Multi-GPU Stable Diffusion Training\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pugetsystems.com\/#website\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"name\":\"Puget Systems\",\"description\":\"Workstations for creators.\",\"publisher\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pugetsystems.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pugetsystems.com\/#organization\",\"name\":\"Puget 
Systems\",\"url\":\"https:\/\/www.pugetsystems.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"contentUrl\":\"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png\",\"width\":2560,\"height\":363,\"caption\":\"Puget Systems\"},\"image\":{\"@id\":\"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/PugetSystems\",\"https:\/\/x.com\/PugetSystems\",\"https:\/\/www.instagram.com\/pugetsystems\/\",\"https:\/\/www.linkedin.com\/company\/puget-systems\",\"https:\/\/www.youtube.com\/user\/pugetsys\",\"https:\/\/en.wikipedia.org\/wiki\/Puget_Systems\"],\"telephone\":\"(425) 458-0273\",\"legalName\":\"Puget Sound Systems, Inc.\",\"foundingDate\":\"2000-12-01\",\"duns\":\"128267585\",\"naics\":\"334111\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Experiences with Multi-GPU Stable Diffusion Training | Puget Systems","description":"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/","og_locale":"en_US","og_type":"article","og_title":"Experiences with Multi-GPU Stable Diffusion Training","og_description":"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.","og_url":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/","og_site_name":"Puget Systems","article_publisher":"https:\/\/www.facebook.com\/PugetSystems","article_modified_time":"2024-05-28T19:03:23+00:00","og_image":[{"width":1024,"height":576,"url":"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2024\/01\/SDXL-Dual-GPU-1024x576.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@PugetSystems","twitter_misc":{"Est. 
reading time":"6 minutes","Written by":"Jon Allman"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/","url":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/","name":"Experiences with Multi-GPU Stable Diffusion Training | Puget Systems","isPartOf":{"@id":"https:\/\/www.pugetsystems.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage"},"image":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage"},"thumbnailUrl":"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png","datePublished":"2024-01-29T23:22:19+00:00","dateModified":"2024-05-28T19:03:23+00:00","description":"Results and thoughts with regard to testing a variety of Stable Diffusion training methods using multiple GPUs.","breadcrumb":{"@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#primaryimage","url":"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png","contentUrl":"https:\/\/wp-cdn.pugetsystems.com\/2024\/01\/SDXL-Dual-GPU.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/www.pugetsystems.com\/labs\/hpc\/multi-gpu-sd-training\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pugetsystems.com\/"},{"@type":"ListItem","position":2,"name":"HPC Posts","item":"https:\/\/www.pugetsystems.com\/all-hpc\/"},{"@type":"ListItem","position":3,"name":"Experiences with Multi-GPU Stable Diffusion 
Training"}]},{"@type":"WebSite","@id":"https:\/\/www.pugetsystems.com\/#website","url":"https:\/\/www.pugetsystems.com\/","name":"Puget Systems","description":"Workstations for creators.","publisher":{"@id":"https:\/\/www.pugetsystems.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pugetsystems.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.pugetsystems.com\/#organization","name":"Puget Systems","url":"https:\/\/www.pugetsystems.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png","contentUrl":"https:\/\/www.pugetsystems.com\/wp-content\/uploads\/2022\/08\/Puget-Systems-2020-logo-color-full.png","width":2560,"height":363,"caption":"Puget Systems"},"image":{"@id":"https:\/\/www.pugetsystems.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/PugetSystems","https:\/\/x.com\/PugetSystems","https:\/\/www.instagram.com\/pugetsystems\/","https:\/\/www.linkedin.com\/company\/puget-systems","https:\/\/www.youtube.com\/user\/pugetsys","https:\/\/en.wikipedia.org\/wiki\/Puget_Systems"],"telephone":"(425) 458-0273","legalName":"Puget Sound Systems, 
Inc.","foundingDate":"2000-12-01","duns":"128267585","naics":"334111"}]}},"_links":{"self":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/22714","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts"}],"about":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/types\/hpc_post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/users\/166"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/comments?post=22714"}],"version-history":[{"count":0,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_posts\/22714\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media\/22914"}],"wp:attachment":[{"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/media?parent=22714"}],"wp:term":[{"taxonomy":"hpc_category","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_categories?post=22714"},{"taxonomy":"hpc_tag","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/hpc_tags?post=22714"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.pugetsystems.com\/wp-json\/wp\/v2\/coauthors?post=22714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}