
How to Build Custom LoRA for Stable Diffusion (2024 Guide)

Learn how to build custom LoRA for Stable Diffusion. This technical guide covers dataset curation, Kohya_ss configuration, and optimization strategies for Indian AI founders and developers.


Stable Diffusion has revolutionized local image generation, but generic models often fail to capture specific faces, unique artistic styles, or niche object details. This is where Low-Rank Adaptation (LoRA) comes in. LoRAs are small, portable adapter files (usually 10MB to 200MB) that "patch" a base model like SDXL or SD 1.5 to output specific concepts without the massive overhead of a full checkpoint fine-tune.

For Indian AI founders and developers, mastering LoRA training is the bridge between using a general-purpose tool and building a vertically integrated product—whether it’s for localized fashion design, Indian architectural visualization, or regional character consistency in gaming.

Understanding the LoRA Architecture

Before diving into the "how-to," it is essential to understand why LoRAs are the industry standard for custom training. Traditional fine-tuning modifies every weight in a neural network, requiring massive VRAM and resulting in multi-gigabyte files.

LoRA works by adding a small number of new weights to the cross-attention layers of the U-Net. During training, the original weights are "frozen," and only these small matrices are updated. This allows the model to learn new patterns—such as the intricacies of a specific silk weave or a person's facial features—while remaining extremely lightweight and easy to share.
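The low-rank update can be sketched in a few lines of NumPy. A frozen weight matrix `W` is augmented with two small trainable matrices whose product has the same shape as `W`; the dimensions, rank, and alpha below are illustrative values, not settings from any particular trainer.

```python
import numpy as np

d_out, d_in, r, alpha = 768, 768, 16, 8   # illustrative layer size and rank

W = np.random.randn(d_out, d_in)          # frozen base weight
A = np.random.randn(r, d_in) * 0.01       # trainable "down" projection
B = np.zeros((d_out, r))                  # trainable "up" projection, zero-init

def forward(x):
    # Base output plus the scaled low-rank correction; the alpha / r
    # scaling matches how most LoRA implementations apply the adapter.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = np.random.randn(d_in)
assert forward(x).shape == (d_out,)

# The adapter stores only A and B: far fewer parameters than W itself.
lora_params = A.size + B.size   # 2 * r * d
full_params = W.size            # d * d
print(f"LoRA trains {lora_params / full_params:.1%} of this layer's weights")
```

Because `B` starts at zero, the adapter initially contributes nothing, and training nudges only `A` and `B` toward the new concept.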

Hardware and Software Requirements

To build a custom LoRA, you need a high-VRAM GPU. While you can use cloud services like RunPod or Google Colab, local training is often preferred for privacy and iterative testing.

  • GPU: Minimum 8GB VRAM (NVIDIA RTX 3060/4060). For SDXL LoRAs, 16GB+ (RTX 3090/4090) is highly recommended.
  • Storage: 50GB+ of SSD space.
  • Operating System: Windows 10/11 (with WSL2) or Ubuntu.
  • Software Framework: Kohya_ss is the gold standard GUI for training. Alternatively, trainers like OneTrainer or the Flux-specific trainers are gaining popularity.

Step 1: Dataset Preparation (The Most Critical Step)

The quality of your LoRA depends 90% on your dataset and 10% on your hyperparameters. To train a high-quality concept, you need:

1. Image Selection: 20 to 50 high-resolution images. For a person, include various angles (front, profile, three-quarters), different lighting, and diverse expressions. Avoid "noisy" backgrounds if you only want the model to learn the subject.
2. Preprocessing: Crop images to the native resolution of your base model (512x512 for SD 1.5, 1024x1024 for SDXL).
3. Captioning (Tagging): Every image needs a corresponding `.txt` file with the same name. Use a tool like Kohya_ss (WD14 Captioner) or LLaVA to auto-generate tags.

  • The Activation Word: Choose a unique trigger word (e.g., `inkStyle_v1`) that is unlikely to exist in the base model's vocabulary, so the model associates it exclusively with your new concept.
  • Tokenization Strategy: If you are training a specific person, tag features you want to keep flexible (e.g., "blue shirt") and leave out the features you want the LoRA to "absorb" into the activation word.
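Before training, it is worth sanity-checking that every image has a matching caption file and that the trigger word actually appears in each caption. A minimal sketch using only the standard library (the folder path and `inkStyle_v1` token are placeholders):

```python
from pathlib import Path

def check_dataset(folder, trigger, exts=(".png", ".jpg", ".jpeg", ".webp")):
    """Return a list of problems found in a LoRA training folder."""
    root = Path(folder)
    if not root.is_dir():
        return ["folder not found"]
    problems = []
    images = [p for p in root.iterdir() if p.suffix.lower() in exts]
    if not images:
        problems.append("no images found")
    for img in images:
        caption = img.with_suffix(".txt")
        if not caption.exists():
            problems.append(f"{img.name}: missing caption file")
        elif trigger not in caption.read_text():
            problems.append(f"{img.name}: caption lacks trigger '{trigger}'")
    return problems

# An empty list means every image is captioned and tagged with the trigger.
print(check_dataset("dataset/10_inkStyle_v1", "inkStyle_v1"))
```

Running this once before a multi-hour training job catches the most common silent failure: an image with no caption, which Kohya_ss will happily train on anyway.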

Step 2: Setting Up the Training Environment

We recommend using Kohya_ss. It provides a web-based interface for the underlying `sd-scripts` used by professional AI researchers.

1. Clone the repository: `git clone https://github.com/bmaltais/kohya_ss.git`
2. Run the setup script (`setup.bat` or `setup.sh`).
3. Launch the GUI and navigate to the "LoRA" tab.
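Kohya_ss expects the training images to live in a subfolder named `<repeats>_<concept>`, where the leading number controls how many times each image is repeated per epoch. A small helper to build that layout (the paths and concept name are placeholders; verify the convention against your Kohya_ss version):

```python
from pathlib import Path

def make_kohya_layout(root, concept, repeats=10):
    """Create the folder structure Kohya_ss expects:
    root/img/<repeats>_<concept>/ plus log and model output folders."""
    base = Path(root)
    img_dir = base / "img" / f"{repeats}_{concept}"
    for d in (img_dir, base / "log", base / "model"):
        d.mkdir(parents=True, exist_ok=True)
    return img_dir

img_dir = make_kohya_layout("training/inkstyle", "inkStyle_v1", repeats=10)
print(img_dir)
```

Your images and `.txt` captions go inside the returned `img` subfolder; point the GUI's "Image folder" setting at the `img` directory above it.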

Step 3: Configuring Hyperparameters

This is where most beginners get stuck. Here are the "Golden Settings" for a standard LoRA training session:

  • Mixed Precision: `bf16` (if using 30-series or 40-series cards) or `fp16`.
  • Optimizer: `AdamW8bit` is efficient, but `Prodigy` or `Adafactor` are better for "set and forget" learning rates.
  • Learning Rate: If using AdamW, try `1e-4` for the Unet and `5e-5` for the Text Encoder.
  • Network Rank (Dimension): This determines the "capacity" of the LoRA.
      • Rank 16 / Alpha 8: Good for simple styles.
      • Rank 64 / Alpha 32: The sweet spot for faces and complex objects.
      • Rank 128: Usually overkill and can lead to over-fitting.
  • Batch Size: Set to `1` or `2` depending on VRAM.
  • Epochs: Usually 10 to 20. It is better to use "Save every N epochs" so you can test which version is best later.
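The settings above map onto `sd-scripts` command-line flags. Here is a sketch that assembles the launch command for an SD 1.5 run with the "Golden Settings" (flag names follow the `train_network.py` script bundled with Kohya_ss, but you should verify them against your installed version; all paths are placeholders):

```python
def build_train_command(dataset_dir, output_dir, base_model):
    """Assemble an `accelerate launch` command with the 'Golden Settings'."""
    args = {
        "--pretrained_model_name_or_path": base_model,
        "--train_data_dir": dataset_dir,
        "--output_dir": output_dir,
        "--resolution": "512,512",         # use 1024,1024 for SDXL
        "--network_module": "networks.lora",
        "--network_dim": "64",             # rank: the LoRA's capacity
        "--network_alpha": "32",           # commonly half the rank
        "--optimizer_type": "AdamW8bit",
        "--unet_lr": "1e-4",
        "--text_encoder_lr": "5e-5",
        "--mixed_precision": "bf16",       # fp16 on pre-30-series cards
        "--train_batch_size": "1",
        "--max_train_epochs": "15",
        "--save_every_n_epochs": "1",      # keep checkpoints for testing later
    }
    cmd = ["accelerate", "launch", "train_network.py"]
    for flag, value in args.items():
        cmd += [flag, value]
    cmd.append("--enable_bucket")          # aspect-ratio bucketing
    return cmd

print(" ".join(build_train_command(
    "training/inkstyle/img", "training/inkstyle/model",
    "runwayml/stable-diffusion-v1-5")))
```

The GUI writes an equivalent command for you; seeing it spelled out makes it easier to diff two runs when you are iterating on hyperparameters.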

Step 4: Training and Sanity Checks

Once configured, click "Start Training." Keep an eye on the Loss Graph. While "Loss" isn't a perfect indicator of image quality, a steady decline followed by a plateau usually indicates the model has learned the dataset.

Wait for the `.safetensors` file to be generated in your output folder.
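Loss values are noisy from step to step, so compare moving averages rather than individual readings. A rough plateau check over a logged loss history (the window size and tolerance are arbitrary starting points, not recommendations from any trainer):

```python
def has_plateaued(losses, window=50, tolerance=0.005):
    """True if the mean loss of the last window is within `tolerance`
    of the previous window's mean, i.e. learning has flattened out."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tolerance

# A still-declining curve has not plateaued; a flat one has.
declining = [1.0 - 0.004 * i for i in range(100)]
flat = [0.25] * 20 + [0.2] * 100
print(has_plateaued(declining), has_plateaued(flat))
```

A plateau only tells you the model has stopped improving on the training set; whether the checkpoint generalizes still has to be judged visually in Step 5.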

Step 5: Testing and Inference (XYZ Plots)

Do not assume your LoRA is perfect on the first try. Move your LoRA file to the `models/Lora` folder in Automatic1111 or ComfyUI.

The best way to validate is using an XYZ Plot:
1. Set the X-axis to "LoRA Weight" (0.1 to 1.0).
2. Set the Y-axis to "Prompt" (test different environments).
3. Look for "overfitting." If the images look charred, high-contrast, or refuse to change backgrounds when prompted, your LoRA is too strong. You may need to lower the Rank or reduce the Learning Rate in your next training run.
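The XYZ sweep is just a Cartesian product of weights and prompts. A sketch that enumerates the grid you would render (the prompt syntax follows Automatic1111's `<lora:name:weight>` convention; `my_lora` and the prompts are placeholders):

```python
from itertools import product

weights = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1 .. 1.0
prompts = [
    "inkStyle_v1 portrait, studio lighting",
    "inkStyle_v1 portrait, beach at sunset",
    "inkStyle_v1 portrait, crowded market",
]

grid = [f"<lora:my_lora:{w}> {p}" for w, p in product(weights, prompts)]
print(len(grid))   # 10 weights x 3 prompts = 30 renders
```

Scanning the finished grid row by row shows exactly where the LoRA stops cooperating with the prompt: the lowest weight at which backgrounds stop changing is your overfitting threshold.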

Common Pitfalls for Custom LoRAs

  • Overfitting: Training for too many steps makes the LoRA "rigid." It will look exactly like your training images but won't be able to generate the subject in new poses.
  • Underfitting: The concept isn't recognizable. This usually happens when the Learning Rate is too low or the training run has too few steps.
  • Bucket Issues: Always enable "Aspect Ratio Bucketing" in Kohya_ss. This allows the model to learn from non-square images without distorting the subjects.

Use Cases for Indian AI Startups

Custom LoRAs are particularly powerful in the Indian context:

  • E-commerce: Train LoRAs on specific ethnic wear (Saris, Lehengas) to generate catalog-quality photoshoots at a fraction of the cost.
  • Entertainment: Create consistent characters for regional web-series or localized comic books.
  • Branding: Train a LoRA on a brand's specific visual identity, color palette, and logo integration for instant marketing collateral.

FAQ: Building Custom LoRAs

How many images do I need?
For a human face, 15-25 high-quality images are usually sufficient. For a complex art style, you may need 50-100.

What is the difference between LoRA and LyCORIS?
LyCORIS is an umbrella term for alternative adaptation methods such as LoHa and LoKr. These use more expressive weight decompositions, which can capture detail that a standard LoRA misses, but they add training complexity and enjoy less universal tooling support. For 95% of users, standard LoRA is the best choice.

Can I train a LoRA on an 8GB GPU?
Yes, by using `AdamW8bit` or `Lion` optimizers and limiting the Rank to 32, you can successfully train SD 1.5 LoRAs on an 8GB card. SDXL may require "Gradient Checkpointing" and `bitsandbytes` optimizations to fit.

Does high resolution matter?
Yes. If you train on 512px images but try to generate at 1024px, the LoRA may introduce blurriness. Always match your training resolution to your intended output.

Apply for AI Grants India

Are you an Indian founder building proprietary models or fine-tuning workflows to solve local problems? AI Grants India provides the resources and community to help you scale your vision. If you are building "AI for Bharat" or global tools from India, apply today at https://aigrants.in/ and join a cohort of world-class developers.
