In this article, we will see how to fine-tune the text-to-image AI model Stable Diffusion on our own images.
Fine-tuning with textual inversion can be achieved with as few as 3-5 example images.
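Textual inversion trains on a small folder of images of your concept. Before training, it is worth checking that folder. The sketch below counts the image files in it and lists anything else. The function name `check_training_images`, the accepted extensions, and the default minimum of three images are assumptions based on the 3-5 image guideline above. None of them comes from a library.

```python
import tempfile
from pathlib import Path

# Hypothetical helper. The extension set and the default minimum of 3 images
# are assumptions based on the "3-5 images" guideline, not library rules.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_training_images(data_dir, min_images=3):
    """Return (ok, images, others) for the files directly inside data_dir.

    ok is True when at least min_images image files were found; others lists
    the remaining files (for example stray notes) that may be worth removing.
    """
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    images = [p for p in files if p.suffix.lower() in IMAGE_EXTENSIONS]
    others = [p for p in files if p.suffix.lower() not in IMAGE_EXTENSIONS]
    return len(images) >= min_images, images, others


if __name__ == "__main__":
    # Demo on a temporary folder containing three empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("1.png", "2.png", "3.jpg"):
            (Path(tmp) / name).touch()
        ok, images, others = check_training_images(tmp)
        print(ok, [p.name for p in images], [p.name for p in others])
```

In practice, point the function at the folder you plan to train on. Files reported in `others` are worth moving out, because a training script that reads every file in the folder could fail on them.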
This article covers two ways to fine-tune Stable Diffusion:
- Using Google Colab notebooks to fine-tune Stable Diffusion
- Fine-tuning Stable Diffusion using textual inversion locally
Let’s go through them one by one.
Using Google Colab notebooks to fine-tune Stable Diffusion
The easiest way, of course, is Google Colab notebooks. They run in your browser, and you don't need any special hardware such as a GPU, because Colab provides a GPU runtime.
- Open this Google Colab notebook and follow the instructions in it to run it.
- Next, open the inference notebook and run all the cells.
Fine-tuning Stable Diffusion using textual inversion locally
Another way to fine-tune Stable Diffusion on your images is to use your own hardware.
For this, you need either a GPU-enabled local machine or a GPU-enabled VM in the cloud.
You will also need Python 3 installed and should be comfortable working on the command line.
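A quick check of these prerequisites can save a failed run later. This is a minimal sketch built on several assumptions: the `check_prerequisites` name, a Python 3.8 minimum, and the presence of `nvidia-smi` on the PATH as a rough signal that an NVIDIA GPU driver is installed. A cloud VM with a different accelerator would need a different check.

```python
import shutil
import sys

# Hypothetical helper. Assumptions: Python 3.8 as the minimum version (check the
# requirements of the diffusers release you install) and nvidia-smi on the PATH
# as a rough sign that an NVIDIA GPU and driver are available.
def check_prerequisites(min_python=(3, 8)):
    """Return a dict of simple yes/no prerequisite checks."""
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```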
Below are the steps:
Install the Python dependencies by running this command:

```shell
pip install "diffusers[training]" accelerate transformers
```
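To confirm the install worked, you can ask Python's import system whether each package can be found, without actually importing it. The sketch below assumes the import names `diffusers`, `accelerate`, and `transformers`. It also checks for `torch`, because the generation code later imports it and Accelerate depends on it. `missing_packages` is a hypothetical name.

```python
import importlib.util

# Import names to look for. The "[training]" part of the pip command only
# selects optional extras; the importable module is still "diffusers".
# torch is included because the generation code imports it (Accelerate
# depends on it, so it is normally installed alongside).
REQUIRED_MODULES = ("diffusers", "accelerate", "transformers", "torch")


def missing_packages(names=REQUIRED_MODULES):
    """Return the module names the import system cannot find (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```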
Next, configure the Hugging Face Accelerate environment by running `accelerate config` and answering its prompts.
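Download the Stable Diffusion weights
First, visit the Stable Diffusion page on Hugging Face and accept the license.
For the next part, you need a Hugging Face access token, which you can create in your Hugging Face account settings under Access Tokens.
Next, authenticate with your token by running `huggingface-cli login` and pasting the token when prompted.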
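Fine-tuning is then started with the `textual_inversion.py` example script from the diffusers repository (found under `examples/textual_inversion`), using the command below: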
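```shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --train_data_dir="path-to-dir-containing-images" --learnable_property="object" \
  --placeholder_token="<cat-toy>" --initializer_token="toy" \
  --resolution=512 --train_batch_size=1 --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --output_dir="path-to-your-trained-model"
```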
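If you prefer to drive training from Python, you can assemble the same command as an argument list and pass it to `subprocess.run(cmd, check=True)`. The sketch below is hypothetical: `build_training_command` and `effective_learning_rate` are illustrative names, and only a subset of the flags is covered. The second function shows the arithmetic behind `--scale_lr`. In the diffusers example script, that flag multiplies the base learning rate by the gradient accumulation steps, the batch size, and the number of processes.

```python
import shlex


# Hypothetical helper that assembles the training command as an argument list.
# Flag names follow the command in this article; the default values are
# illustrative assumptions, not recommendations.
def build_training_command(model_name, data_dir, output_dir,
                           placeholder_token="<cat-toy>",
                           initializer_token="toy",
                           learning_rate=5.0e-04,
                           scale_lr=True,
                           use_auth_token=True):
    cmd = [
        "accelerate", "launch", "textual_inversion.py",
        f"--pretrained_model_name_or_path={model_name}",
        f"--train_data_dir={data_dir}",
        f"--placeholder_token={placeholder_token}",
        f"--initializer_token={initializer_token}",
        f"--learning_rate={learning_rate}",
        f"--output_dir={output_dir}",
    ]
    if use_auth_token:
        # Kept because the article's command uses it; newer versions of the
        # script may not accept this flag, so it can be switched off.
        cmd.append("--use_auth_token")
    if scale_lr:
        cmd.append("--scale_lr")
    return cmd


def effective_learning_rate(base_lr, train_batch_size,
                            gradient_accumulation_steps=1, num_processes=1):
    # With --scale_lr, the base learning rate is multiplied by
    # gradient_accumulation_steps * train_batch_size * number of processes.
    return base_lr * gradient_accumulation_steps * train_batch_size * num_processes


if __name__ == "__main__":
    cmd = build_training_command("CompVis/stable-diffusion-v1-4",
                                 "path-to-dir-containing-images",
                                 "path-to-your-trained-model")
    # shlex.join quotes arguments such as "<cat-toy>" for safe shell display.
    print(shlex.join(cmd))
    # Illustrative setting: batch size 1, 4 accumulation steps, 1 process,
    # so the rate becomes 5e-4 * 4 = 2e-3.
    print(effective_learning_rate(5.0e-04, train_batch_size=1,
                                  gradient_accumulation_steps=4))
```

A list is used instead of a single shell string to avoid quoting problems with tokens like `<cat-toy>`. A shell would otherwise treat the angle brackets as redirection.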
To generate images with your newly fine-tuned model, run the following Python code:
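```python
import torch
from diffusers import StableDiffusionPipeline

# Use the directory passed as --output_dir during training.
model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# The placeholder token learned during training must appear in the prompt.
prompt = "A <cat-toy> backpack"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-toy-backpack.png")
```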
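Once the pipeline is loaded, it is common to try the learned token in several prompts. The sketch below only builds the prompt strings. The helper name and the templates are assumptions for illustration. The commented line shows where each prompt would be passed to the `pipe` object created above.

```python
# Hypothetical helper for generating several prompts around the learned token.
# The templates are illustrative assumptions; what matters is that the
# placeholder token from training (here "<cat-toy>") appears verbatim.
def build_prompts(placeholder_token, templates):
    prompts = []
    for template in templates:
        if "{}" not in template:
            raise ValueError(f"template has no {{}} slot: {template!r}")
        prompts.append(template.format(placeholder_token))
    return prompts


if __name__ == "__main__":
    templates = ["A {} backpack", "A photo of a {} on a beach", "An oil painting of a {}"]
    for prompt in build_prompts("<cat-toy>", templates):
        # Each prompt could then be passed to the pipeline, for example:
        # image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
        print(prompt)
```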