Copy these files into your stable-diffusion folder to enable textual inversion

Overview

sd-enable-textual-inversion

Copy the files into your stable-diffusion folder to enable textual inversion in the WebUI.

If you experience any problems with WebUI integration, please create an issue here.

If you are having problems installing or running textual inversion, see the FAQ below. If your problem is not listed, the official repo is here.


How does it work?

textual-inversion - An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion (credit: Tel Aviv University, NVIDIA)

We learn to generate specific concepts, like personal objects or artistic styles, by describing them using new "words" in the embedding space of pre-trained text-to-image models. These can be used in new sentences, just like any other word.

Essentially, this model will take some pictures of an object, style, etc. and learn how to describe it in a way that can be understood by text-to-image models such as Stable Diffusion. This allows you to reference specific things in your prompts, or concepts that are easier to express with pictures rather than words.
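As a rough conceptual sketch (not the repo's actual code), the learned concept is simply a new vector in the text encoder's embedding space, keyed by a placeholder token such as *:

# Conceptual illustration only: a textual-inversion embedding maps a
# placeholder token to a learned vector in the text encoder's embedding space.
import torch

embedding_dim = 768                           # e.g. the CLIP text embedding size used by SD v1
learned_vector = torch.randn(embedding_dim)   # stands in for the trained embedding
token_table = {"*": learned_vector}           # the token "*" now "means" your concept

prompt = "a photo of * on the beach"
# At inference time, every occurrence of "*" is looked up in this table and the
# learned vector is used in place of a normal word embedding when the prompt is
# encoded to condition the diffusion model.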


How to Use

Before you can do anything with the WebUI, you must first create an embedding file by training the Textual-Inversion model. Alternatively, you can test with one of the pre-made embeddings from here.

In the WebUI, place the embedding file in the embeddings upload box. You can then reference the embedding by using * in your prompt.

Examples from the paper:


How to Train Textual-Inversion

** Training is best done by using the original repo **

WARNING: This is a very memory-intensive model and, as of writing, it is not optimized to work with SD. You will need an NVIDIA GPU with at least 10GB of VRAM to train at all on your local device, and a GPU with 20GB+ to train in a reasonable amount of time. If you do not have the system resources, you should use Colab or stick with pretrained embeddings until SD is better supported.

Note that these instructions are for training on your local device; the steps may vary for training in Colab.

You will need 3-5 images of what you want the model to describe. You can use more images, but the paper recommends 5. For the best results, the images should be visually similar, and each image should be cropped to 512x512. Any other sizes will be rescaled (stretched) and may produce strange results.
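If your source images are not already square, a quick preprocessing pass can center-crop and resize them. This is a minimal sketch using Pillow; the folder names are placeholders:

# Center-crop and resize training images to 512x512 with Pillow.
# src_dir and dst_dir are placeholder paths; adjust to your own folders.
import os
from PIL import Image

src_dir = "raw_images"        # hypothetical input folder
dst_dir = "training_images"   # folder you will later pass as --data_root
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))   # square center crop
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))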


Step 1:

Place 3-5 images of the object/art style/scene/etc. into an empty folder.

Step 2: In Anaconda, run:

python main.py --base configs/stable-diffusion/v1-finetune.yaml -t --actual_resume models/ldm/text2img-large/model.ckpt -n <name this run> --data_root path/to/image/folder --gpus 1 --init-word <your init word>

--base points the script at the training configuration file.

--actual_resume points the script at the pre-trained model checkpoint to fine-tune from.

-n gives the training run a name, which will also be used as the output folder name.

--gpus sets the number of GPUs; leave at 1 unless you know what you're doing.

--init-word is a single word the model will start with when looking at your images for the first time. It should be simple, e.g. "sculpture", "girl", "mountains"
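For example, a full invocation might look like this (the run name, image folder, and init word are placeholder values):

python main.py --base configs/stable-diffusion/v1-finetune.yaml -t --actual_resume models/ldm/text2img-large/model.ckpt -n my_mug_run --data_root ./training_images --gpus 1 --init-word "mug"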

Step 3:

The model will continue to train until you stop it by pressing CTRL+C. The recommended training time is 3000-7000 iterations (global steps). You can see which step the run is on in the progress bar. You can also monitor progress by reviewing the images under logs/<your run name>/images. I recommend sorting that folder by date modified; the filenames tend to get jumbled up otherwise.

Step 4:

Once stopped, you will find a number of embedding files under logs/<your run name>/checkpoints. The one you want is embeddings.pt.
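If you want to sanity-check the file before uploading it, one rough way (assuming it was written with torch.save, as in the official repo) is to load it and print what it contains; the path below is a placeholder:

# Inspect an embedding file. The exact layout depends on the textual-inversion
# version that produced it, so just print whatever keys and shapes are present.
import torch

ckpt = torch.load("logs/my_run/checkpoints/embeddings.pt", map_location="cpu")
if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        print(key, getattr(value, "shape", type(value)))
else:
    print(type(ckpt))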

Step 5:

In the WebUI, upload the embedding file you just created. Now, when writing a prompt, you can use * to reference whatever the embedding file describes.

"A picture of * in the style of Rembrandt"

"A photo of * as a corgi"

"A coffee mug in the style of *"


Pro tips

Reminder: Official Repo here ==> rinongal/textual_inversion

Unofficial fork, more stable on Windows (8/28/22) ==> nicolai256/Stable-textual-inversion_win

  • When using embeddings in your prompts, the authors note that markers (*) are sensitive to punctuation. Avoid using periods or commas directly after *.
  • The model tends to converge faster and produce more accurate results when using language such as "a photo of" or "* as a photograph" in your prompts.
  • When training Textual-Inversion, the paper says that using more than 5 images leads to less cohesive results. Some users seem to disagree. Try experimenting.
  • When training, more than one init word can be specified by adding them to the list at initializer_words: ["sculpture", "ice"] in v1-finetune.yaml (see the snippet after this list). Order may matter (unconfirmed).
  • You can train multiple embedding files, then merge them with merge_embeddings.py -sd to reference multiple things. See the official repo for more details.
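For reference, the init-word list lives in a section of v1-finetune.yaml that looks roughly like this (other keys omitted; your copy may differ slightly):

personalization_config:
  params:
    placeholder_strings: ["*"]
    initializer_words: ["sculpture", "ice"]
    # ... other keys unchanged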

FAQ

Q: How much VRAM does this require, why am I receiving a CUDA Out of Memory Error?

A: This model is very VRAM-heavy, with 20GB being the recommended amount. It is possible to run this model on a GPU with <12GB of VRAM, but there is no guarantee. Try changing size: 512 to size: 448 in v1-finetune.yaml under data: -> params:, for both train: and validation: (see the sketch below). If that is not enough, it's probably best to use a Colab notebook or another GPU hosting service to do your training.
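After the change, the relevant part of v1-finetune.yaml should look roughly like this (all other keys omitted; your copy may differ slightly):

data:
  params:
    train:
      params:
        size: 448        # was 512
    validation:
      params:
        size: 448        # was 512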


Q: Why am I receiving a "SIGUSR1" error? Why am I receiving an NCCL error? Why am I receiving OSError: cannot open resource?

A: The script main.py was written without Windows in mind.

You will need to open main.py and add the following line after the last import near the top of the script (add import os as well if it is not already imported):

os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"

Next, find the following lines near the end of the script. Change SIGUSR1 and SIGUSR2 to SIGTERM:

import signal

signal.signal(signal.SIGUSR1, melk)
signal.signal(signal.SIGUSR2, divein)
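After the change, those lines should read:

import signal

signal.signal(signal.SIGTERM, melk)
signal.signal(signal.SIGTERM, divein)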

Finally, open the file ldm/util.py and find this line: font = ImageFont.truetype('data/DejaVuSans.ttf', size=size). Comment it out and replace it with this: font = ImageFont.load_default()


Q: Why am I receiving an error about multiple devices detected?

A: Make sure you are using the --gpus 1 argument. If you are still receiving the error, open main.py and find the following lines:

if not cpu:
    ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
else:
    ngpu = 1

Comment these lines out, then below them add ngpu = 1 (or however many GPUs you want to use), as shown below. Make sure that it is at the same indentation level as the line below it.
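After the edit, that part of main.py should look roughly like this (keep the original indentation):

# if not cpu:
#     ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
# else:
#     ngpu = 1
ngpu = 1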


Q: Why am I receiving an error about if trainer.global_rank == 0:?

A: Open main.py and scroll to the end of the file. On the last few lines, comment out the line where it says if trainer.global_rank == 0: and the line below it.


Q: Why am I receiving errors about shapes? (e.g.: value tensor of shape [1280] cannot be broadcast to indexing result of shape [0, 768])

A: There are two known reasons for shape errors:

  • The sanity check is failing when starting Textual-Inversion training. Try leaving out the --actual_resume argument when launching main.py. Chances are, the next error you receive will be an Out of Memory error. See earlier in the FAQ for that.
  • Stable Diffusion is erroring out when you try to use an embedding file. This is likely because you ran the Textual-Inversion training with the wrong configuration. As of writing, TI and SD are not integrated. Make sure you have downloaded the configs/stable-diffusion/v1-finetune.yaml file from this repo and that you use --base configs/stable-diffusion/v1-finetune.yaml when training embeddings. Retrain and try again.
