k-diffusion

An implementation of Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022) for PyTorch.

Overview
This repo is a work in progress (models may break on later versions, and script options may change). Note that Config F is not yet implemented; this repo currently implements Config E.

Multi-GPU and multi-node training is supported with Hugging Face Accelerate. You can configure Accelerate by running:

$ accelerate config

on all nodes, then running:

$ accelerate launch train.py --train-set LOCATION_OF_TRAINING_SET --size IMAGE_SIZE

on all nodes.

Comments
  • Use more standard fid/kid calculation

    #7

    • Using your KID calculation, as it seems to be relatively close to cleanfid's but is not stochastic like theirs
    • Had to use cleanfid's numpy calculation for FID, since the results were quite different
    • Resize uses cleanfid's PIL resizing. resize_right was relatively close (atol=1e-3), but this still changes FID too much
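For reference, cleanfid-style resizing essentially comes down to PIL's bicubic filter. A minimal sketch (not cleanfid's exact code; the function name is mine):

```python
import numpy as np
from PIL import Image

def pil_bicubic_resize(arr, size):
    """Resize an HxWxC uint8 array with PIL's bicubic filter, in the
    spirit of cleanfid's "clean" resizing (sketch, not their exact code)."""
    img = Image.fromarray(arr)
    return np.asarray(img.resize(size, Image.BICUBIC))
```

Small differences between resizing implementations (PIL vs. resize_right vs. torch interpolation) are exactly what shifts FID, so matching the reference implementation bit-for-bit matters.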

    Thanks for providing this very well written code, it has been very nice to read and compare :D

    opened by samedii 18
  • [Feature request] Let user provide his own randn data for samplers in sampling.py

    Please add an option for samplers to accept an argument with random data and use that if it is provided.

    The reason for this is as follows.

    We use samplers in stable diffusion to generate pictures, and we use seeds to make it possible for other users to reproduce results.

    In a batch of one image, everything works perfectly: set seed beforehand, generate noise, run sampler, and get the image everyone else will be able to get.

    If the user produces a batch of multiple images (which is desirable because it works faster than multiple independent batches), the expectation is that each image will have its own seed and will be reproducible individually outside of the batch. I achieve that for DDIM and PLMS samplers from stable diffusion by preparing the correct random noise according to seeds beforehand, and since those samplers do not have randomness in them, it works well.

    Samplers here use torch.randn in a loop, so samples in a batch will get different random data than samples produced individually, which results in different output.

    An example of what I want to have:

    from

    def sample_euler_ancestral(model, x, sigmas, extra_args=None, callback=None, disable=None):
        """Ancestral sampling with Euler method steps."""
        extra_args = {} if extra_args is None else extra_args
        s_in = x.new_ones([x.shape[0]])
        for i in trange(len(sigmas) - 1, disable=disable):
            denoised = model(x, sigmas[i] * s_in, **extra_args)
            sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1])
            if callback is not None:
                callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
            d = to_d(x, sigmas[i], denoised)
            # Euler method
            dt = sigma_down - sigmas[i]
            x = x + d * dt
            x = x + torch.randn_like(x) * sigma_up
        return x
    

    to

    def sample_euler_ancestral(model, x, sigmas, extra_args=None, callback=None, disable=None, user_random_data=None):
        """Ancestral sampling with Euler method steps."""
        extra_args = {} if extra_args is None else extra_args
        s_in = x.new_ones([x.shape[0]])
        for i in trange(len(sigmas) - 1, disable=disable):
            denoised = model(x, sigmas[i] * s_in, **extra_args)
            sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1])
            if callback is not None:
                callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
            d = to_d(x, sigmas[i], denoised)
            # Euler method
            dt = sigma_down - sigmas[i]
            x = x + d * dt
            x = x + (torch.randn_like(x) if user_random_data is None else user_random_data[i]) * sigma_up
        return x
    

    (the only difference is in the next-to-last line)
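One way a caller could build such pre-generated noise is with one torch.Generator per seed, so each sample's noise stream is independent of batch composition (a sketch; make_noise_sequence is a hypothetical helper, not part of this repo):

```python
import torch

def make_noise_sequence(seeds, steps, shape):
    """Pre-generate per-step noise with one generator per seed, so each
    sample in a batch gets the same noise it would get in a batch of one."""
    gens = [torch.Generator().manual_seed(s) for s in seeds]
    return [
        torch.stack([torch.randn(shape, generator=g) for g in gens])
        for _ in range(steps)
    ]
```

Passing the result as the sampler's per-step noise makes sample k of the batch reproducible from seeds[k] alone, regardless of how many other images share the batch.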

    opened by AUTOMATIC1111 16
  • Standard FID

    I created a nicer wrapper around cleanfid that works like your implementation, with the intention of creating a PR for this repo, but it's not working with your accelerator/multiprocessing setup.

    I'm considering whether I should try to use your code and switch the model to the one cleanfid uses to try to reproduce the results, but before I spend more time on this, I thought I should ask whether you think the results can be reproduced or if I will run into issues.

    Your implementation looks a lot nicer than cleanfid's, so I expect it will be easy to work with at least.

    opened by samedii 9
  • Add the newest DPM-Solver

    Thanks for your amazing work and interest in DPM-Solver! We have updated DPM-Solver to v2.0, which supports four types of diffusion models: the noise prediction model, the data prediction model, the v-prediction model, and the score function.

    Moreover, we support both single-step and multi-step versions, with the corresponding exponential-integrator algorithms for both the noise prediction model and the data prediction model.

    I'm glad to help if you want to further support our DPM-Solver in this repo :)

    opened by LuChengTHU 5
  • Question of sampler

    Hi, @crowsonkb

    I'd like to ask about the roles of your implemented samplers. sample_heun is a stochastic sampler and sample_lms is a deterministic sampler, right?

    I’m sorry if I misunderstand your implementation and Karras' theory.

    opened by UdonDa 5
  • Config for training other resolutions

    Hello, and thanks for the implementation of the paper! I ran the code with the current config and it seems to do very well. How would one go about training a model with images of size 64x64 or 128x128?

    Thanks, Eliahu

    opened by eliahuhorwitz 4
  • HuggingFace Datasets support

    Adding support for HuggingFace Datasets.

    I added an --hf-datasets flag, which indicates that the string passed to --train-set is a HuggingFace Dataset. I assume that there is a train split (as is most often the case), and the key for the images in the dataset is provided by --hf-datasets-key.

    Hopefully this is fine, but let me know if you want me to change this interface in any way...

    opened by tmabraham 3
  • How to perform a forward process from x_0 to x_t

    Hi, @crowsonkb !

    I'm confused about the forward process from x_0 to x_t. Could you explain it? I'd like to implement the conditioning augmentation from the Imagen paper for super-resolution. It perturbs x_0 to obtain x_t and t (in your implementation, I guess t means sigma).

    I know that your implementation samples sigma randomly here and creates noisy x_t samples here: noised_input = c_in * (input + noise * utils.append_dims(sigma, input.ndim)), as shown in Eq. 7. So noised_input means the x_t images created by Karras' forward diffusion process, right?
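For reference, the forward perturbation amounts to adding scaled Gaussian noise. A sketch in the paper's notation (not code from this repo; append_dims mirrors the helper in k-diffusion's utils):

```python
import torch

def append_dims(x, target_ndim):
    """Right-pad a tensor's shape with singleton dims (as in k-diffusion's utils)."""
    return x[(...,) + (None,) * (target_ndim - x.ndim)]

def forward_perturb(x0, sigma):
    """Karras-style forward process: x_t = x_0 + sigma * eps, eps ~ N(0, I).
    (The c_in factor in the training code is preconditioning of the model
    input, not part of the forward process itself.)"""
    noise = torch.randn_like(x0)
    return x0 + noise * append_dims(sigma, x0.ndim)
```

So x_t at noise level sigma is simply the clean image plus unit Gaussian noise scaled by sigma, matching Eq. 7's parameterization.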

    opened by UdonDa 3
  • White line augmentation artifact

    Thank you for open sourcing this! I tried out your implementation of the non-leaky augmentations. In case it's helpful to you: I noticed that there seem to be some artifacts created in the augmentation pipeline that will probably not help training.

    Left is normal; right has an added white line.

    (I have a different implementation that I built a couple of weeks ago, but I didn't do the non-leaky augmentations then. For what it's worth, I can say that I've also gotten better results with these techniques than with, for example, v-diff on small real-world datasets.)

    opened by samedii 3
  • Reproducibility work on the sampling code

    The sampling code includes a bit of randomness in noise generation. To make the noise generation process more deterministic, we propose generating the random noise for each item in the batch according to a predetermined seed. This approach is used in the production code of NovelAI to ensure reproducibility.

    opened by ardacihaner 2
  • remove `clean-fid`

    The clean-fid library has an old version of requests pinned in its dependencies that's currently causing me some headaches.

    I see a couple discussions related to replacing/reimplementing clean-fid (#7 and #8), and I don't see any imports related to it in the current code, so I was wondering if you'd be open to removing it from the requirements file for the next release. Feel free to close this if not; just thought I'd check.

    Thanks!

    opened by space-pope 2
  • Stabilize the sampling of DPM-Solver++2M by a stabilizing trick

    Hi Katherine,

    Thank you for your great work on supporting DPM-Solver++. I've found that it is used in stable-diffusion-webui and performs very well: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4304. Thank you for your contribution again!

    However, sampling with DPM-Solver++2M at steps <= 10 often suffers from instability (the image quality is much worse than DDIM's). In my recent experiments, I found that this is due to the non-Lipschitzness near t=0. (In fact, the score function has numerical issues for t near 0, as has been revealed in many previous papers, such as CLD and SoftTruncation.)

    Therefore, in my recent PR to diffusers, I added a new "stabilizing" trick that reduces this instability by using lower-order solvers at the final steps (e.g., for the 2nd-order DPM-Solver++, I use DPM-Solver++2M for the first N-1 steps and DDIM for the final step). I find it greatly stabilizes sampling with DPM-Solver++2M. Please see this PR for details: https://github.com/huggingface/diffusers/pull/1132

    Excuse my frequent issues, but could you please also support this "stabilizing" trick in k-diffusion, so that other projects such as stable-diffusion-webui can adopt it? Thank you very much!

    opened by LuChengTHU 19
  • Codestyle improvements: add typehints (PEP 484) and format code to meet PEP 8 recommendation

    There are no changes to the logic, just code-style improvements to make further work more comfortable.

    Improvements:

    • Added type hints (PEP 484) for automatic bug detection with type checkers and more comfortable development and usage by end users.
    • Internal code formatted with the black auto-formatter to meet PEP 8 recommendations.
    opened by jorektheglitch 0
  • Simple evaluation script

    We wrote this for fastai, but it might be useful here too :)

    Note: it requires fastcore.

    We could extend it, if desired, to download models from the Hugging Face Hub or from W&B artifacts.

    opened by pcuenca 0
  • Do not merge! Credits for DEIS

    Hi Katherine,

    Thanks for your awesome work. I'm opening this issue to ask for credit for the DEIS sampler.

    1. We are actually the first public work to propose using the exponential integrator method for solving the diffusion model ODE, several months earlier than the concurrent works DPM-Solver and Tero's Elucidating work.

    The concurrent DPM-Solver (second and third order) and the Heun method in the Elucidating work are in fact exponential RK methods with specific underlying RK solvers:

    • DPM-Solver 2 -> second-order midpoint
    • DPM-Solver 3 -> third-order Kutta method
    • Heun -> Heun

    Despite the different motivations, equation 18 is the key to both DPM-Solver and the Heun method in the Elucidating work, and the two changes of variables $t \leftrightarrow \rho$ and $x \leftrightarrow y$ are equivalent to the "rescaling" in the Elucidating work. We can also use more RK solvers for eq. 18; empirical comparisons among several RK solvers are included in DEIS.
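For context, the probability-flow ODE that all three works discretize can be written in the sigma parameterization as (a sketch in Karras' notation; this is not eq. 18 of DEIS verbatim):

```latex
\frac{\mathrm{d}x}{\mathrm{d}\sigma} = \frac{x - D(x;\sigma)}{\sigma}
```

Exponential-integrator methods solve the linear $x/\sigma$ part of this ODE exactly and apply an RK or multistep rule only to the learned term $D(x;\sigma)$, which is why different underlying RK solvers (midpoint, Kutta, Heun) yield the different named methods above.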

    2. Besides, applying linear multistep with non-uniform step sizes and lower-order warm starts in the transformed ODE was first proposed in DEIS; see $\rho AB$-DEIS.

    I know it is a hot area, but it is frustrating for a young researcher to be "scooped" and lose credit. Could you consider some modifications to give some credit to the DEIS work?

    Thanks, qsh

    opened by qsh-zh 0
  • sample sigma scheduler bug

    I found that k_diffusion.sampling.get_sigmas_karras returns a list that ends with 0, which can make the magnitude of the last dt (sigma[i+1] - sigma[i]) larger than the others, for example:

    dt: [-1.3240442276000977, 
    -1.1708478927612305, 
    -1.0327138900756836, 
    -0.9084315299987793, 
    -0.7968769073486328, 
    -0.6969814300537109, 
    -0.6077454090118408, 
    -0.5282387733459473, 
    -0.4575979709625244, 
    -0.39501309394836426, 
    -0.3397289514541626, 
    -0.2910478115081787, 
    -0.24832558631896973, 
    -0.21096330881118774, 
    -0.17840701341629028, 
    -0.1501503586769104, 
    -0.12572622299194336, 
    -0.10470682382583618, 
    -0.08670148253440857, 
    -0.07135394215583801, 
    -0.05834062397480011, 
    -0.04736843705177307, 
    -0.03817273676395416, 
    -0.030515573918819427,
    -0.10000000149011612 **********
    ]
    

    I think it may be a bug.

    This issue was found at https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2794
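For reference, the schedule in question follows the Karras et al. (2022) rho-ramp with a zero appended. A minimal re-implementation sketch (the default sigma values here are assumptions for illustration, not the repo's defaults):

```python
import torch

def sigmas_karras_sketch(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Karras noise schedule: interpolate sigma^(1/rho) linearly between
    sigma_max and sigma_min, then append 0 so sampling ends at sigma = 0."""
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return torch.cat([sigmas, sigmas.new_zeros([1])])

sigmas = sigmas_karras_sketch(25)
dt = sigmas[1:] - sigmas[:-1]
# The final step is sigmas[-1] - sigmas[-2] = -sigma_min, which can be
# larger in magnitude than the steps just before it, as reported above.
```

The appended zero is deliberate (it makes the last step land exactly on sigma = 0), but it does mean the final dt jumps to -sigma_min rather than continuing the smooth ramp.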

    opened by lockmatrix 2
  • added support for non square Images

    Non-square images work basically out of the box. I trained a model on the smithsonian_butterflies_subset at 32x64 pixels and compared it to 48x48 pixels, to have roughly the same total number of pixels.

    FID at 48x48 was 50.2 after 10,000 steps with batch size 16 (model_demo_00010000)

    FID at 64x32 was 59.9 after 10,000 steps with batch size 16 (model_demo_00010000)

    opened by p-sodmann 0
Releases: v0.0.11

Owner: Katherine Crowson (AI/generative artist)