Implementation of Transframer, DeepMind's U-net + Transformer architecture for video generation of up to 30 seconds, in PyTorch

Overview

Transframer - Pytorch (wip)

The gist of the paper is the use of a U-net as a multi-frame encoder, together with a regular transformer decoder that cross-attends to the encoded frames and predicts the remaining frames. The lead author builds on his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
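
To make the "sparse DCT sequence" idea concrete, here is a minimal sketch in plain Python. It is not the paper's tokenization: the 8x8 block size, the single uniform quantization step, the row-major position index, and the helper names `dct_2d` and `to_sparse_sequence` are all assumptions chosen for illustration. It only shows the general pattern of transforming a block, quantizing it, and keeping the non-zero coefficients as (position, value) pairs.

```python
import math

N = 8  # block size; assumed here, in the style of JPEG


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def to_sparse_sequence(block, step=16.0):
    """Quantize the DCT coefficients and keep only the non-zero ones.

    Returns (position, value) pairs, where position is the row-major
    index of the coefficient inside the block. The uniform `step` is a
    hypothetical stand-in for a real quantization table.
    """
    coeffs = dct_2d(block)
    seq = []
    for u in range(N):
        for v in range(N):
            q = round(coeffs[u][v] / step)
            if q != 0:
                seq.append((u * N + v, q))
    return seq


# A flat gray block has energy only in the DC coefficient,
# so its sparse sequence has a single entry.
flat = [[128.0] * N for _ in range(N)]
flat_seq = to_sparse_sequence(flat)  # [(0, 64)]
```

Smooth image regions produce short sequences like this one, which is why a sparse list of coefficients can be a compact target for an autoregressive decoder.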

I will deviate from the paper's implementation in two ways: using a hierarchical autoregressive transformer, and using just a regular ResNet block in place of the NF-Net block (the NF-Net choice is just DeepMind reusing their own code, as NF-Net was developed at DeepMind by Brock et al.).

Update: On further meditation, there is nothing new in this paper except for generative modeling on DCT representations.

Appreciation

  • This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors

Todo

  • figure out whether the DCT coefficients can be extracted directly from images in JPEG format

Citations

@article{Nash2022TransframerAF,
    title   = {Transframer: Arbitrary Frame Prediction with Generative Models},
    author  = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2203.09494}
}
Owner
Phil Wang
Working with Attention. It's all we need