Bailando

Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

[Paper] | [Project Page] | [Video Demo]

Driving 3D characters to dance following a piece of music is highly challenging due to the spatial constraints applied to poses by choreography norms. In addition, the generated dance sequence also needs to maintain temporal coherency with different music genres. To tackle these challenges, we propose a novel music-to-dance framework, Bailando, with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences into a quantized codebook, and 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music. With the learned choreographic memory, dance generation is realized on the quantized units that meet high choreography standards, such that the generated dancing sequences are confined within the spatial constraints. To achieve synchronized alignment between diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a newly designed beat-align reward function. Extensive experiments on the standard benchmark demonstrate that our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively. Notably, the learned choreographic memory is shown to discover human-interpretable dancing-style poses in an unsupervised manner.
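As a concrete illustration of the beat-align reward idea described above, here is a minimal sketch (not the repo's actual implementation; the function name, interface, and Gaussian width are illustrative) that scores how closely music beats are matched by "dance beats", taken here as local minima of mean joint speed:

    # Hedged sketch of a beat-alignment score; NOT the repo's API.
    import numpy as np

    def beat_align_score(music_beats, joints, sigma=3.0):
        """music_beats: frame indices of music beats; joints: (T, J, 3) positions."""
        speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1)  # (T-1,)
        # Dance beats: frames where the mean joint speed hits a local minimum.
        dance_beats = np.array([t for t in range(1, len(speed) - 1)
                                if speed[t] < speed[t - 1] and speed[t] < speed[t + 1]])
        if dance_beats.size == 0:
            return 0.0
        # For every music beat, reward the proximity of the nearest dance beat.
        dists = np.abs(np.asarray(music_beats)[:, None] - dance_beats[None, :]).min(axis=1)
        return float(np.exp(-dists ** 2 / (2 * sigma ** 2)).mean())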

Code

Environment

PyTorch == 1.6.0

Data preparation

In our experiments, we use AIST++ for both training and evaluation. Please visit here to download the AIST++ annotations and unzip them into the './aist_plusplus_final/' folder, and visit here to download all the original music pieces (wav) into './aist_plusplus_final/all_musics'. Then set up the AIST++ API from here and download the required SMPL models from here. Make a folder 'smpl', rename the downloaded male SMPL model (the one with '_m' in its name) to 'smpl/SMPL_MALE', and finally run

./prepare_aistpp_data.sh

to produce the features for training and testing. Alternatively, if you don't wish to process the data yourself, directly download our preprocessed features from here into the ./data folder.
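Before running the script, a quick sanity check of the expected layout can save a failed run. This is a convenience sketch using the paths described above, not part of the repo:

    # Check that the folders described in this README exist before preprocessing.
    import os

    for path in ('./aist_plusplus_final',
                 './aist_plusplus_final/all_musics',
                 './smpl/SMPL_MALE'):
        print(path, '->', 'found' if os.path.exists(path) else 'MISSING')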

Training

The training of Bailando comprises 4 steps, run in the following sequence. If you are using the slurm workload manager, you can directly run the corresponding shell script; otherwise, please remove the 'srun' parts. Our models are all trained on a single NVIDIA V100 GPU. A kind reminder: the quantization code does not support multi-GPU training.
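If you are not on slurm, each step reduces to invoking main.py directly. The Step 1 form below also appears verbatim in the issue reports further down; the other steps presumably follow the same pattern with their own configs:

python -u main.py --config configs/sep_vqvae.yaml --train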

Step 1: Train pose VQ-VAE (without global velocity)

sh srun.sh configs/sep_vqvae.yaml train [your node name] 1

Step 2: Train global velocity branch of pose VQ-VAE

sh srun.sh configs/sep_vavqe_root.yaml train [your node name] 1

Step 3: Train motion GPT

sh srun_gpt_all.sh configs/cc_motion_gpt.yaml train [your node name] 1

Step 4: Actor-Critic finetuning on target music

sh srun_actor_critic.sh configs/actor_critic.yaml train [your node name] 1

Evaluation

To test with our pretrained models, please download the weights from here (Google Drive) or here (坚果云 / Nutstore) into the ./experiments folder.

1. Generate dancing results

sh srun_xxx.sh configs/xxx.yaml eval [your node name] 1
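Here 'xxx' matches the shell/config pair used in training; for example, evaluating the actor-critic finetuned model would presumably be:

sh srun_actor_critic.sh configs/actor_critic.yaml eval [your node name] 1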

2. Dance quality evaluations

TODO

Choreography for music in the wild

TODO

Citation

@inproceedings{siyao2022bailando,
    title={Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory},
    author={Siyao, Li and Yu, Weijiang and Gu, Tianpei and Lin, Chunze and Wang, Quan and Qian, Chen and Loy, Chen Change and Liu, Ziwei},
    booktitle={CVPR},
    year={2022}
}

License

Our code is released under the MIT License.

Comments
  • Some file missing in data.zip

    Hi, thanks for sharing your code! I downloaded the data.zip you released, but I got an error: checking the data, I found that aistpp_music_feat_7.5fps/mJB4.json is empty. Is this a mistake?

    opened by xljh0520 3
  • The generated person in video disappear in the video sometimes?

    Thank you for your work! I followed the steps of 'Choreography for music in the wild' and got the output videos of the model. Problem: the person in the videos sometimes disappears, and there is no sound in the video. Q1: Is there a way to solve this? Q2: Will this kind of disappearance affect the location of the 3D keypoints?

    opened by aleeyang 2
  • ValueError: only one element tensors can be converted to Python scalars

    Hey, I face this problem using the first-step command python -u main.py --config configs/sep_vqvae.yaml --train. The output is:

      using SepVQVAE
      We use bottleneck!
      No motion regularization!
      We use bottleneck!
      No motion regularization!
      train with AIST++ dataset!
      test with AIST++ dataset!
      {'structure': {'name': 'SepVQVAE',
        'up_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512,
          'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3,
          'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True,
          'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True,
          'dilation_cycle': None},
        'down_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512,
          'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3,
          'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True,
          'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True,
          'dilation_cycle': None},
        'use_bottleneck': True, 'joint_channel': 3, 'l_bins': 512},
       'loss_weight': {'mse_weight': 1},
       'optimizer': {'type': 'Adam', 'kwargs': {'lr': 3e-05, 'betas': [0.5, 0.999], 'weight_decay': 0},
        'schedular_kwargs': {'milestones': [100, 200], 'gamma': 0.1}},
       'data': {'name': 'aist', 'train_dir': 'data/aistpp_train_wav', 'test_dir': 'data/aistpp_test_full_wav',
        'seq_len': 240, 'data_type': 'None'},
       'testing': {'height': 540, 'width': 960, 'ckpt_epoch': 500},
       'expname': 'sep_vqvae', 'epoch': 500, 'batch_size': 32, 'save_per_epochs': 20, 'test_freq': 20,
       'log_per_updates': 1, 'seed': 42, 'rotmat': False, 'cuda': True, 'global_vel': False,
       'ds_rate': 8, 'move_train': 40, 'sample_code_length': 150, 'sample_code_rate': 16,
       'analysis_sequence': [[126, 81]], 'config': 'configs/sep_vqvae.yaml', 'train': True,
       'eval': False, 'visgt': False, 'anl': False, 'sample': False}
      07/03/2022 03:06:44 Epoch: 1
      Traceback (most recent call last):
        File "main.py", line 56, in <module>
          main()
        File "main.py", line 40, in main
          agent.train()
        File "/home/fuyang/project/Bailando/motion_vqvae.py", line 107, in train
          'loss': loss.item(),
      ValueError: only one element tensors can be converted to Python scalars

    My environment is PyTorch 1.11.0+cu102 with 8 NVIDIA TITAN Xp GPUs (12196MiB).

    opened by aleeyang 2
  • training problem

    I met an error at Step 1 by running python -u main.py --config configs/sep_vqvae.yaml --train

    Traceback (most recent call last):
      File "main.py", line 56, in <module>
        main()
      File "main.py", line 40, in main
        agent.train()
      File "/share/yanzhen/Bailando/motion_vqvae.py", line 94, in train
        loss.backward()
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
        grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
        raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs
    

    After printing the loss, it looks like tensor([0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658], device='cuda:0', grad_fn=<GatherBackward>), so do I need to take a mean or sum?

    However, even after taking the mean, training still seems problematic: the loss decreases normally, but in the eval stage the output quants are all zero. Any suggestions?

    The training log is attached for reference.

    log.txt

    @lisiyao21

    opened by haofanwang 2
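    A likely cause for both this error and the ValueError above: torch.nn.DataParallel gathers each replica's scalar loss into a per-GPU vector (hence the grad_fn=<GatherBackward>), so loss.item() and loss.backward() both fail on multi-GPU runs. A minimal workaround sketch, though per the README the quantization code is intended for single-GPU training, which is the safer fix:

      loss = loss.mean()  # collapse the per-replica loss vector to a scalar
      loss.backward()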
  • Why extracting audio features twice with different sampling rate?

    Hi, Siyao~ Thanks for releasing and cleaning the code!!

    May I ask why in the pre-processing part, the audio (music) features are extracted twice and with different sampling rates?

    Precisely, in _prepro_aistpp.py, the audio features are extracted with the sampling rate 15360*2

    While in _prepro_aistpp_music.py, the audio features are extracted with the sampling rate 15360*2/8

    opened by XiSHEN0220 2
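    A plausible reading of the two rates, assuming the factor of 8 matches the VQ-VAE's temporal downsampling ('ds_rate': 8 in the config dump above):

      sr = 15360 * 2    # 30720, raw audio sampling rate used in _prepro_aistpp.py
      sr_down = sr / 8  # 3840, the rate used in _prepro_aistpp_music.py
      print(60 / 8)     # 7.5 -> matches the aistpp_music_feat_7.5fps folder name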
  • Processed data

    Hi, there is no link to the processed data, even though the README says "Otherwise, directly download our preprocessed feature from here as ./data folder if you don't wish to process the data."

    Can you add the data link? Thanks!

    opened by LaLaLailalai 2
  • Where is the definition of "from .utils.torch_utils import assert_shape"?

    Hi @lisiyao21, thank you for releasing the code! Where is the definition of "from .utils.torch_utils import assert_shape"? https://github.com/lisiyao21/Bailando/blob/27fe2b63896a2e31928b22944bac10455413263e/models/encdec.py#L4

    opened by zhangsanfeng86 2
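    The helper appears to be missing from the released code; below is a minimal stand-in consistent with the import site (behavior assumed from typical usage, not the author's original):

      # Hypothetical utils/torch_utils.py replacement for the missing helper.
      def assert_shape(x, exp_shape):
          """Assert that tensor x has exactly the expected shape."""
          assert tuple(x.shape) == tuple(exp_shape), \
              f"shape mismatch: {tuple(x.shape)} vs {tuple(exp_shape)}"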
  • Visualize gt error.

    I want to compare the generated results with the ground truth. It seems the code also supports visualizing the ground truth by passing the visgt parameter. However, when I call the script with visgt, the code does not seem fully implemented: the last two parameters of the visualizeAndWrite function are not set correctly. How should I set these two parameters (especially the last quants parameter) to make the function execute correctly?

    opened by miibotree 1
  • No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'

    Bailando/utils/functional.py", line 130, in img2video
        music_names = sorted(os.listdir(audio_dir))
    FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'

    opened by donghaoye 1
  • about the data process

    Hello Siyao! I'm reading your code and I'm confused about the 'align' function in '_prepro_aistpp.py'. To make the length (time) of the music equal to that of the dance, you throw the extra features away. Is that reasonable? Why not do uniform sampling? Sorry for bothering you.

    opened by pengc02 0
  • about the run command

    For me, a beginner in DL: in sh srun.sh configs/sep_vqvae.yaml train [your node name] 1, what does '[your node name]' mean? Can you give me a more specific command? Thank you a lot!

    opened by aleeyang 0
  • Warning when evaluating

    Hi, Siyao! I ran the command python extract_aist_features.py to extract the (kinetic & manual) features of all AIST++ motions. However, I met with a warning:

    WARNING: You are using a SMPL model, with only 10 shape coefficients.
    

    Do you know the reason?

    opened by LinghaoChan 0
  • About "cc_motion_gpt.yaml"

    Hi Siyao, thanks for your great work. I have a question: when I train in step 3 (Train motion GPT), an error occurs: "AttributeError: 'MCTall' object has no attribute 'training_data'". I checked "cc_motion_gpt.yaml" and found "need_not_train_data: True", which causes "def _build_train_loader(self):" not to run. Is that correct, or should I change "need_not_train_data" to "false"?

    opened by im1eon 1
  • Is there a way to change the ‘Starting pose codes’?

    Hi, thank you for your work again! It really inspires me and got me interested in deep learning. Amazing job! Problem: I found the generated dance videos are all in the same style, which may not be coordinated with my music ('青花瓷-jay_chou'). I suppose this may be caused by the starting pose code, but I cannot find how to choose it or where to set it.

    Q1: Is there a way to change the 'starting pose codes' mentioned in your paper? Q2: How should I choose the starting pose codes? Is there a table explicitly mapping starting pose codes to dance styles?

    Thank you again! Aleeyanger

    opened by aleeyang 0
  • The meanings of FID_k and FID_g of the ground truth?

    Hi authors,

    Thank you for your fantastic work! I have a small question: in Table 1, FID_k and FID_g of the ground truth are reported. I am a little confused by this. Do they mean computing FID_k and FID_g between two identical sets of ground-truth data? In other words, why are FID_k and FID_g of the ground truth not 0?

    Thank you, Best

    opened by by2101 1
  • Out of Memory

    Hi there, when I ran the second step per your instructions, I met an "out of memory" problem. Debugging, I found it is because music_data is float64, and memory is consumed rapidly when converting the list music_data to music_np (in utils/functional.py). Have you ever met the same problem? Is it possible to use float32 for the training data (music_np) without decreasing the performance of the final model?

    BTW: there are 120 GB of memory on my machine.

    opened by lucaskingjade 1
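    A minimal sketch of the float32 cast the comment suggests (variable names taken from the comment; whether precision affects final model quality is untested here):

      import numpy as np
      # Casting while stacking halves memory versus the default float64.
      music_np = np.array(music_data, dtype=np.float32)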