TeSTra: Real-time Online Video Detection with Temporal Smoothing Transformers

Introduction

This is a PyTorch implementation for our ECCV 2022 paper "Real-time Online Video Detection with Temporal Smoothing Transformers".


Environment

  • The code is developed with CUDA 10.2, Python >= 3.7.7, and PyTorch >= 1.7.1.

    1. Clone the repo recursively.

      git clone --recursive git@github.com:zhaoyue-zephyrus/TeSTra.git
      
    2. [Optional but recommended] Create a new conda environment.

      conda create -n testra python=3.7.7
      

      And activate the environment.

      conda activate testra
      
    3. Install the requirements.

      pip install -r requirements.txt
      

Data Preparation

Pre-extracted Feature

You can directly download the pre-extracted features (.zip) from the UTBox links below.

THUMOS'14

| Description | Backbone | Pretrain | UTBox Link |
| --- | --- | --- | --- |
| frame label | N/A | N/A | link |
| RGB | ResNet-50 | Kinetics-400 | link |
| Flow (TV-L1) | BN-Inception | Kinetics-400 | link |
| Flow (NVOF) | BN-Inception | Kinetics-400 | link |
| RGB | ResNet-50 | ANet v1.3 | link |
| Flow (TV-L1) | ResNet-50 | ANet v1.3 | link |

EK100

| Description | Backbone | Pretrain | UTBox Link |
| --- | --- | --- | --- |
| action label | N/A | N/A | link |
| noun label | N/A | N/A | link |
| verb label | N/A | N/A | link |
| RGB | BN-Inception | IN-1k + EK100 | link |
| Flow (TV-L1) | BN-Inception | IN-1k + EK100 | link |
| Object | Faster-RCNN | MS-COCO + EK55 | link |
  • Note: The features are converted from RULSTM so as to be compatible with this codebase.
  • Note: The object feature is not used in TeSTra; it is uploaded only for completeness.

Once the zipped files are downloaded, we suggest unzipping them and following the file organization described below.

(Alternative) Static links

It may be easier to download from static links via wget on non-GUI systems. To do so, simply change the UTBox link from https://utexas.box.com/s/xxxx to https://utexas.box.com/shared/static/xxxx.zip. Unfortunately, UTBox does not support customized URL names. Therefore, to wget while keeping the name readable, please refer to the bash scripts provided in DATASET.md.
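
For example, a minimal sketch of the conversion (here `xxxx` stands for the share hash from the UTBox link, and the output filename is an arbitrary readable choice; the actual hashes are in DATASET.md):

```
# xxxx is the share hash copied from the UTBox link; -O gives the
# download a readable name since UTBox cannot customize URLs.
wget https://utexas.box.com/shared/static/xxxx.zip -O rgb_kinetics_resnet50.zip
```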

(Alternative) Prepare dataset from scratch

You can also try to prepare the datasets from scratch by yourself.

THUMOS14

For TH14, please refer to LSTR.

EK100

For EK100, please find more details at RULSTM.

Computing Optical Flow

I will release a pure-Python version of DenseFlow in the near future and will post a cross-link here once it is done.

Data Structure

  1. If you want to use our dataloaders, please make sure to organize the files in the following structure:

    • THUMOS'14 dataset:

      $YOUR_PATH_TO_THUMOS_DATASET
      ├── rgb_kinetics_resnet50/
      │   ├── video_validation_0000051.npy (of size L x 2048)
      │   ├── ...
      ├── flow_kinetics_bninception/
      │   ├── video_validation_0000051.npy (of size L x 1024)
      │   ├── ...
      ├── target_perframe/
      │   ├── video_validation_0000051.npy (of size L x 22)
      │   ├── ...
      
    • EK100 dataset:

      $YOUR_PATH_TO_EK_DATASET
      ├── rgb_kinetics_bninception/
      │   ├── P01_01.npy (of size L x 2048)
      │   ├── ...
      ├── flow_kinetics_bninception/
      │   ├── P01_01.npy (of size L x 2048)
      │   ├── ...
      ├── target_perframe/
      │   ├── P01_01.npy (of size L x 3807)
      │   ├── ...
      ├── noun_perframe/
      │   ├── P01_01.npy (of size L x 301)
      │   ├── ...
      ├── verb_perframe/
      │   ├── P01_01.npy (of size L x 98)
      │   ├── ...
      
  2. Create softlinks to the datasets:

    cd TeSTra
    ln -s $YOUR_PATH_TO_THUMOS_DATASET data/THUMOS
    ln -s $YOUR_PATH_TO_EK_DATASET data/EK100
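
Once the softlinks are in place, a quick way to sanity-check the layout is to verify that features and per-frame labels share the same temporal length L. Below is a minimal sketch (not part of the codebase); the video name is just one example file from THUMOS'14:

```python
import numpy as np

# Features and per-frame targets for the same video must agree on L,
# the number of frames along the temporal axis.
rgb = np.load('data/THUMOS/rgb_kinetics_resnet50/video_validation_0000051.npy')
target = np.load('data/THUMOS/target_perframe/video_validation_0000051.npy')

assert rgb.shape[1] == 2048 and target.shape[1] == 22  # per the layout above
assert rgb.shape[0] == target.shape[0], 'feature/label lengths differ'
print(f'L = {rgb.shape[0]}, features and labels are aligned.')
```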
    

Training

The commands for training are as follows.

```
cd TeSTra/
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES
# Finetuning from a pretrained model
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT
```

Online Inference

For existing checkpoints, please refer to the next section.

Batch mode

Run the online inference in batch mode for performance benchmarking.

```
cd TeSTra/
# Online inference in batch mode
python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE batch
```

Stream mode

Run the online inference in stream mode to calculate runtime in the streaming setting.

```
cd TeSTra/
# Online inference in stream mode
python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE stream
# The command above takes quite long over the entire dataset.
# If you only want to look at a particular video, append an additional argument:
python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE stream \
    DATA.TEST_SESSION_SET "['$VIDEO_NAME']"
```

For more details on the difference between batch mode and stream mode, please check out LSTR.

Main Results and Checkpoints

THUMOS14

| Method | Kernel type | mAP (%) | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| LSTR (baseline) | Cross Attention | 69.9 | yaml | UTBox link |
| TeSTra | Laplace (α=e^-λ=0.97) | 70.8 | yaml | UTBox link |
| TeSTra | Box (α=e^-λ=1.0) | 71.2 | yaml | UTBox link |
| TeSTra (lite) | Box (α=e^-λ=1.0) | 67.3 | yaml | UTBox link |

EK100

| Method | Kernel type | Verb (overall) | Noun (overall) | Action (overall) | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| TeSTra | Laplace (α=e^-λ=0.9) | 30.8 | 35.8 | 17.6 | yaml | UTBox link |
| TeSTra | Box (α=e^-λ=1.0) | 31.4 | 33.9 | 17.0 | yaml | UTBox link |
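
In the tables above, the kernel type refers to the temporal smoothing kernel over the history: a Laplace kernel down-weights a frame Δt steps in the past by α^Δt = e^(-λΔt), while α = 1.0 degenerates to a box kernel, i.e. a plain running average. A minimal NumPy sketch of the weighting, for illustration only (not the exact implementation in this codebase):

```python
import numpy as np

def kernel_weights(num_frames: int, alpha: float) -> np.ndarray:
    """Weight on a frame `delta` steps in the past: alpha ** delta.

    alpha = e^{-lambda}; alpha = 1.0 gives uniform weights (box kernel).
    """
    delta = np.arange(num_frames)  # 0 = current frame, 1 = previous, ...
    return alpha ** delta

print(kernel_weights(5, alpha=0.97))  # Laplace: [1.0, 0.97, 0.9409, ...]
print(kernel_weights(5, alpha=1.0))   # Box:     [1.0, 1.0, 1.0, 1.0, 1.0]
```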

Citations

If you are using the data/code/model provided here in a publication, please cite our paper:

@inproceedings{zhao2022testra,
	title={Real-time Online Video Detection with Temporal Smoothing Transformers},
	author={Zhao, Yue and Kr{\"a}henb{\"u}hl, Philipp},
	booktitle={European Conference on Computer Vision (ECCV)},
	year={2022}
}

Contacts

For any questions, feel free to raise an issue or drop me an email at yzhao [at] cs.utexas.edu.

License

This project is licensed under the Apache-2.0 License.

Acknowledgements

This codebase is built upon LSTR.

The code snippet for evaluation on EK100 is borrowed from RULSTM.

Also, thanks to Mingze Xu for his assistance in reproducing the features on THUMOS'14.
