Next-generation video instance recognition framework built on top of Detectron2, supporting SeqFormer (ECCV 2022 Oral) and IDOL (ECCV 2022 Oral)

Overview

VNext:

  • VNext is a next-generation video instance recognition framework built on top of Detectron2.
  • It currently provides advanced online and offline video instance segmentation algorithms.
  • We will continue to update and improve it so that it serves as a unified and efficient framework for video instance recognition.

To date, VNext contains the official implementation of the following algorithms:

IDOL: In Defense of Online Models for Video Instance Segmentation (ECCV2022 Oral)

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV2022 Oral)

Highlights:

  • IDOL is accepted to ECCV 2022 as an oral presentation!
  • SeqFormer is accepted to ECCV 2022 as an oral presentation!
  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).

Getting started

  1. For installation and data preparation, please refer to INSTALL.md for more details.

  2. For IDOL training, evaluation, and model zoo, please refer to IDOL.md.

  3. For SeqFormer training, evaluation, and model zoo, please refer to SeqFormer.md.

IDOL

In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Introduction

  • In recent years, video instance segmentation (VIS) has been advanced largely by offline models, while online models have typically trailed contemporaneous offline models by over 10 AP, a substantial gap.

  • By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association, and propose IDOL, which outperforms all online and offline methods on three benchmarks.

  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).
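The association step above can be pictured with a toy sketch: match each instance embedding in the current frame to a memory bank of track embeddings by cosine similarity. The function names, greedy matching, and threshold below are illustrative only; IDOL's actual code learns the embeddings contrastively and uses a more robust association.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def associate(curr_embeds, memory_embeds, threshold=0.5):
    """Greedily assign each current-frame embedding to the best-matching
    track in the memory bank; return a track index per embedding, or -1
    to start a new track."""
    assignments = []
    used = set()
    for emb in curr_embeds:
        scores = [
            (cosine(emb, mem), idx)
            for idx, mem in enumerate(memory_embeds)
            if idx not in used
        ]
        best = max(scores, default=(0.0, -1))
        if best[0] >= threshold:
            assignments.append(best[1])
            used.add(best[1])
        else:
            assignments.append(-1)  # no confident match: new track
    return assignments
```

With discriminative embeddings, nearest-neighbor matching like this is what makes online association reliable frame to frame.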

Visualization results on OVIS valid set

Quantitative results

YouTube-VIS 2019

OVIS 2021

SeqFormer

SeqFormer: Sequential Transformer for Video Instance Segmentation

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Introduction

  • SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically.

  • SeqFormer is a robust, accurate, neat offline model and instance tracking is achieved naturally without tracking branches or post-processing.
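The dynamic mask prediction above can be illustrated with a toy sketch: a single video-level instance embedding acts as a 1x1 dynamic filter applied to every frame's feature map, yielding one mask per frame. Names and shapes are illustrative; the actual SeqFormer head is more elaborate.

```python
def predict_masks(instance_embed, frame_features):
    """instance_embed: list of C weights; frame_features: T frames, each a
    C x H x W nested list. Returns T masks of shape H x W (logits): the
    per-pixel dot product of the embedding with the frame's features."""
    masks = []
    for feat in frame_features:  # one pass per frame
        C, H, W = len(feat), len(feat[0]), len(feat[0][0])
        mask = [[sum(instance_embed[c] * feat[c][y][x] for c in range(C))
                 for x in range(W)]
                for y in range(H)]
        masks.append(mask)
    return masks
```

Because the filter is shared across frames, the same instance is segmented consistently through the clip without any explicit tracking step.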

Visualization results on YouTube-VIS 2019 valid set

Quantitative results

YouTube-VIS 2019

YouTube-VIS 2021

Citation

@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

Acknowledgement

This repo is based on detectron2, Deformable DETR, VisTR, and IFC. Thanks for their wonderful work.

Comments
  • Setting MAX_SIZE_TEST

    The original SeqFormer repository (and most previous methods) limits the test-time resolution. However, this version of SeqFormer and IDOL does not set INPUT.MAX_SIZE_TEST. Was this intentional? For SeqFormer, the README.md contains the tables produced by the old repository. By not limiting the test-time resolution, the results of this version should be better, right? It should be noted that this issue prevents a fair comparison with previous methods. Or am I missing something?

    opened by timmeinhardt 6
  • OVIS inference video sequence too long ??

    Hi, I am trying to reproduce the paper's OVIS performance, but inference on OVIS crashes.

    Maybe the sequences are too long? Around 200 frames.
    
    opened by CarlHuangNuc 6
  • Why OVIS need so many Memory??

    I tried to run inference on a machine with 512 GB of memory, but preprocessing some videos exhausts all of it and crashes.

    When I change the input size from 720 to 100, it runs successfully.

    opened by CarlHuangNuc 3
  • troubles when reproducing

    Hi, thanks for the wonderful work.

    But I had some troubles when trying to reproduce the results with command:

    python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8 MODEL.WEIGHTS projects/IDOL/weights/cocopretrain_R50.pth SOLVER.IMS_PER_BATCH 16

    The result is 46.96, which is lower than the reported 49.5. I'm using Torch 1.9.0, and the batch size was set to 16 instead of 32.

    Is there something I missed? Looking forward to your reply.

    opened by HanGuangXin 3
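One general heuristic when halving the batch size, as in the issue above (not an author-confirmed fix for this particular gap), is the linear scaling rule: scale the learning rate with the batch size and stretch the schedule so the same number of examples is seen. A hypothetical helper, with illustrative reference values rather than the repo's actual defaults:

```python
def scale_schedule(base_lr, max_iter, ref_batch, new_batch):
    # Linear scaling rule: LR scales with the batch size; the iteration
    # count scales inversely so total examples seen stays constant.
    factor = new_batch / ref_batch
    return base_lr * factor, int(max_iter / factor)

# Halving the batch from 32 to 16 halves the LR and doubles the iterations.
lr, iters = scale_schedule(base_lr=1e-4, max_iter=90000,
                           ref_batch=32, new_batch=16)
```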
  • gpu util = 0 while running inference

    Hello, thanks for your great work!

    There is a problem when I run your inference script on the ytvis dataset: python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/XXX.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS /path to my .pth

    Everything runs and GPU memory usage looks normal, but GPU utilization is always 0.

    Total inference time is around 2 hours; I don't know whether all GPUs are actually being used during inference.

    opened by shulou1996 2
  • the definition of proposed model

    Thanks for releasing the code. Where can we find the PyTorch model definition of IDOL? IDOL is currently presented via a config file and the VNext library, and I cannot find the PyTorch model definition code.

    opened by RaymondWang987 2
  • demo running IDOL

    Hi, thanks for the great work. Is it possible to run the demo script over a webcam using the IDOL or SeqFormer models? If yes, which inputs (e.g. config file) do I have to pass to the demo.py command?

    Thanks a lot

    opened by andreazuna89 1
  • about SeqFormer/train_net.py

    When I used your train_net.py for SeqFormer, it reports errors for:

    from detectron2.projects.seqformer import add_seqformer_config, build_detection_train_loader, build_detection_test_loader
    from detectron2.projects.seqformer.data import (YTVISDatasetMapper, YTVISEvaluator, get_detection_dataset_dicts, DetrDatasetMapper)

    I think the import path may not be correct.

    opened by lywang76 1
  • Can you provide a training log of one IDOL model?

    Hi, IDOL is an excellent paper with many wonderful designs. Unfortunately, due to resource constraints, I can't fully train it. Could you provide a training log of one IDOL model on the YouTube-VIS dataset? Thanks.

    opened by Cynthiacoding 1
  • RuntimeError: Timed out waiting 1800000ms for send operation to complete

    Hi! Thanks for the excellent work.

    I got this error when running IDOL with multiple processes on a single node:

    python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8
    

    Error:

    RuntimeError: [/opt/conda/conda-bld/pytorch_1656352430114/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:136] Timed out waiting 1800000ms for send operation to complete
    

    The program gets stuck on some iteration, and one of the 8 GPUs sits idle with 0% utilization.

    Do you know what is going on, and how to fix it?

    Thanks!

    opened by ywwwei 1
  • Why Detectron2 and not MMdet (OpenMMLab projects .etc)?

    First of all, thank you very much for providing this repository; I believe it will bring great convenience and a unified comparison framework to future VIS research. But why build on detectron2 rather than the more complete and easy-to-use OpenMMLab repositories? I have used both, and detectron2 is rather obscure to work with, far less convenient than mmdet. I have also been thinking of writing an algorithm library based on mmdet myself.

    opened by yingkaining 1
  • How to padding the non-bbox?

    Must the number of instances be the same within one video? If not, how do we pad the missing boxes?

    targets_for_clip_prediction.append({
        "labels": torch.stack(clip_classes, dim=0).max(0)[0],
        "boxes": torch.stack(clip_boxes, dim=1),   # [num_inst, num_frame, 4]
        "masks": torch.stack(clip_masks, dim=1),   # [num_inst, num_frame, H, W]
        "size": torch.as_tensor([h, w], dtype=torch.long, device=self.device),
        # "inst_id": inst_ids,
        # "valid": valid_id,
    })

    opened by Rhythmczy 0
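For illustration of the question above: a common way to pad a clip with fewer instances is to append zero boxes and masks plus a per-instance validity flag, so tensors can be stacked across clips. This is a hypothetical sketch, not the repository's actual padding code.

```python
def pad_instances(boxes, masks, max_inst, num_frames, h, w):
    """boxes: [num_inst][num_frames][4]; masks: [num_inst][num_frames][h][w].
    Pads both to max_inst instances; valid[i] == 0 marks padding slots.
    (The repeated zero placeholders share storage, which is fine for
    read-only padding in this sketch.)"""
    valid = [1] * len(boxes) + [0] * (max_inst - len(boxes))
    zero_box = [[0.0] * 4 for _ in range(num_frames)]
    zero_mask = [[[0] * w for _ in range(h)] for _ in range(num_frames)]
    padded_boxes = boxes + [zero_box] * (max_inst - len(boxes))
    padded_masks = masks + [zero_mask] * (max_inst - len(masks))
    return padded_boxes, padded_masks, valid
```

The validity flags let the loss ignore padded slots during matching.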
  • A bug when evaluation or train on ovis dataset.

    When I try to run python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ovis_r50.yaml --num-gpus 1 --eval-only, an error occurs: for ann in self.dataset['annotations']: TypeError: 'NoneType' object is not iterable. The reason is that the OVIS dataset annotation file has an extra entry "annotations": null, which you need to delete manually. I suggest the authors add a tip about this in README.md.

    opened by xb534 0
  • Empty ytvis_2019 testing results

    I tested the SeqFormer model with res = Trainer.test(cfg, model).

    The output is empty. Debugging, I found the following from seqformer.py line 238, repeated for every frame:

    {'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}

    opened by lywang76 0
  • Cannot reproduce the same mAP_L result on the Youtube-VIS 2022 validation set

    Hello! I trained IDOL using the default Swin-L config yaml file, only changing the dataset from 19 to 21, and evaluated on the YouTube-VIS 2022 validation set. I got nearly the same mAP_S but a different mAP_L of around 44, much lower than the mAP_L of 48.4 reported in your 1st-place solution paper. Is there any problem? Thank you very much.

    opened by shulou1996 1
  • Evaluate Module

    Hi Dr. Wu, thanks for your amazing work and open-source code! But when I ran inference with the model, I didn't find the evaluation function code in ytvis_eval.py. Could you please provide the evaluation function code for inference?

    opened by lirui-9527 0
Owner
Junfeng Wu
PhD student, Huazhong University of Science and Technology, Computer Vision