(CVPR 2022) Text Spotting Transformers

Overview

TESTR: Text Spotting Transformers

This repository is the official implementation of the following paper:

Text Spotting Transformers

Xiang Zhang, Yongwen Su, Subarna Tripathi, and Zhuowen Tu, CVPR 2022

Getting Started

We use the following environment in our experiments. It is recommended to install the dependencies via Anaconda:

  • CUDA 11.3
  • Python 3.8
  • PyTorch 1.10.1
  • Official Pre-Built Detectron2
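
For reference, an environment matching the versions above could be set up along these lines (an illustrative sketch, not the project's official setup; the last command uses the pre-built cu113/torch1.10 wheel index from the Detectron2 install guide):

conda create -n testr python=3.8 -y
conda activate testr
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html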

Installation

Please refer to the Installation section of AdelaiDet: README.md.

If you have not installed Detectron2, follow the official guide: INSTALL.md.

After that, build this repository with

python setup.py build develop
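
To quickly verify the build, check that the core packages import (a minimal sketch; it assumes the repository's Python package is named adet, as in AdelaiDet):

import torch
import detectron2
import adet  # the package built by `python setup.py build develop` above

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)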

Preparing Datasets

Please download TotalText, CTW1500, MLT, and CurvedSynText150k according to the guide provided by AdelaiDet: README.md.

The ICDAR2015 dataset can be downloaded via this link.

Extract all the datasets and make sure you organize them as follows:

- datasets
  | - CTW1500
  |   | - annotations
  |   | - ctwtest_text_image
  |   | - ctwtrain_text_image
  | - totaltext (or icdar2015)
  |   | - test_images
  |   | - train_images
  |   | - test.json
  |   | - train.json
  | - mlt2017 (or syntext1, syntext2)
      | - annotations
      | - images

After that, download the polygonal annotations, along with the evaluation files, and extract them under the datasets folder.
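
If dataset loading fails later, a quick layout check can save time. A minimal sketch, run from the repository root (the paths simply mirror the tree above):

from pathlib import Path

expected = [
    "datasets/CTW1500/annotations",
    "datasets/CTW1500/ctwtrain_text_image",
    "datasets/totaltext/train_images",
    "datasets/totaltext/train.json",
    "datasets/mlt2017/images",
]
for rel in expected:
    # Report each expected path as present or missing.
    print("ok" if Path(rel).exists() else "MISSING", "-", rel)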

Visualization Demo

You can try to visualize the predictions of the network using the following command:

python demo/demo.py --config-file <PATH_TO_CONFIG_FILE> --input <FOLDER_TO_INPUT_IMAGES> --output <OUTPUT_FOLDER> --opts MODEL.WEIGHTS <PATH_TO_MODEL_FILE> MODEL.TRANSFORMER.INFERENCE_TH_TEST 0.3

You may want to adjust INFERENCE_TH_TEST to filter out predictions with lower scores.
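
Conceptually, this threshold simply drops low-confidence detections before they are drawn; a toy sketch of the behavior (not the repository's actual code):

def filter_by_score(predictions, threshold=0.3):
    # Keep only predictions whose confidence meets the threshold.
    return [p for p in predictions if p["score"] >= threshold]

preds = [{"score": 0.91, "text": "HELLO"}, {"score": 0.12, "text": "noise"}]
print(filter_by_score(preds))  # the 0.12 prediction is dropped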

Training

You can train from scratch or finetune the model by putting pretrained weights in the weights folder.

Example commands:

python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --num-gpus 8

All configuration files can be found in configs/TESTR, excluding those files named Base-xxxx.yaml.

TESTR_R_50.yaml is the config for TESTR-Bezier, while TESTR_R_50_Polygon.yaml is for TESTR-Polygon.
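
For example, to finetune the TotalText polygon model from pretrained weights, a command along these lines should work (the config path follows the directory pattern above; the weight filename is a placeholder for whatever you place under weights):

python tools/train_net.py --config-file configs/TESTR/TotalText/TESTR_R_50_Polygon.yaml --num-gpus 8 MODEL.WEIGHTS weights/<PRETRAINED_MODEL>.pth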

Evaluation

python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --eval-only MODEL.WEIGHTS <PATH_TO_MODEL_FILE>

Pretrained Models

| Dataset | Annotation Type | Lexicon | Det-P | Det-R | Det-F | E2E-P | E2E-R | E2E-F | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pretrain | Bezier | None | 88.87 | 76.47 | 82.20 | 63.58 | 56.92 | 60.06 | OneDrive |
| Pretrain | Polygonal | None | 88.18 | 77.51 | 82.50 | 66.19 | 61.14 | 63.57 | OneDrive |
| TotalText | Bezier | None | 92.83 | 83.65 | 88.00 | 74.26 | 69.05 | 71.56 | OneDrive |
| TotalText | Bezier | Full | - | - | - | 86.42 | 80.35 | 83.28 | |
| TotalText | Polygonal | None | 93.36 | 81.35 | 86.94 | 76.85 | 69.98 | 73.25 | OneDrive |
| TotalText | Polygonal | Full | - | - | - | 88.00 | 80.13 | 83.88 | |
| CTW1500 | Bezier | None | 89.71 | 83.07 | 86.27 | 55.44 | 51.34 | 53.31 | OneDrive |
| CTW1500 | Bezier | Full | - | - | - | 83.05 | 76.90 | 79.85 | |
| CTW1500 | Polygonal | None | 92.04 | 82.63 | 87.08 | 59.14 | 53.09 | 55.95 | OneDrive |
| CTW1500 | Polygonal | Full | - | - | - | 86.16 | 77.34 | 81.51 | |
| ICDAR15 | Polygonal | None | 90.31 | 89.70 | 90.00 | 65.49 | 65.05 | 65.27 | OneDrive |
| ICDAR15 | Polygonal | Strong | - | - | - | 87.11 | 83.29 | 85.16 | |
| ICDAR15 | Polygonal | Weak | - | - | - | 80.36 | 78.38 | 79.36 | |
| ICDAR15 | Polygonal | Generic | - | - | - | 73.82 | 73.33 | 73.57 | |

The Lite models only use the image feature from the last stage of ResNet.

| Method | Annotation Type | Lexicon | Det-P | Det-R | Det-F | E2E-P | E2E-R | E2E-F | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pretrain (Lite) | Polygonal | None | 90.28 | 72.58 | 80.47 | 59.49 | 50.22 | 54.46 | OneDrive |
| TotalText (Lite) | Polygonal | None | 92.16 | 79.09 | 85.12 | 66.42 | 59.06 | 62.52 | OneDrive |

Citation

@misc{zhang2022text,
      title={Text Spotting Transformers}, 
      author={Xiang Zhang and Yongwen Su and Subarna Tripathi and Zhuowen Tu},
      year={2022},
      eprint={2204.01918},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is released under the Apache License 2.0. The license can be found in the LICENSE file.

Acknowledgement

Thanks to AdelaiDet for a standardized training and inference framework, and Deformable-DETR for the implementation of multi-scale deformable cross-attention.

Comments
  • Training from scratch and got no results

    Thanks for making this great work open source.

    But I have trained 255K steps from scratch, and the TotalText hmean is always 0. (Using 8 GPUs, Pretrain/testr_r50.yaml, and all default parameters.)

    opened by wulitaotao1 9
  • Problems when using the demo

    I used the command "python demo/demo.py --config-file E:/TESTR-main/configs/TESTR/ICDAR15/TESTR_R_50_Polygon.yaml --input E:/TESTR-main/dataset/icdar2015/test_images/img_204.jpg --output E:/TESTR-main/output/ --opts MODEL.WEIGHTS E:/TESTR-main/models/icdar15_testr_R_50_polygon.pth MODEL.TRANSFORMER.INFERENCE_TH_TEST 0.3" to run the demo (all of these are my local paths),

    but it always fails with: KeyError: 'Non-existent config key: MODEL.TRANSFORMER',

    and I don't know which step went wrong. If you can help me, I will be very grateful.

    Hello author, I set up the environment following the README, but the demo keeps raising KeyError: 'Non-existent config key: MODEL.TRANSFORMER'. After step-by-step debugging, I found that inside the function def _merge_a_into_b(a, b, root, key_list):, b has no MODEL.TRANSFORMER entry. Is my config wrong, is my environment installed incorrectly, or is it some other error I am not aware of?

    opened by lvyongting 2
  • pretrain problems

    Hello, I was trying to pretrain TESTR on my own machine, but when I train the model and evaluate every 20,000 iterations, the evaluation result is always 0. Moreover, when I take the released pretrained weights and finetune TESTR on my machine, it works well. In the config file (../configs/TESTR/Pretrain/TESTR_R_50.yaml), I only reduced the batch size from 8 to 2 and trained the model on 2 GPUs. I hope to get your reply, thank you!

    opened by CarlBhy 2
  • Inference problem

    Hello, I was trying to run inference with your model. The problem is that the weights are expected in a directory called weights, which doesn't exist in the repository since it is listed in the .gitignore file. So please upload those weights if you can.

    opened by Abdelrahman350 1
  • About the CTW1500 annotation file

    Thanks for your wonderful work. In ann_keys = ["iscrowd", "bbox", "rec", "category_id"] + (extra_annotation_keys or []), I don't know what "rec" means.

    opened by tyxy2310 1
  • error during demo using video

    Hi, Great work!

    I tried doing inference on images and it worked well, but during inference on a video I got an error (screenshot not reproduced here).

    Any thoughts on how to get through it?

    opened by maherr13 1
  • Shape is invalid for input size

    Hi everyone, I'm facing this problem when I try to run train_net.py (error screenshot not reproduced here).

    What is testr.num_ctrl_points? Why is it 16? I hope to get some suggestions. Thank you a lot.

    opened by hao3830 1
  • Licenses

    Hi! Thank you for your great work. I've seen that TESTR is available under the Apache License 2.0, which is great for commercial use, but it seems to be strongly based on AdelaiDet, which is reportedly not available for commercial use (see GitHub). Do you think there are licensing issues for commercial use? Best, Stephan

    opened by JaegerStephan 0
  • How to change character set?

    Hi, I'd like to train with different language datasets, such as Chinese, Korean, and Japanese, so I have to change the character set rather than the default setting.

    Some Detectron-based models expose the character set in their configuration, but I can't find it here. Can you guide me on how to change the character set?

    opened by jeong-tae 12
  • How did you convert original annotations to COCO Poly format?

    Thank you for your wonderful work and excellent repository. Based on the documentation, I was able to reproduce the results. But how did you convert the original annotations to the polygonal COCO-format annotations? Is there a conversion script?

    Thank you.

    opened by Swall0w 2
Owner
mlpc-ucsd
Official implementation for "GLASS: Global to Local Attention for Scene-Text Spotting" (ECCV'22)

GLASS: Global to Local Attention for Scene-Text Spotting This is a PyTorch implementation of the following paper: GLASS: Global to Local Attention for

null 65 Dec 14, 2022
Code for CVPR 2022 CLEAR Challenge "This repository is the CLEAR Challenge 1st place methods for CVPR 2022 Workshop on Visual Perception and Learning in an Open World"

CLEAR | Starter Kit This repository is the CLEAR Challenge 1st place methods for CVPR 2022 Workshop on Visual Perception and Learning in an Open World

Tencent YouTu Research 5 Sep 9, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022
Global Tracking Transformers, CVPR 2022

Global Tracking Transformers Global Tracking Transformers, Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl, CVPR 2022 (arXiv 2203.13250)

Xingyi Zhou 304 Dec 16, 2022
[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

FaceFormer PyTorch implementation for the paper: FaceFormer: Speech-Driven 3D Facial Animation with Transformers, CVPR 2022. Yingruo Fan, Zhaojiang Li

Evelyn 329 Jan 1, 2023
[CVPR 2022] Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

TextLogoLayout This is the official Pytorch implementation of the paper: Aesthetic Text Logo Synthesis via Content-aware Layout Inferring. CVPR 2022.

Yizhi Wang 181 Dec 25, 2022
Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

PPE ✨ Repository for our CVPR'2022 paper: Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-

Zipeng Xu 34 Nov 28, 2022
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.

GroupViT: Semantic Segmentation Emerges from Text Supervision GroupViT is a framework for learning semantic segmentation purely from text captions wit

NVIDIA Research Projects 511 Jan 7, 2023
Code for paper LAFITE: Towards Language-Free Training for Text-to-Image Generation (CVPR 2022)

Lafite Code for paper LAFITE: Towards Language-Free Training for Text-to-Image Generation (CVPR 2022) Update more details later. Requirements The impl

Yufan 130 Dec 16, 2022
How to export Hugging Face's 🤗 NLP Transformers models to ONNX and use the exported model with the appropriate Transformers pipeline.

How to export Hugging Face's 🤗 NLP Transformers models to ONNX and use the exported model with the appropriate Transformers pipeline.

Thomas Chaigneau 14 Dec 22, 2022
Exploration on Micro Transformers, Unleash the power of mini-transformers!

Mini Transformers This is mainly for exploring on tiny transformers arch, experiement how to get a small while still powerful transformer architecture

JinTian 5 Sep 29, 2022
This repository contains the code used for distillation and fine-tuning of compact biomedical transformers that have been introduced in the paper "On The Effectiveness of Compact Biomedical Transformers"

Compact Biomedical Transformers This repository contains the code used for distillation and fine-tuning of compact biomedical transformers that have b

NLPie Research 6 Nov 8, 2022
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast pyav video decoding.

CenterCLIP CenterCLIP achieves state-of-the-art text-video retrieval performance and decent computation cost reduction on MSVD, MSRVTT, LSMDC, and Act

Shuai Zhao 76 Dec 26, 2022
The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper

SLED The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022). SLED models use pretrained, short-rang

Maor Ivgi 38 Dec 26, 2022
This repository implements a prompt tuning model for hierarchical text classification. This work has been accepted as the long paper "HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification" in EMNLP 2022.

Implement of HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification This repository implements a prompt tuning model for hierarchical

Wang Zihan 30 Dec 21, 2022
Generates human-like text using OpenAI GPT-3 via a text-in, text-out API.

Gpt3TextGeneration Generate human-like text using OpenAI GPT-3 via a text-in, text-out API. Overview GPT-3 is the first-ever generalized language mode

Shubham Saboo 5 Dec 15, 2022
Code for "MetaMorph: Learning Universal Controllers with Transformers", Gupta et al, ICLR 2022

MetaMorph: Learning Universal Controllers with Transformers This is the code for the paper MetaMorph: Learning Universal Controllers with Transformers

Agrim Gupta 50 Jan 3, 2023
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Phil Wang 364 Jan 6, 2023
Pytorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag et al., 2022)

Block Recurrent Transformer A PyTorch implementation of Hutchins & Schlag et al.. Owes very much to Phil Wang's x-transformers. Very much in-progress.

Dashiell Stander 63 Jan 3, 2023