Benchmarking Attention Mechanisms in Vision Transformers

Overview

Vision Transformer Attention Benchmark

This repo is a collection of attention mechanisms used in vision Transformers. Besides the re-implementations, it provides a benchmark of model parameters, FLOPs, and CPU/GPU throughput.

Requirements

  • PyTorch 1.8+
  • timm
  • ninja
  • einops
  • fvcore
  • matplotlib

Testing Environment

  • NVIDIA RTX 3090
  • Intel® Core™ i9-10900X CPU @ 3.70GHz
  • Memory 32GB
  • Ubuntu 22.04
  • PyTorch 1.8.1 + CUDA 11.1

Setting

  • input: 14 x 14 = 196 tokens (the 1/16 scale feature map commonly used in ImageNet-1K training)
  • batch size for speed testing (images/s): 64
  • embedding dimension: 768
  • number of heads: 12
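
Under this setting, the baseline MSA layer can be sketched as below. This is a generic PyTorch multi-head self-attention for illustration, not the repo's exact implementation; with an embedding dimension of 768 and 12 heads, the qkv and output projections contribute 4 x 768 x 768 weights plus biases, about 2.36 M parameters, which matches the MSA row in the benchmark table.

import torch
import torch.nn as nn

class MSA(nn.Module):
    # Plain multi-head self-attention, for illustration only (not the repo's exact code).
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # 768 -> 2304
        self.proj = nn.Linear(dim, dim)      # 768 -> 768

    def forward(self, x):                    # x: (B, N, C), e.g. (64, 196, 768)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

x = torch.randn(64, 196, 768)   # batch 64, 196 tokens, embedding dimension 768
print(sum(p.numel() for p in MSA().parameters()))  # 2,362,368, i.e. ~2.36 M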

Testing

For example, to test HiLo attention,

cd attentions/
python hilo.py

By default, the script tests the model on both CPU and GPU. FLOPs are measured with fvcore. You may want to edit the source file as needed.
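
A rough sketch of how fvcore can report FLOPs and parameter counts for a single layer is shown below. The MSA module from the sketch above stands in for whichever attention you are testing; exact numbers may differ slightly from the table depending on which ops fvcore traces.

import torch
from fvcore.nn import FlopCountAnalysis, parameter_count

model = MSA()                  # illustrative MSA above; any (B, N, C) -> (B, N, C) module works
x = torch.randn(1, 196, 768)   # a single 14 x 14 token map
flops = FlopCountAnalysis(model, x)
print("FLOPs =", round(flops.total() / 1e6, 1), "M")
print("Number of Params:", round(parameter_count(model)[""] / 1e6, 2), "M")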

Outputs:

Number of Params: 2.2 M
FLOPs = 298.3 M
throughput averaged with 30 times
batch_size 64 throughput on CPU 1029
throughput averaged with 30 times
batch_size 64 throughput on GPU 5104
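
The throughput lines report images per second averaged over repeated timed runs. Below is a minimal sketch of such a measurement, assuming the illustrative MSA module above; the repo's actual scripts may differ.

import time
import torch

@torch.no_grad()
def throughput(model, device="cuda", batch_size=64, runs=30):
    model = model.to(device).eval()
    x = torch.randn(batch_size, 196, 768, device=device)
    for _ in range(10):                   # warm-up before timing
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):                 # averaged over 30 timed runs
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return batch_size * runs / (time.time() - start)   # images per second

print("batch_size 64 throughput on GPU", int(throughput(MSA())))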

Supported Attentions

  • MSA: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. [Paper] [Code]
  • Cross Window: CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. [Paper] [Code]
  • DAT: Vision Transformer with Deformable Attention. [Paper] [Code]
  • Performer: Rethinking Attention with Performers. [Paper] [Code]
  • Linformer: Linformer: Self-Attention with Linear Complexity. [Paper] [Code]
  • SRA: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. [Paper] [Code]
  • Local/Shifted Window: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. [Paper] [Code]
  • Focal: Focal Self-attention for Local-Global Interactions in Vision Transformers. [Paper] [Code]
  • XCA: XCiT: Cross-Covariance Image Transformers. [Paper] [Code]
  • QuadTree: QuadTree Attention for Vision Transformers. [Paper] [Code]
  • VAN: Visual Attention Network. [Paper] [Code]
  • HorNet: HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions. [Paper] [Code]
  • HiLo: Fast Vision Transformers with HiLo Attention. [Paper] [Code]

Single Attention Layer Benchmark

Name Params (M) FLOPs (M) CPU Speed (images/s) GPU Speed (images/s) Demo
MSA 2.36 521.43 505 4403 msa.py
Cross Window 2.37 493.28 325 4334 cross_window.py
DAT 2.38 528.69 223 3074 dat.py
Performer 2.36 617.24 181 3180 performer.py
Linformer 2.46 616.56 518 4578 linformer.py
SRA 4.72 419.56 710 4810 sra.py
Local Window 2.36 477.17 631 4537 shifted_window.py
Shifted Window 2.36 477.17 374 4351 shifted_window.py
Focal 2.44 526.85 146 2842 focal.py
XCA 2.36 481.69 583 4659 xca.py
QuadTree 5.33 613.25 72 3978 quadtree.py
VAN 1.83 357.96 59 4213 van.py
HorNet 2.23 436.51 132 3996 hornet.py
HiLo 2.20 298.30 1029 5104 hilo.py

Note: Each method has its own hyperparameters. For a fair comparison on 1/16 scale feature maps, all methods in the above table adopt their default 1/16 scale settings, as provided in their released code repos. For example, on 1/16 scale feature maps, HiLo in LITv2 adopts a window size of 2 and an alpha of 0.9. Future updates will consider more scales and memory benchmarking.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
