Code release for Flowformer: Linearizing Transformers with Conservation Flows (ICML 2022)

Overview

Flowformer (ICML 2022)

Flowformer: Linearizing Transformers with Conservation Flows

Transformers have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, which significantly impedes Transformers from handling long token sequences and scaling up to larger models. In pursuit of a linear-complexity, task-universal foundation model, we propose Flowformer [paper] with the following merits:

  • Linear complexity w.r.t. sequence length: it can handle extremely long sequences (over 4k tokens)
  • No specific inductive bias: it is derived purely from flow network theory
  • Task-universal: strong performance on $\color{red}{\text{Long sequence, Vision, NLP, Time series, RL}}$.
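
For intuition about the linear-complexity merit above (a generic kernelized-attention identity, not code from this repo): once attention weights factor into non-negative feature maps, matrix associativity lets the model skip the $L \times L$ attention matrix entirely.

    import torch

    # (phi(Q) phi(K)^T) V equals phi(Q) (phi(K)^T V) by associativity,
    # but the right-hand grouping costs O(L * D^2) instead of O(L^2 * D)
    # and never materializes an L x L matrix.
    B, L, D = 1, 2048, 64
    q, k, v = (torch.sigmoid(torch.randn(B, L, D)) for _ in range(3))
    quadratic = (q @ k.transpose(-2, -1)) @ v  # forms the L x L matrix
    linear = q @ (k.transpose(-2, -1) @ v)     # avoids it
    assert torch.allclose(quadratic, linear, rtol=1e-3, atol=1e-3)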

Flow-Attention Design

We cast the attention mechanism as a flow network, in which information flows from the sources (values) to the sinks (results) through learned flow capacities (attentions).

By enforcing flow conservation on both the source and sink sides, we introduce competition into the Flow-Attention design and avoid trivial attention, in the spirit that "fixed resources cause competition". A code sketch follows Figure 1.



Figure 1. Flow-Attention with Competition and Allocation mechanisms.
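
To make this concrete, below is a minimal single-head sketch of the non-causal Flow-Attention; the function and variable names are ours, and the reference implementation is Flow_Attention.py.

    import torch

    def flow_attention(q, k, v, eps=1e-6):
        """Minimal single-head sketch of non-causal Flow-Attention.

        q, k, v: (B, L, D). Linear in L: the L x L attention matrix is
        never materialized.
        """
        q, k = torch.sigmoid(q), torch.sigmoid(k)  # non-negative kernel
        # Incoming flow of each sink i: phi(q_i) . sum_j phi(k_j)
        incoming = torch.einsum("bld,bd->bl", q, k.sum(dim=1)) + eps
        # Outgoing flow of each source j: phi(k_j) . sum_i phi(q_i)
        outgoing = torch.einsum("bld,bd->bl", k, q.sum(dim=1)) + eps
        # Conservation: renormalize so each sink receives, and each source
        # emits, one unit of flow; the fixed resource then induces
        # competition among sources and allocation among sinks.
        competition = torch.softmax(
            torch.einsum("bld,bd->bl", k, (q / incoming[..., None]).sum(dim=1)),
            dim=-1,
        ) * k.shape[1]
        allocation = torch.sigmoid(
            torch.einsum("bld,bd->bl", q, (k / outgoing[..., None]).sum(dim=1))
        )
        # Aggregate with associativity: O(L * D^2) instead of O(L^2 * D)
        kv = torch.einsum("bld,ble->bde", k, v * competition[..., None])
        out = torch.einsum("bld,bde->ble", q, kv) / incoming[..., None]
        return out * allocation[..., None]

Competition (a softmax over sources) and allocation (a sigmoid gate over sinks) are both computed from the conserved flows, mirroring the two mechanisms in Figure 1.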

Get Started

  1. Please refer to the individual task folders for detailed experiment instructions.

    Note: We struggled quite a bit with configuring the environments for the different tasks. If you also have trouble setting up an environment, feel free to contact us and discuss it.

  2. List of benchmarks

  • Core code: see Flow_Attention.py
  • GPT-style PyTorch module: see Flowformer_TorchModule
  • Long Sequence Modeling on LRA: see Flowformer_LRA
  • Vision Recognition on ImageNet-1K: see Flowformer_CV
  • Language Modeling on WikiText-103: see Flowformer_NLP
  • Time series classification on UEA: see Flowformer_TimeSeries
  • Reinforcement Learning on D4RL: see Flowformer_RL
  • CUDA speed-up version

Main Results

See the [paper] for detailed results, including comparisons with nearly 20 baselines.

| Task | Metric | Flowformer | Performer | Reformer | Vanilla Transformer |
| --- | --- | --- | --- | --- | --- |
| Long Sequence Modeling (LRA) | Avg Acc (%) $\uparrow$ | 56.48 | 51.41 | 50.67 | OOM |
| Vision Recognition (ImageNet-1K) | Top-1 Acc (%) $\uparrow$ | 80.6 | 78.1 | 79.6 | 78.7 |
| Language Modeling (WikiText-103) | Perplexity $\downarrow$ | 30.8 | 37.5 | 33.6 | 33.0 |
| Time Series Classification (UEA) | Avg Acc (%) $\uparrow$ | 73.0 | 71.5 | 71.9 | 71.9 |
| Offline RL (D4RL) | Avg Reward $\uparrow$ (Avg Deviation $\downarrow$) | 73.5 $\pm$ 2.9 | 63.8 $\pm$ 7.6 | 63.9 $\pm$ 2.9 | 72.2 $\pm$ 2.6 |

"Vanilla Transformer" denotes the Decision Transformer in the RL setting.

Attention Visualization



Figure 2. Attention visualization. Flowformer successfully captures the essential parts.

Citation

If you find this repo useful, please cite our paper.

@inproceedings{wu2022flowformer,
  title={Flowformer: Linearizing Transformers with Conservation Flows},
  author={Haixu Wu and Jialong Wu and Jiehui Xu and Jianmin Wang and Mingsheng Long},
  booktitle={International Conference on Machine Learning},
  year={2022}
}

Contact

If you have any questions or want to use the code, please contact [email protected].

Comments
  • Flowformer_NLP/flow_attention.py raises an error when used for cross-attention, i.e., when q and kv have different lengths

    (1) incoming and outgoing flow

            sink_incoming = 1.0 / (torch.einsum("nld,nld->nl", q + 1e-6, k.cumsum(dim=1) + 1e-6))
    


    opened by wanpengxyzz 5
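
    A possible repair for the length mismatch reported above (a guess, not an official fix): the einsum pairs position l of q with the length-aligned prefix sum of k, which assumes equal lengths; a non-causal cross-attention variant would reduce over all keys instead.

        import torch

        # Hypothetical cross-attention variant where q and kv lengths differ:
        # each sink sees the total key mass rather than a position-aligned
        # prefix sum, so no length alignment between q and k is required.
        n, l_q, l_kv, d = 2, 8, 16, 32
        q = torch.sigmoid(torch.randn(n, l_q, d))
        k = torch.sigmoid(torch.randn(n, l_kv, d))
        sink_incoming = 1.0 / torch.einsum("nld,nd->nl", q + 1e-6, k.sum(dim=1) + 1e-6)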
  • add web demo/models/datasets to ICML 2022 organization on Hugging Face

    Hi, congrats for the acceptance at ICML 2022. We are having an event on Hugging Face for ICML 2022, where you can submit spaces(web demos), models, and datasets for papers for a chance to win prizes. The hub offers free hosting and would make your work more accessible to the rest of the community. Hugging Hub works similar to github where you can push to user profiles or organization accounts, you can add the models/datasets and spaces to this organization: https://huggingface.co/ICML2022, after joining the organization using this link https://huggingface.co/organizations/ICML2022/share/BpynfJtfsOTktlmXYoKNqqCnyufKLFXuay, let me know if you need any help with the above steps, thanks.

    opened by AK391 2
  • How can an attn_mask be added explicitly in Flowformer?

    Hello authors, I am very interested in your ICML 2022 work, but I have a question from using the code and would appreciate your advice, thank you.

    1. In the Flow_Attention_Causal function in the code, causality is implemented via the cumsum function, whereas previous attention methods apply an explicit attn_mask. With the multiplicative-associativity trick, KV must be computed first, so the dimensions of attn_mask no longer match those of KV and normal masking is impossible. Is there still a way to perform the causal operation with an explicit attn_mask?

    Thank you for your excellent work; I look forward to your reply.

    opened by Prot-debug 1
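
    Regarding the question above (our illustration, not the authors' answer): in kernelized linear attention the KV product is computed first, so causality is typically realized with prefix sums rather than an explicit attn_mask. A minimal sketch, assuming the same sigmoid kernel as above:

        import torch

        def causal_linear_aggregate(q, k, v, eps=1e-6):
            """Sketch of cumsum-based causality in kernelized linear attention.

            Position i only aggregates sources j <= i because the running sums
            over keys/values stop at i; no explicit L x L attn_mask is needed.
            Illustrative only: the repo's Flow_Attention_Causal additionally
            applies the conservation/competition steps.
            """
            q, k = torch.sigmoid(q), torch.sigmoid(k)
            kv = torch.einsum("bld,ble->blde", k, v).cumsum(dim=1)  # prefix sums of k_j v_j^T
            z = k.cumsum(dim=1)                                     # prefix sums of k_j
            num = torch.einsum("bld,blde->ble", q, kv)
            den = torch.einsum("bld,bld->bl", q, z) + eps
            return num / den[..., None]

    Note that the naive (B, L, D, E) prefix-sum tensor is memory-hungry, which is where a custom CUDA kernel can help.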