Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

Overview

MultiMAE: Multi-modal Multi-task Masked Autoencoders

Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir

Website | arXiv | BibTeX

Open in Colab | Hugging Face Spaces

Official PyTorch implementation and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders.

We introduce Multi-modal Multi-task Masked Autoencoders (MultiMAE), an efficient and effective pre-training strategy for Vision Transformers. Given a small random sample of visible patches from multiple modalities, the MultiMAE pre-training objective is to reconstruct the masked-out regions. Once pre-trained, a single MultiMAE encoder can be used for both single-modal and multi-modal downstream transfer, yielding results that are competitive with or significantly better than the baselines.
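
To make the sampling step concrete, below is a minimal, hedged sketch of random multi-modal token masking in PyTorch. The batch size, token shapes, 98-token budget, and dictionary keys are illustrative assumptions; the actual implementation (which draws the per-modality split from a Dirichlet distribution) lives in multimae/multimae.py.

    import torch

    # Toy patch tokens for three modalities: [batch, num_patches, dim] (shapes are assumptions)
    tokens = {
        'rgb':    torch.randn(2, 196, 768),
        'depth':  torch.randn(2, 196, 768),
        'semseg': torch.randn(2, 196, 768),
    }
    num_encoded = 98  # number of visible tokens kept across all modalities (illustrative)

    all_tokens = torch.cat(list(tokens.values()), dim=1)      # [2, 588, 768]
    perm = torch.rand(all_tokens.shape[:2]).argsort(dim=1)    # random token order per sample
    visible_idx = perm[:, :num_encoded]                       # tokens passed to the encoder
    visible = torch.gather(
        all_tokens, 1,
        visible_idx.unsqueeze(-1).expand(-1, -1, all_tokens.shape[-1]))
    print(visible.shape)  # torch.Size([2, 98, 768]); the remaining tokens are reconstruction targets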

Catalog

  • Pre-trained models
  • MultiMAE pre-training code
  • ImageNet-1K classification fine-tuning code
  • Semantic segmentation fine-tuning code (single-modal & multi-modal)
  • Depth estimation fine-tuning code
  • Taskonomy fine-tuning code
  • Colab & Hugging Face demos

Pre-trained models

We provide the weights of our pre-trained MultiMAE ViT-B model in both MultiViT (multi-modal) and timm (RGB-only) formats.

For comparison, we also provide the weights of a MAE ViT-B model that we pre-trained using the official MAE codebase following the recommended settings.

| Method   | Arch. | Pre-training modalities | Pre-training epochs | Weights (MultiViT) | Weights (timm) | Config  |
|----------|-------|-------------------------|---------------------|--------------------|----------------|---------|
| MAE      | ViT-B | RGB                     | 1600                | download           | download       | See MAE |
| MultiMAE | ViT-B | RGB+D+S                 | 1600                | download           | download       | link    |

These pre-trained models can then be fine-tuned using this codebase to reach the following performance:

| Method      | Classif. (top-1): ImageNet-1K (RGB) | Semseg (mIoU): ADE20K (RGB) | Semseg (mIoU): Hypersim (RGB / D / RGB+D) | Semseg (mIoU): NYUv2 (RGB / D / RGB+D) | Depth (δ1): NYUv2 (RGB) |
|-------------|-------------------------------------|-----------------------------|--------------------------------------------|-----------------------------------------|--------------------------|
| Sup. (DeiT) | 81.8                                | 45.8                        | 33.9 / - / -                               | 50.1 / - / -                            | 80.7                     |
| MAE         | 83.3                                | 46.2                        | 36.5 / - / -                               | 50.8 / - / -                            | 85.1                     |
| MultiMAE    | 83.3                                | 46.2                        | 37.0 / 38.5 / 47.6                         | 52.0 / 41.4 / 56.0                      | 86.4                     |

Model formats

We provide pre-trained weights in two different formats: the single-modal ViT / timm format, which is compatible with other popular ViT repositories (e.g., timm, DINO, MAE), and the multi-modal MultiMAE / MultiViT format, which is used throughout this codebase for multi-modal pre-training and fine-tuning. See multimae/multimae.py for the documentation and implementation of MultiMAE / MultiViT.

You can convert between these formats using the provided vit2multimae_converter.py and multimae2vit_converter.py scripts.
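
For example, the timm-format (RGB-only) weights can be loaded into a standard ViT-B/16 from timm roughly as follows. This is a hedged sketch: the checkpoint filename and the key under which the state dict is stored are assumptions, so adjust them to the file you actually download.

    import torch
    import timm

    # 'multimae_vitb_rgb_timm.pth' is a placeholder filename for the downloaded timm-format weights
    ckpt = torch.load('multimae_vitb_rgb_timm.pth', map_location='cpu')
    state_dict = ckpt.get('model', ckpt) if isinstance(ckpt, dict) else ckpt

    model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=0)
    msg = model.load_state_dict(state_dict, strict=False)
    print(msg)  # any missing/unexpected keys (e.g. a classification head) are reported here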

Usage

Set-up

See SETUP.md for set-up instructions.

Pre-training

See PRETRAINING.md for pre-training instructions.

Fine-tuning

See FINETUNING.md for fine-tuning instructions.

Demo & visualizations

For interactive demos, please see our website. Open our Colab notebook to play around with the visualization code, or simply upload an image to our Hugging Face Spaces demo.

Acknowledgement

This repository is built using the timm, DeiT, DINO, MoCo v3, BEiT, MAE-priv, and MAE repositories.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

If you find this repository helpful, please consider citing our work:

@article{bachmann2022multimae,
  author    = {Roman Bachmann and David Mizrahi and Andrei Atanov and Amir Zamir},
  title     = {{MultiMAE}: Multi-modal Multi-task Masked Autoencoders},
  journal   = {arXiv preprint arXiv:2204.01678},
  year      = {2022},
}
Comments
  • ADE20K dataset structure for semantic segmentation

    Hi,

    First of all, thanks for your amazing work!

    We're trying to reproduce the paper results and are unsure how to set up semantic segmentation fine-tuning with ADE20K. To our surprise, the data loader seems to expect the same root/task_a/class_x/xxx.ext folder hierarchy known from classification. However, since the images naturally contain more than a single semantic class, we're not sure how the images are supposed to be arranged.

    Could you give us a hint on how the data should be structured to work with the provided ft_ade_64e_multimae-b_rgb.yaml configuration?

    Thank you, Paul

    opened by pkwagner 6
  • Why should the depth map be divided by 2**16?

    Thank you for your great MultiMAE. We noticed that in https://github.com/EPFL-VILAB/MultiMAE/blob/main/utils/datasets.py line 96, you use img = torch.Tensor(np.array(task_dict[task]) / 2 ** 16). Could you explain why the depth map should be divided by 2**16? Are there any problems without this operation?
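
    For context, a minimal sketch of what that line does, assuming the pseudo-depth maps are stored as single-channel 16-bit PNGs (the file path below is hypothetical):

    import numpy as np
    import torch
    from PIL import Image

    # 16-bit PNG values lie in [0, 2**16); dividing by 2**16 simply rescales them to [0, 1)
    depth_png = Image.open('pseudo_depth/example.png')           # hypothetical path, uint16 values
    depth = np.array(depth_png).astype(np.float32)
    print(depth.min(), depth.max())                              # somewhere within [0, 65535]
    depth_tensor = torch.Tensor(depth / 2 ** 16).unsqueeze(0)    # 1 x H x W, values in [0, 1)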

    opened by yingfei1016 6
  • Add web demo/model to Hugging Face

    Hi, would you be interested in adding MultiMAE to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similarly to GitHub.

    Examples from other organizations:
    Keras: https://huggingface.co/keras-io
    Microsoft: https://huggingface.co/microsoft
    Facebook: https://huggingface.co/facebook

    Example Spaces with repos:
    GitHub: https://github.com/salesforce/BLIP | Spaces: https://huggingface.co/spaces/salesforce/BLIP
    GitHub: https://github.com/facebookresearch/omnivore | Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    Here are guides for adding Spaces, models, and datasets to your org:

    How to add a Space: https://huggingface.co/blog/gradio-spaces
    How to add models: https://huggingface.co/docs/hub/adding-a-model
    How to upload a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

    opened by AK391 4
  • Linear probing results

    Hey, Thank you for providing the code for the paper. The paper is really interesting and the project page is very well done!

    I was wondering whether you've tested the performance of linear probing on the RGB image when trained with all 3 modalities. The results in the original MAE paper were not very good; it would be interesting to understand whether the additional supervision creates better representations that translate into better linear probing scores.

    Thanks, Eliahu

    opened by eliahuhorwitz 4
  • Query about semseg domain in pre-training

    Hi, I have successfully generated the pseudo labels and trained the 'rgb' in/out-domain MultiMAE model.

    But when I trained the model with 'rgb-semseg' in/out-domain, I hit an error in multimae/input_adapters.py line 232

    # Create patches [B, C, H, W] -> [B, (H*W), C]
    x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d')
    

    The full log is in log.txt. x.size() is [batchsize, 64, 56, 56] before line 232. I can't figure out what's wrong.

    What's more, I don't understand why the pseudo semseg label image is resized to 1/4 (that is, 224*224 -> 56*56) in utils/datasets.py line 105

    # Convert to Tensor
    for task in task_dict:
        if task in ['depth']:
            img = torch.Tensor(np.array(task_dict[task]) / 2 ** 16)
            img = img.unsqueeze(0)  # 1 x H x W
        elif task in ['rgb']:
            img = TF.to_tensor(task_dict[task])
            img = TF.normalize(img, mean=self.rgb_mean, std=self.rgb_std)
        elif task in ['semseg', 'semseg_coco']:
            # TODO: add this to a config instead
            # Rescale to 0.25x size (stride 4)
            scale_factor = 0.25
            img = task_dict[task].resize((int(self.input_size * scale_factor), int(self.input_size * scale_factor)))
            # Using pil_to_tensor keeps it in uint8, to_tensor converts it to float (rescaled to [0, 1])
            img = TF.pil_to_tensor(img).to(torch.long).squeeze(0)
    

    and then use nn.Conv2d in multimae/input_adapters.py line 198

    if self.interpolate_class_emb:
        self.proj = nn.Sequential(
            nn.Upsample(scale_factor=(1 / self.P_H, 1 / self.P_W),
                        mode='bilinear'),  # Actually a downsample operation
            nn.Conv2d(in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
                        kernel_size=1, stride=1),
        )
    else:
        self.proj = nn.Conv2d(
            in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
            kernel_size=(self.P_H, self.P_W), stride=(self.P_H, self.P_W)
        )
    

    Thank you for any help.
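
    To make the question concrete, here is a hedged shape walk-through of the two snippets above, assuming input_size=224, a 16x16 RGB patch size, and dim_class_emb=64; the class count and token dimension below are illustrative, not taken from the repo config:

    import torch
    import torch.nn as nn
    from einops import rearrange

    B, num_classes, dim_class_emb, dim_tokens = 2, 134, 64, 768
    semseg = torch.randint(0, num_classes, (B, 56, 56))        # 224x224 map rescaled to 0.25x (stride 4)
    class_emb = nn.Embedding(num_classes, dim_class_emb)
    x = class_emb(semseg).permute(0, 3, 1, 2)                  # [B, 64, 56, 56], matches the logged x.size()
    proj = nn.Conv2d(dim_class_emb, dim_tokens, kernel_size=4, stride=4)  # kernel = 16/4, since the map is 4x smaller
    x_patch = rearrange(proj(x), 'b d nh nw -> b (nh nw) d')   # [B, 196, 768]: one token per 16x16 RGB patch
    print(x_patch.shape)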

    opened by Chianghui-Wong 3
  • Problem when evaluating the pretrained model

    Problem

    Hi! I encountered a problem when simply trying to evaluate this model with the same config as the Colab demo.

    Environment

    Ubuntu 22.04, CUDA kernel 10.1, CUDA runtime 11.3, PyTorch 1.12.0

    Terminal

    /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
    (the same assertion is repeated for threads [1,0,0] through [31,0,0])

    Traceback (most recent call last):
      (runpy / debugpy launcher frames omitted)
      File "/home/jxr/3D-MultiMAE/MultiMAE/try_model.py", line 118, in <module>
        preds, masks = multimae.forward(
      File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae.py", line 350, in forward
        encoder_tokens = self.encoder(input_tokens)
      File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 230, in forward
        x = x + self.drop_path(self.attn(self.norm1(x)))
      File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 175, in forward
        attn = (q @ k.transpose(-2, -1)) * self.scale
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by kamzero 2
  • Example usage of regular MAE weights

    Hey, awesome work! I am trying to figure out how to modify the demo notebook to use the regular MAE instead of MultiMAE. In particular, I commented out all depth and semseg info, but the resulting image infilling looks corrupted. Could you by chance share an example of proper usage of the regular MAE weights? Thanks so much for the help!

    opened by mhamilton723 2
  • Some doubts about pseudo labels

    Hi, I am pseudo-labeling ImageNet-1K and running into some difficulties.

    First, what would happen if there were more than 255 semseg classes? How could a single-channel PNG image (as used for depth) represent them? (Although COCO has only 80 classes, ImageNet has more than 255 classes at fine-tuning time.)

    Second, in the Colab notebook example, the rgb2depth DPT model cannot take ImageNet images of arbitrary size. How can we save all the pseudo labels before data augmentation crops them to 224*224? We need to keep the original images aligned with the pseudo-labeled images, don't we?

    Thank you for any help.

    opened by Chianghui-Wong 2
  • Is it normal to see this during fine-tuning?

    _IncompatibleKeys(missing_keys=['output_adapters.cls.norm.weight', 'output_adapters.cls.norm.bias', 'output_adapters.cls.head.weight', 'output_adapters.cls.head.bias'], unexpected_keys=[])
    
    

    Could this be happening because of deleting the output adapter? [screenshot]

    @dmizr thank you for replying to previous issues.
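
    For reference, a minimal toy example (not the MultiMAE code) of why a freshly added head produces exactly this kind of message when loading with strict=False:

    import torch.nn as nn

    # Keys that exist only in the new model (e.g. a freshly added classification head) are reported
    # as missing_keys and simply keep their random initialization; this is expected before fine-tuning.
    pretrained = nn.Linear(4, 4)
    new_model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))   # second layer plays the role of the new head
    checkpoint = {'0.' + k: v for k, v in pretrained.state_dict().items()}
    msg = new_model.load_state_dict(checkpoint, strict=False)
    print(msg)  # _IncompatibleKeys(missing_keys=['1.weight', '1.bias'], unexpected_keys=[])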

    opened by forkbabu 1
  • How to evaluate/test MAE or MultiMAE on a test dataset?

    @dmizr @amir32002 @roman-bachmann thanks for the paper

    Is there already a way in the code to perform testing on a different dataset, or do we need to code it up ourselves?

    opened by forkbabu 1
  • Colab error

    Hello,

    Thank you for the code. However, I am not able to run this: !wget https://drive.switch.ch/index.php/s/RFfTZwyKROKKx0l/download in Google Colab. Can you please recheck?

    Thanks

    opened by AnukritiSinghh 1
Owner
VILAB
Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL)