A tool to quantify transposable element expression in scRNA-seq

Overview

IRescue

IRescue is a tool for quantifying the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. Its core feature is that it considers all multiple alignments (i.e. non-primary alignments) of reads/UMIs that map to multiple TEs in a BAM file, in order to better infer the TE subfamily of origin. IRescue implements a UMI error-correction, deduplication and quantification strategy that takes such alignment events into account. Its output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.

Installation

Using conda (recommended)

We recommend using conda, as it installs all the required packages along with IRescue.

conda create -n irescue -c conda-forge -c bioconda irescue

Using pip

If, for any reason, using conda is not possible or desirable, IRescue can be installed with pip. In that case, the following requirements must be installed manually: python>=3.7, samtools>=1.12 and bedtools>=2.30.0.

pip install irescue

Usage

Quick start

The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, CB tag for cell barcode and UR tag for UMI; override with --CBtag and --UMItag). You can obtain it by aligning your reads using STARsolo.
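Before a long run, it can be worth checking that the alignments really carry the expected tags. The following is a minimal sketch, not part of IRescue: it assumes the BAM has been converted to SAM text (for example with `samtools view`), and the helper names `sam_tags` and `count_tagged` are hypothetical.

```python
def sam_tags(line):
    """Return the set of optional tag names (e.g. 'CB', 'UR') on one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    return {field.split(":", 2)[0] for field in fields[11:]}


def count_tagged(lines, cb_tag="CB", umi_tag="UR"):
    """Return (alignments seen, alignments carrying both tags).

    Header lines (starting with '@') and blank lines are skipped.
    """
    total = tagged = 0
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        total += 1
        tags = sam_tags(line)
        if cb_tag in tags and umi_tag in tags:
            tagged += 1
    return total, tagged
```

For example, the first lines of `samtools view genome_alignments.bam` can be piped into a small script that calls `count_tagged(sys.stdin)`. If few alignments carry both tags, the tag names may need to be overridden with --CBtag and --UMItag.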

The RepeatMasker annotation is downloaded automatically for the chosen genome assembly (e.g. -g hg38). Alternatively, provide your own annotation in BED format (e.g. -r TE.bed).

irescue -b genome_alignments.bam -g hg38

If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), we advise providing the list of whitelisted cell barcodes as a text file (e.g. -w barcodes.tsv). This significantly improves performance.
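Gene-level pipelines often write the filtered barcodes as a gzip-compressed file. The sketch below is only an illustration, assuming one barcode per line in the first column; the function name `write_whitelist` is hypothetical. It copies the barcodes into a plain text file that can be passed with -w.

```python
import gzip


def write_whitelist(barcodes_gz, whitelist_txt):
    """Copy barcodes (first column) from a gzip-compressed TSV to a plain text file.

    Writes one barcode per line, skips empty lines, and returns the number written.
    """
    written = 0
    with gzip.open(barcodes_gz, "rt") as src, open(whitelist_txt, "w") as dst:
        for line in src:
            barcode = line.split("\t")[0].strip()
            if barcode:
                dst.write(barcode + "\n")
                written += 1
    return written
```

Note that the barcodes in the whitelist need to match the values stored in the BAM's cell barcode tag exactly, including any suffix the aligner appends.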

IRescue performs best using at least 4 threads, e.g.: -p 8.
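When IRescue is launched from a script, the options described above can be assembled programmatically. This is a minimal sketch that uses only the flags mentioned in this README (-b, -g, -r, -w, -p, --CBtag, --UMItag); the function `build_irescue_command` is a hypothetical helper, not part of IRescue.

```python
import shlex


def build_irescue_command(bam, genome=None, te_bed=None, whitelist=None,
                          threads=None, cb_tag=None, umi_tag=None):
    """Build the irescue argument list; options left as None are omitted."""
    cmd = ["irescue", "-b", bam]
    if genome is not None:
        cmd += ["-g", genome]        # genome assembly, e.g. hg38
    if te_bed is not None:
        cmd += ["-r", te_bed]        # custom TE annotation in BED format
    if whitelist is not None:
        cmd += ["-w", whitelist]     # whitelisted cell barcodes
    if threads is not None:
        cmd += ["-p", str(threads)]  # number of threads
    if cb_tag is not None:
        cmd += ["--CBtag", cb_tag]
    if umi_tag is not None:
        cmd += ["--UMItag", umi_tag]
    return cmd


cmd = build_irescue_command("genome_alignments.bam", genome="hg38",
                            whitelist="barcodes.tsv", threads=8)
# Prints: irescue -b genome_alignments.bam -g hg38 -w barcodes.tsv -p 8
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to execute the command.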

Output files

IRescue generates TE counts in a sparse matrix format, readable by Seurat or Scanpy:

IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
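To inspect the output without any analysis toolkit, the three files can be read with the Python standard library. The sketch below assumes a 10x-style layout: a Matrix Market coordinate file with TE features as rows and cell barcodes as columns, and names taken from the first column of each TSV. The function `load_te_counts` is a hypothetical helper; in practice a dedicated reader from Seurat or Scanpy would normally be used.

```python
import gzip
import os
from collections import defaultdict


def read_first_column(path):
    """Return the first tab-separated column of every non-empty line in a gzipped TSV."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]


def load_te_counts(out_dir):
    """Load an IRescue output directory into (features, barcodes, counts).

    counts maps feature name -> {barcode: value}; only non-zero entries are stored.
    """
    features = read_first_column(os.path.join(out_dir, "features.tsv.gz"))
    barcodes = read_first_column(os.path.join(out_dir, "barcodes.tsv.gz"))
    counts = defaultdict(dict)
    with gzip.open(os.path.join(out_dir, "matrix.mtx.gz"), "rt") as fh:
        # Skip the '%%MatrixMarket' banner and '%' comment lines.
        body = (line for line in fh if line.strip() and not line.startswith("%"))
        n_rows, n_cols, _n_entries = (int(x) for x in next(body).split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in body:
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[features[int(row) - 1]][barcodes[int(col) - 1]] = float(value)
    return features, barcodes, dict(counts)
```

Summing the values of `counts[name]` then gives the total count of one TE subfamily across all cells.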

Load IRescue data with Seurat

To integrate TE counts into an existing Seurat object that contains gene expression data, add them as an additional assay:

# import TE counts from IRescue output directory
te.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)

# create Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)

# subset the assay by the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])

# add the assay to the Seurat object
seurat_object[['TE']] <- te.assay

The result will be something like this:

An object of class Seurat 
32276 features across 42513 samples within 2 assays 
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE
Releases(v1.0.2)
Owner
Bodega Lab