IRescue is a tool for quantifying the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. Its core feature is that it considers all multiple alignments (i.e. non-primary alignments) of reads/UMIs that map to multiple TEs in a BAM file, to better infer the TE subfamily of origin. IRescue implements a UMI error-correction, deduplication and quantification strategy that accounts for such alignment events. Its output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.
We recommend using conda, as it will install all the required packages along with IRescue.
conda create -n irescue -c conda-forge -c bioconda irescue
If for any reason it's not possible or desirable to use conda, IRescue can be installed with pip, in which case its requirements must be installed manually:
pip install irescue
The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, the CB tag for the cell barcode and the UR tag for the UMI; override with --UMItag). You can obtain it by aligning your reads using STARsolo.
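Before running IRescue, it can be worth confirming that the alignments actually carry both tags. The following is a minimal sketch, not part of IRescue, that inspects SAM-formatted text records (for example, lines printed by `samtools view`) and reports the fraction carrying the expected tags. The function names and the example records are hypothetical and made up for illustration.

```python
def has_required_tags(sam_line, cb_tag="CB", umi_tag="UR"):
    """Return True if one SAM record carries both the barcode and UMI tags."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    tags = {field.split(":", 1)[0] for field in fields[11:]}
    return cb_tag in tags and umi_tag in tags


def tagged_fraction(sam_lines, cb_tag="CB", umi_tag="UR"):
    """Fraction of alignment records (header lines skipped) carrying both tags."""
    records = [line for line in sam_lines if line.strip() and not line.startswith("@")]
    if not records:
        return 0.0
    tagged = sum(has_required_tags(line, cb_tag, umi_tag) for line in records)
    return tagged / len(records)


# Made-up records for illustration only: one header, one tagged read, one untagged read.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tCB:Z:AAACCTGA\tUR:Z:TTGCA",
    "read2\t256\tchr2\t200\t0\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:2",
]
print(tagged_fraction(example))  # 0.5
```

A fraction close to zero would suggest that the BAM file was produced without barcode/UMI tags, or that the tags use different names than the defaults.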
The RepeatMasker annotation will be automatically downloaded for the chosen genome assembly (e.g. -g hg38); alternatively, you can provide your own annotation in BED format.
irescue -b genome_alignments.bam -g hg38
If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), we advise providing the list of whitelisted cell barcodes as a text file, e.g. -w barcodes.tsv. This will significantly improve performance.
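If the gene-level pipeline stored its barcode list gzip-compressed (some tools write `barcodes.tsv.gz`), a plain-text copy can be produced first. This is a minimal sketch under that assumption; `write_whitelist` and the file paths are hypothetical, and the function keeps only the first tab-separated column of each line.

```python
import gzip


def write_whitelist(src_gz, dest_txt):
    """Copy barcodes from a gzipped TSV to a plain-text file, one per line.

    Returns the number of barcodes written.
    """
    count = 0
    with gzip.open(src_gz, "rt") as src, open(dest_txt, "w") as dest:
        for line in src:
            # Keep only the first column in case the file has extra fields.
            barcode = line.rstrip("\n").split("\t")[0].strip()
            if barcode:  # skip blank lines
                dest.write(barcode + "\n")
                count += 1
    return count
```

For instance, `write_whitelist("barcodes.tsv.gz", "barcodes.tsv")` would produce a file that can then be passed with -w barcodes.tsv. The barcodes must be written the same way as in the BAM file's cell barcode tag for the whitelist to match.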
IRescue performs best using at least 4 threads.
TE counts are written to an output directory in sparse matrix format:
IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
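The three files follow the sparse-matrix layout that `Seurat::Read10X` reads later in this document. As a rough illustration of how they fit together, the sketch below parses them with the Python standard library only. It assumes the usual conventions of this layout (Matrix Market coordinate format, 1-based indices, rows as features and columns as barcodes); `load_counts` and `read_first_column` are hypothetical helpers, not part of IRescue.

```python
import gzip
import os


def read_first_column(path):
    """Read the first tab-separated column of a gzipped TSV file."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]


def load_counts(out_dir):
    """Return {(feature, barcode): count} for the non-zero matrix entries."""
    features = read_first_column(os.path.join(out_dir, "features.tsv.gz"))
    barcodes = read_first_column(os.path.join(out_dir, "barcodes.tsv.gz"))
    counts = {}
    with gzip.open(os.path.join(out_dir, "matrix.mtx.gz"), "rt") as fh:
        # Lines starting with '%' are the banner and comments.
        body = (line for line in fh if line.strip() and not line.startswith("%"))
        next(body, None)  # skip the size line: n_rows n_cols n_entries
        for line in body:
            row, col, value = line.split()
            # Matrix Market indices are 1-based (assumed: row = feature, column = barcode).
            counts[(features[int(row) - 1], barcodes[int(col) - 1])] = float(value)
    return counts
```

Calling `load_counts("IRescue_out")` would give a dictionary keyed by (TE subfamily, cell barcode). For real analyses, a dedicated reader such as `scipy.io.mmread` is likely a better choice than hand-parsing; the point here is only to show how the files relate to each other.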
To integrate TE counts into an existing Seurat object containing gene expression data, they can be added as an additional assay:
# import TE counts from the IRescue output directory
te.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)
# create a Seurat assay from the TE counts
te.assay <- Seurat::CreateAssayObject(te.data)
# subset the assay to the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])
# add the assay to the Seurat object
seurat_object[['TE']] <- te.assay
The result will be something like this:
An object of class Seurat
32276 features across 42513 samples within 2 assays
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE