Accelerated, Python-only, single-cell integration benchmarking metrics

Overview

scib-metrics

Accelerated and Python-only metrics for benchmarking single-cell integration outputs.

This package contains implementations of metrics for evaluating the performance of single-cell omics data integration methods. The implementations of these metrics use jax when possible for jit-compilation and hardware acceleration. All implementations are in Python.

Currently we are porting metrics used in the scIB manuscript (and code). Deviations from the original implementations are documented. However, metric values from this repository should not be compared to values computed with the scIB repository.

Getting started

Please refer to the documentation.

Installation

You need to have Python 3.8 or newer installed on your system. If you don't have Python installed, we recommend installing Miniconda.

There are several alternative options to install scib-metrics:

  1. Install the latest release on PyPI:
     pip install scib-metrics
  2. Install the latest development version:
     pip install git+https://github.com/yoseflab/scib-metrics.git

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out in the scverse discourse. If you found a bug, please use the issue tracker.

Citation

t.b.a

Comments
  • Memory issue for ASW scores

    Hi,

    thanks for the cool reimplementation of the scib metrics! I've tried out the ASW metrics (label and batch) on a 700MB anndata object but get out-of-memory errors even at 10GB. Is there potentially a memory leak?

    Best, Michaela

    opened by mumichae 4
  • Lax'd/mapped silhouette and squareform pdist

    We can ironically reduce complexity by making each cell type the same shape (cells by features). To do so we pad zeros so each cell type has the number of cells of the most frequent cell type. Then we mask sums and means to ignore the padded zeros when necessary.

    Compile time is way down now, on the gaming computer it's about 1/5 of the previous time.

    opened by adamgayoso 2
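The padding-and-masking idea in the comment above can be sketched in plain NumPy. This is only an illustration under assumptions: the helper names `pad_groups` and `masked_group_means` are hypothetical, and the package itself does this in JAX so that every cell type has the same array shape under `jit`.

```python
import numpy as np


def pad_groups(X, labels):
    """Stack per-label row blocks into one (n_groups, max_size, n_features) array.

    Groups smaller than the largest one are padded with zero rows, and
    `mask` records which rows are real cells.
    """
    groups = np.unique(labels)
    max_size = max(int(np.sum(labels == g)) for g in groups)
    padded = np.zeros((len(groups), max_size, X.shape[1]), dtype=float)
    mask = np.zeros((len(groups), max_size), dtype=bool)
    for i, g in enumerate(groups):
        rows = X[labels == g]
        padded[i, : len(rows)] = rows
        mask[i, : len(rows)] = True
    return padded, mask


def masked_group_means(padded, mask):
    """Mean over real rows only; the padded zero rows are excluded via the mask."""
    totals = (padded * mask[..., None]).sum(axis=1)
    counts = mask.sum(axis=1, keepdims=True)
    return totals / counts


# Toy data: three cells of type 0 and one cell of type 1.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0]])
labels = np.array([0, 0, 0, 1])
padded, mask = pad_groups(X, labels)
means = masked_group_means(padded, mask)
```

Without the mask, the mean for the single-cell group would be diluted by the two padded zero rows; dividing by the count of real rows avoids that.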
  • Implement pc regression with jax

    Original source:

    https://github.com/theislab/scib/blob/main/scib/metrics/pcr.py

    See the manuscript for a description (search principal component regression):

    https://www.nature.com/articles/s41592-021-01336-8#Sec11

    new metric 
    opened by adamgayoso 2
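For readers unfamiliar with principal component regression, here is a minimal NumPy sketch of the idea as we understand it: regress each principal component on the batch covariate and weight the resulting R² values by the variance of each component. The function name `pc_regression` is hypothetical and this is not the package's JAX implementation; details such as the number of components and the handling of continuous covariates are assumptions.

```python
import numpy as np


def pc_regression(X, batch, n_comps=None):
    """Variance-weighted R^2 of each principal component regressed on batch."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # PCA via SVD: scores are U * S, component variances are S**2 / (n - 1).
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    if n_comps is not None:
        U, S = U[:, :n_comps], S[:n_comps]
    scores = U * S
    var = S**2 / (X.shape[0] - 1)

    # One-hot design matrix for the categorical covariate
    # (its columns sum to one, so an intercept is implied).
    cats = np.unique(batch)
    design = (np.asarray(batch)[:, None] == cats[None, :]).astype(float)

    # Least-squares fit of every component on the design matrix at once.
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    resid = scores - design @ coef
    ss_res = (resid**2).sum(axis=0)
    ss_tot = (scores**2).sum(axis=0)  # scores are already mean-centered

    # Components with zero variance contribute an R^2 of zero.
    safe_tot = np.where(ss_tot > 0, ss_tot, 1.0)
    r2 = np.where(ss_tot > 0, 1.0 - ss_res / safe_tot, 0.0)
    return float((r2 * var).sum() / var.sum())


# Toy data: the batch shifts only the first feature.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 0] += 3.0 * batch
score = pc_regression(X, batch)
```

A score near 1 means most of the variance lies along components that the batch covariate explains; a score near 0 means the batch explains little of it.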
  • Reimplement silhouette in a mem constant way, pdist using lax scan

    Fixes #36

    Our implementation now follows the algorithm used in sklearn exactly: instead of computing the full pairwise distance matrix of X, it computes distances in vertical chunks, cdist(X_chunk_size, X), and then aggregates the intra- and inter-cluster distances.

    opened by adamgayoso 1
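The chunked approach described in the comment above can be sketched as follows. This is a NumPy illustration under assumptions, not the package's `lax.scan` code: the function name `silhouette_chunked` and the default `chunk_size` are hypothetical, distances are Euclidean, and at least two clusters are assumed.

```python
import numpy as np


def silhouette_chunked(X, labels, chunk_size=256):
    """Mean silhouette score computed from (chunk, n_cells) distance blocks."""
    X = np.asarray(X, dtype=float)
    uniq, codes = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    counts = np.bincount(codes)
    one_hot = np.eye(len(uniq))[codes]  # (n_cells, n_clusters)
    sq_norms = (X**2).sum(axis=1)
    scores = np.empty(X.shape[0])

    for start in range(0, X.shape[0], chunk_size):
        stop = min(start + chunk_size, X.shape[0])
        chunk = X[start:stop]
        # Distances from the chunk to all cells; only this block is in memory.
        sq = sq_norms[start:stop, None] + sq_norms[None, :] - 2.0 * chunk @ X.T
        dists = np.sqrt(np.maximum(sq, 0.0))
        # Summed distance from each chunk row to every cluster.
        clust_sums = dists @ one_hot
        own = codes[start:stop]
        rows = np.arange(stop - start)
        own_counts = counts[own]
        # Intra-cluster: mean distance to the other members of the own cluster.
        a = clust_sums[rows, own] / np.maximum(own_counts - 1, 1)
        # Inter-cluster: smallest mean distance to any other cluster.
        means = clust_sums / counts
        means[rows, own] = np.inf
        b = means.min(axis=1)
        denom = np.maximum(a, b)
        s = np.where(denom > 0, (b - a) / np.where(denom > 0, denom, 1.0), 0.0)
        s[own_counts == 1] = 0.0  # singleton clusters score 0, as in sklearn
        scores[start:stop] = s
    return float(scores.mean())


# Toy data: two tight, well-separated clusters on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette_chunked(X, labels, chunk_size=2)
```

Peak memory for distances scales with `chunk_size * n_cells` rather than `n_cells ** 2`, and the result does not depend on the chunk size.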
  • leiden nmi ari

    Adds leiden nmi ari scores to more closely match scib (they use louvain). This searches over 10 resolutions of leiden clustering and picks the one with the optimal NMI (as in scIB, which uses 20 resolution parameters).

    opened by adamgayoso 1
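The resolution search in the comment above amounts to a small grid search. The sketch below is an illustration under assumptions: `nmi` is a plain NumPy implementation (arithmetic-mean normalization), `best_resolution` and `cluster_fn` are hypothetical names, the grid `2 * i / n` is assumed, and the toy `cluster_fn` is a stand-in for running Leiden on a neighbor graph.

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information with arithmetic-mean normalization."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    cont = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(cont, (ai, bi), 1)  # contingency table of label co-occurrences
    p = cont / cont.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa * np.log(pa)).sum()
    hb = -(pb * np.log(pb)).sum()
    denom = (ha + hb) / 2
    return float(mi / denom) if denom > 0 else 1.0


def best_resolution(cluster_fn, labels, n=10):
    """Try n resolutions and keep the clustering with the highest NMI."""
    best_res, best_score = None, -1.0
    for i in range(1, n + 1):
        res = 2 * i / n  # assumed grid: 0.2, 0.4, ..., 2.0 for n=10
        score = nmi(cluster_fn(res), labels)
        if score > best_score:
            best_res, best_score = res, score
    return best_res, best_score


# Stand-in for Leiden: bin 1-D values more finely as the resolution grows.
values = np.array([0.1, 0.2, 0.6, 0.7, 1.1, 1.2])
true_labels = np.array([0, 0, 1, 1, 2, 2])
toy_cluster_fn = lambda res: np.floor(values * res).astype(int)
best_res, best_nmi = best_resolution(toy_cluster_fn, true_labels, n=10)
```

In practice `cluster_fn` would wrap a Leiden call at the given resolution; the search logic stays the same.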
  • Improve k-means++ init efficiency

    Calculating the full pairwise distance matrix is unnecessary here. We can instead compute distances to candidates per cluster iteration.

    https://github.com/getkeops/keops/issues/166

    opened by martinkim0 0
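The point made in the comment above, that k-means++ seeding does not need the full pairwise distance matrix, can be sketched as below. This is a basic NumPy version under assumptions: the name `kmeans_plus_plus_init` is hypothetical, it draws one candidate per iteration (unlike implementations that try several), and it assumes the data has at least `n_clusters` distinct points.

```python
import numpy as np


def kmeans_plus_plus_init(X, n_clusters, seed=0):
    """k-means++ seeding that keeps only an O(n_cells) distance vector."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centers = np.empty((n_clusters, X.shape[1]))
    centers[0] = X[rng.integers(n)]
    # Squared distance from every point to its nearest chosen center.
    closest_sq = ((X - centers[0]) ** 2).sum(axis=1)
    for k in range(1, n_clusters):
        # Sample the next center proportionally to the squared distance.
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n, p=probs)
        centers[k] = X[idx]
        # Only distances to the new center are computed in each iteration.
        new_sq = ((X - centers[k]) ** 2).sum(axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return centers


# Toy data: three distinct locations, each duplicated once.
X = np.array([[0, 0], [0, 0], [10, 10], [10, 10], [20, 0], [20, 0]], dtype=float)
centers = kmeans_plus_plus_init(X, n_clusters=3, seed=0)
```

Each iteration costs one pass over the data, so memory stays linear in the number of cells instead of quadratic.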
  • Large memory issue for ARI/NMI with kmeans

    Similar to #36 : Working with version 0.0.6 on the same dataset as in #36, I am still getting huge memory consumption (exceeding 90GB). The dataset isn't particularly huge (2.8GB dense, ~500MB sparse).

    opened by mumichae 5
  • Current pipeline

    This is just some code to use to apply all the metrics currently implemented on real data

    import numpy as np
    import pandas as pd
    import scanpy as sc
    from scib_metrics import (
        clisi_knn,
        ilisi_knn,
        isolated_labels,
        nmi_ari_cluster_labels_kmeans,
        nmi_ari_cluster_labels_leiden,
        pcr_comparison,
        silhouette_batch,
        silhouette_label,
    )


    def compute_scib_metrics(adata, emb_key, label_key, batch_key, model_name):
        # Copy the embedding under a fixed key so the neighbors calls can refer to it.
        emb_key_ = "X_emb"
        adata.obsm[emb_key_] = adata.obsm[emb_key]
        X = adata.obsm[emb_key_]
        # PCA of the unintegrated data is the "before" representation for PCR.
        sc.tl.pca(adata)
        X_pre = adata.obsm["X_pca"]
        # Encode labels and batches as integer codes.
        labels = np.array(adata.obs[label_key].astype("category").cat.codes).ravel()
        batch = np.array(adata.obs[batch_key].astype("category").cat.codes).ravel()
        # Connectivity graph on the embedding (default neighbors) for Leiden.
        sc.pp.neighbors(adata, use_rep=emb_key_)
        graph_conn = adata.obsp["connectivities"]
        df = pd.DataFrame(index=[model_name])
        df["nmi_kmeans"], df["ari_kmeans"] = nmi_ari_cluster_labels_kmeans(X, labels)
        df["nmi_leiden"], df["ari_leiden"] = nmi_ari_cluster_labels_leiden(graph_conn, labels, n_jobs=8)
        df["sil_batch"] = silhouette_batch(X, labels, batch)
        df["sil_labels"] = silhouette_label(X, labels)
        df["isolated_labels"] = isolated_labels(X, labels, batch)
        # Recompute neighbors with 90 neighbors to get the distances used by LISI.
        sc.pp.neighbors(adata, use_rep=emb_key_, n_neighbors=90)
        graph_dist = adata.obsp["distances"]
        df["ilisi"] = ilisi_knn(graph_dist, batch)
        df["clisi"] = clisi_knn(graph_dist, labels)
        df["pcr"] = pcr_comparison(X_pre, X, batch, categorical=True)
        return df


    # `adata` is an AnnData object that already holds the scVI embedding.
    emb_key = "X_scVI"
    scvi_metrics = compute_scib_metrics(adata, emb_key, "final_annotation", "batch", "scVI")
    
    opened by adamgayoso 0
  • param naming

    Maybe we should use embedding and neighbor_distances and neighbor_connectivities explicitly?

    Originally posted by @adamgayoso in https://github.com/YosefLab/scib-metrics/issues/30#issuecomment-1275449141

    opened by adamgayoso 0
Releases(0.0.8)
  • 0.0.8(Nov 18, 2022)

    What's Changed

    • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/YosefLab/scib-metrics/pull/48
    • Precompute and pass in pdists into kmeans init by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/49
    • kmeans_init to random by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/54
    • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/YosefLab/scib-metrics/pull/53
    • 008 release by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/56

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.7...0.0.8

  • 0.0.7(Oct 31, 2022)

    What's Changed

    • Kmeans memory fix by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/45
    • Move PCR to utils module in favor of PCR comparison by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/46

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.6...0.0.7

  • 0.0.6(Oct 25, 2022)

    What's Changed

    • Reimplement silhouette in a mem constant way, pdist using lax scan by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/42
    • 0.0.6 release by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/44

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.5...0.0.6

  • 0.0.5(Oct 25, 2022)

    What's Changed

    • Cleanup silhouette variables by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/31
    • standardize docstrings by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/30
    • K-means++ by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/23
    • PC regression in Jax by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/16
    • Lax'd/mapped silhouette and squareform pdist by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/33
    • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/YosefLab/scib-metrics/pull/19
    • PCR comparison by @martinkim0 in https://github.com/YosefLab/scib-metrics/pull/38
    • Update cookicutter template for sync by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/35
    • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/YosefLab/scib-metrics/pull/37
    • Bump version 0.0.5 by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/40

    New Contributors

    • @martinkim0 made their first contribution in https://github.com/YosefLab/scib-metrics/pull/23
    • @pre-commit-ci made their first contribution in https://github.com/YosefLab/scib-metrics/pull/19

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.4...0.0.5

  • 0.0.4(Oct 11, 2022)

    What's Changed

    • leiden nmi ari by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/24
    • LISI implementation by @justjhong in https://github.com/YosefLab/scib-metrics/pull/20
    • 0.0.3 -> 0.0.4 by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/29

    New Contributors

    • @justjhong made their first contribution in https://github.com/YosefLab/scib-metrics/pull/20

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.3...0.0.4

  • 0.0.3(Oct 10, 2022)

    What's Changed

    • Improve kmeans accuracy with centering before distances by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/18
    • use sphinx book theme, myst-notebook by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/22
    • Add references, linkcode to docs, remove scanpydoc by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/25
    • fix linkcode github link by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/26
    • Fix linkcode again by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/27
    • Bump version: 0.0.2 → 0.0.3 by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/28

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.2...0.0.3

  • 0.0.2(Oct 3, 2022)

    What's Changed

    • add release workflow, update changelog, pypi upload by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/15

    Full Changelog: https://github.com/YosefLab/scib-metrics/compare/0.0.1...0.0.2

  • 0.0.1(Oct 3, 2022)

    Initial release!

    What's Changed

    • Jax Silhouette by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/1
    • move silhouette to utils by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/2
    • Update README.md by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/3
    • initial docs by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/4
    • silhouette batch + labels by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/5
    • add kmeans, nmi, ari metrics by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/6
    • Isolated labels + settings by @adamgayoso in https://github.com/YosefLab/scib-metrics/pull/7

    New Contributors

    • @adamgayoso made their first contribution in https://github.com/YosefLab/scib-metrics/pull/1

    Full Changelog: https://github.com/YosefLab/scib-metrics/commits/0.0.1

Owner
Yosef Lab
Center for Computational Biology, Electrical Engineering and Computer Sciences @ UC Berkeley