Uncovering Developmental Lineages from Single-cell Data with Contrastive Poincaré Maps
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables the study of hierarchical and branching patterns in organismic development at high resolution. Analyzing such data requires visualization and analysis tools that faithfully represent the deep, tree-like structures formed by developmental lineages. Popular Euclidean embedding methods, such as UMAP and t-SNE, as well as domain-specific approaches like PHATE, distort hierarchical relationships in low dimensions, so their performance degrades as tree depth grows. Hyperbolic geometry, which can represent trees with high accuracy in low dimensions, provides a natural remedy. However, existing hyperbolic methods, such as Poincaré Maps (PM), lose accuracy in deeper trees and require extensive feature engineering and memory. We present Contrastive Poincaré Maps (CPM), a self-supervised hyperbolic encoder that leverages contrastive learning in hyperbolic space to efficiently learn robust low-dimensional representations from scRNA-seq data. On synthetic trees with up to 5 generations and 34,000 individuals, CPM reduces distortion by more than 99% and requires 13-fold less memory than PM. We further demonstrate CPM's utility in three biological case studies. CPM uncovers accurate hierarchies across 9 developmental stages in the mouse gastrulation dataset comprising 116,312 cells, disentangles global multi-lineage hierarchies in the chicken cardiogenesis dataset while preserving intra-lineage developmental trends, and enables sampling-density-invariant hierarchical analysis in the mouse hematopoiesis dataset. By combining hyperbolic geometry with contrastive learning, CPM delivers a scalable framework that preserves hierarchical dependencies in developmental lineages, accelerates exploratory data analysis, and opens new avenues for biological insights into developmental processes using scRNA-seq data.
A preliminary version of a part of this work was presented at the ICLR Workshop on Machine Learning for Genomics Explorations (Bhasker et al., 2024).