Give a talk about our CarDEC method
A Joint Deep Learning Model Enables Simultaneous Batch Effect Correction, Denoising and Clustering in Single-Cell Transcriptomics
Recent development of single-cell RNA-seq (scRNA-seq) technologies has led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effect, which is inevitable in studies involving human tissues. Most existing methods remove batch effect in a low-dimensional embedding space. Although useful for clustering, batch effect is still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effect. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Popular methods such as Seurat3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effect in the gene expression space, but MNN can only analyze two batches at a time and it becomes computationally infeasible when the number of batches is large. Here we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data, while correcting batch effect both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC consistently outperforms scVI, DCA, and MNN. With CarDEC denoising, those non-highly variable genes (HVGs) offer as much signal for clustering as the HVGs, suggesting that CarDEC substantially boosted the amount of information content in scRNA-seq. We also showed that using CarDEC’s batch corrected gene expression as input for trajectory analysis revealed marker genes that are otherwise obscured in the presence of batch effect. CarDEC is computationally fast, making it a desirable tool for large-scale single-cell transcriptomics studies.