A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

Author full names:"Liang, Xiao; Cao, Lijie; Chen, Hao; Wang, Lidan; Wang, Yangyun; Fu, Lijuan; Tan, Xiaqin; Chen, Enxiang; Ding, Yubin; Tang, Jing"

Author addresses:"[Liang, Xiao; Cao, Lijie; Chen, Hao; Wang, Lidan; Chen, Enxiang; Tang, Jing] Chongqing Med Univ, Sch Basic Med, Chongqing, Peoples R China; [Fu, Lijuan] Chongqing Med Univ, Chongqing, Peoples R China; [Ding, Yubin] Chongqing Med Univ, Women & Childrens Hosp, Chongqing, Peoples R China; [Chen, Enxiang] Chongqing Med Univ, Sch Basic Med, Chongqing 400016, Peoples R China; [Ding, Yubin; Tang, Jing] Chongqing Med Univ, Women & Childrens Hosp, Dept Obstet & Gynecol, Chongqing 401147, Peoples R China"

Corresponding authors:"Chen, EX (corresponding author), Chongqing Med Univ, Sch Basic Med, Chongqing 400016, Peoples R China.; Ding, YB; Tang, J (corresponding author), Chongqing Med Univ, Women & Childrens Hosp, Dept Obstet & Gynecol, Chongqing 401147, Peoples R China."

Source:BRIEFINGS IN BIOINFORMATICS

ESI subject category:COMPUTER SCIENCE

WOS accession number:WOS:001173375300075

JCR quartile:Q1

Impact factor:9.5

Year:2024

Volume:25

Issue:1

Start page: 

End page: 

Document type:Article

Keywords:single-cell RNA sequencing; clustering algorithms; deep learning; performance evaluation; cell identification

Abstract:"Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. Clustering performance considerably affects the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variation in the cell-type identification based on them. To date, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms, including four deep learning-based clustering algorithms and the commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tool varied across datasets, suggesting that it is sensitive to certain dataset characteristics. DESC also exhibited promising results for identifying cell subtypes and capturing cellular heterogeneity. In addition, SC3 required more memory and ran more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis."
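
Illustrative note (not part of the original record): the abstract describes comparing the cell label sets produced by different algorithms using evaluation indices, but does not name the indices. As a minimal sketch, assuming the adjusted Rand index (a widely used measure of agreement between two label sets) as a stand-in, the comparison could look like the following. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two label sets over the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sets must cover the same cells")
    n = len(labels_a)
    if n < 2:
        # With fewer than two cells there are no pairs to disagree on.
        return 1.0

    # Contingency counts: cells in cluster i of A and cluster j of B.
    joint = Counter(zip(labels_a, labels_b))
    a_sizes = Counter(labels_a)
    b_sizes = Counter(labels_b)

    # Number of cell pairs grouped together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in joint.values())
    together_a = sum(comb(c, 2) for c in a_sizes.values())
    together_b = sum(comb(c, 2) for c in b_sizes.values())
    total_pairs = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both label sets use a single cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: reference cell types and one algorithm's clusters.
reference = ["T", "T", "B", "B", "NK", "NK"]
predicted = [0, 0, 1, 1, 2, 2]
score = adjusted_rand_index(reference, predicted)
print(f"ARI = {score:.3f}")
```

The index depends only on which cells are grouped together, not on the cluster names, so label sets from different tools can be compared directly against a reference annotation.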

Funding agencies:National Natural Science Foundation of China; Supercomputing Center of Chongqing Medical University

Funding text:We are sincerely grateful to the Supercomputing Center of Chongqing Medical University for its strong support of the large-memory computations in this article.