Martin Hemberg
Wellcome Trust Sanger Institute

SC3 – Consensus clustering of single-cell RNA-seq data

Cell type diversity is a defining characteristic of multicellular organisms. Traditionally, cell types have been defined by morphological features and surface markers. Modern experimental tools such as single-cell RNA sequencing (scRNA-seq) make it possible to acquire the full transcriptome of individual cells, and thus to define cell types in a data-driven manner based on the similarity of their expression profiles. However, because gene expression is highly variable and the transcriptome is high-dimensional, clustering cells by their expression profiles remains a challenging problem. We present a novel method, Single Cell Consensus Clustering (SC3), for the unsupervised clustering of cells from scRNA-seq experiments. The key element of SC3 is the combination of results from several general-purpose clustering methodologies to increase the accuracy of the classification. To remain computationally efficient, SC3 runs the clustering steps in parallel and pools their results at the final step. In addition, SC3 can handle datasets with tens of thousands of cells (e.g. from Drop-seq experiments) by using support vector machines. A key feature of SC3 is a user-friendly interface that allows one to validate the clustering in real time, not only by objective criteria but also through direct visual examination. We have tested SC3 on various scRNA-seq datasets and show that it can outperform other recently developed methods. SC3 also provides biological insights by identifying differentially expressed genes, marker genes and cell outliers (e.g. rare cell types) across the obtained cell clusters.
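
The idea of pooling several clusterings into one consensus can be illustrated with a minimal Python sketch. It is not the SC3 implementation: the function names `consensus_matrix` and `consensus_clusters`, the toy labelings, and the 0.5 threshold are all assumptions made for illustration, and the final grouping uses simple connected components rather than whatever procedure SC3 applies to its consensus matrix. The sketch assumes that each individual clustering run has already produced one label per cell.

```python
def consensus_matrix(labelings):
    """Return the fraction of clusterings in which each pair of cells co-clusters.

    `labelings` is a list of label lists, one per clustering run (hypothetical input).
    """
    n_cells = len(labelings[0])
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        if len(labels) != n_cells:
            raise ValueError("every labeling must cover the same cells")
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_runs = len(labelings)
    return [[c / n_runs for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group cells connected by consensus above `threshold` (an assumed cut-off)."""
    n_cells = len(matrix)
    assigned = [-1] * n_cells
    next_id = 0
    for start in range(n_cells):
        if assigned[start] != -1:
            continue
        # Depth-first search over cells that co-cluster often enough.
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n_cells):
                if assigned[j] == -1 and matrix[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Toy example: three clustering runs over six cells. Label values are
# arbitrary per run; only co-membership matters. The second run places
# cell 2 with cells 3-5, disagreeing with the other two runs.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
]
matrix = consensus_matrix(labelings)
clusters = consensus_clusters(matrix)
print(clusters)
```

In this toy case two of the three runs place cell 2 with cells 0 and 1, so the consensus assigns it to their cluster despite the dissenting run. This is the sense in which combining several clusterings can be more robust than relying on any single one.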