Measurements of gene expression are heavily influenced by the cellular composition of the sample analysed. A common way to quantify and, in turn, adjust for this variable is to use deconvolution algorithms, which estimate a sample’s composition from its expression profile. Although a multitude of methods have been developed, it is unclear whether their performance is consistent across tissues, particularly in the human brain, which is unique in its transcriptomic diversity and cellular complexity.
Here, we carry out a comprehensive evaluation of the accuracy of deconvolution methods for human brain transcriptome data, and assess the tissue-specificity of our key observations by comparison with transcriptome data from human pancreas. We evaluate 22 transcriptome deconvolution approaches: 3 partial deconvolution methods, each applied with 6 different categories of cell-type signature data (18 combinations); 2 enrichment methods; and 2 complete deconvolution methods. We evaluate performance using in silico mixtures of single-cell RNA-seq data, mixtures of neuronal and glial RNA, and nearly 2,000 human brain samples.
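As a rough illustration of the idea behind partial deconvolution and in silico mixtures, the sketch below mixes synthetic cell-type profiles at known proportions and recovers those proportions with non-negative least squares. It is a generic example and not one of the methods evaluated here: the function names, the cell-type labels, and the randomly generated signature matrix are all hypothetical, and non-negative least squares is assumed only as a simple stand-in for a partial deconvolution method.

```python
import numpy as np
from scipy.optimize import nnls


def simulate_mixture(signature, proportions, noise_sd=0.0, rng=None):
    """Build an in silico mixture from a genes x cell-types signature matrix.

    Hypothetical helper: the mixture is the proportion-weighted sum of the
    cell-type profiles, optionally perturbed by multiplicative log-normal noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    mixture = signature @ proportions
    if noise_sd > 0:
        mixture = mixture * np.exp(rng.normal(0.0, noise_sd, size=mixture.shape))
    return mixture


def deconvolve_nnls(signature, mixture):
    """Estimate cell-type proportions with non-negative least squares.

    Assumption: NNLS stands in for a partial deconvolution method; the
    coefficients are rescaled so that they sum to 1.
    """
    coefs, _residual = nnls(signature, mixture)
    total = coefs.sum()
    return coefs / total if total > 0 else coefs


# Synthetic data only: 200 genes, three illustrative cell types.
rng = np.random.default_rng(0)
cell_types = ["neuron", "astrocyte", "microglia"]
signature = rng.gamma(shape=2.0, scale=50.0, size=(200, len(cell_types)))

true_props = np.array([0.6, 0.3, 0.1])
mixture = simulate_mixture(signature, true_props)           # noise-free mixture
noisy_mixture = simulate_mixture(signature, true_props, noise_sd=0.2, rng=rng)

estimated = deconvolve_nnls(signature, mixture)
estimated_noisy = deconvolve_nnls(signature, noisy_mixture)

for name, truth, est in zip(cell_types, true_props, estimated_noisy):
    print(f"{name}: true={truth:.2f} estimated={est:.2f}")
```

Because the true proportions of an in silico mixture are known, the estimates can be scored directly against them. In this sketch the signature used for deconvolution is the same one used to build the mixture, which is the most favourable case; substituting a signature from a different source is what makes the choice of signature data matter.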
Our results provide several important insights into the performance of transcriptome deconvolution. (a) Cell-type signature data has a stronger impact on brain deconvolution accuracy than the choice of method. In contrast, the choice of cell-type signature only mildly influences deconvolution of pancreas transcriptome data, highlighting the importance of tissue-specific benchmarking. (b) Partial deconvolution methods outperform complete deconvolution methods on human brain data. (c) The impact of cellular composition differences on differential expression analyses is tissue-specific, and more pronounced for brain than for pancreas.
Further, we develop a novel brain cell-type signature, MultiBrain, which integrates single-cell, immuno-panned, and single-nucleus datasets. We demonstrate that it achieves improved deconvolution accuracy over existing signatures. Deconvolution of transcriptome data from autism cases and controls using MultiBrain identified cell-type composition changes replicable across studies, and highlighted novel dysregulated genes in autism.