Cancer has long been understood as a somatic evolutionary process driven by genetic and epigenetic alterations. The accumulation of somatic mutations over time gives rise to a clonal structure in cancer cell populations. Bulk and, more recently, single-cell DNA sequencing have been used to reconstruct the natural histories of cancer cell populations, for example by inferring mutational processes and clonal development. Despite sophisticated methods for inferring clonal tree structure, functional differences between individual clones, as captured by gene expression profiles, remain challenging to assay and are largely unknown. Powerful new single-cell technologies help overcome this challenge by allowing us to capture both somatic mutations and gene expression heterogeneity between clones in complex clonal cell populations. However, generally applicable methods for inferring the clone-of-origin of single cells that make use of both copy number alterations (CNAs) and single nucleotide alterations (SNAs) are yet to be established.
Here, we propose a hierarchical Bayesian model that assigns single cells to their clone-of-origin using both somatic CNA and SNA information from single-cell RNA sequencing data. In brief, the proposed method models the pattern of expressed variant alleles in single cells while accounting for scenarios in which the variant allele frequency of an SNA is affected by the copy number state of the surrounding region. The model jointly assigns cells to clones with (unknown) mutation states, infers the clonal tree configuration, and indicates the impact of CNAs on variant allele frequencies. Variational inference is used as a computationally efficient method to estimate model parameters. Our method will be useful in situations where both CNAs and SNAs influence the development and behaviour of clonal cell populations. The ability to study gene expression profiles at single-cell resolution will be valuable for improving understanding of clonal dynamics and intra-tumoral heterogeneity in many types of cancer.
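To make the central idea concrete, the following is a minimal illustrative sketch, not the model proposed here. It assumes that clone mutation profiles are already known, whereas the proposed model infers them together with the clonal tree, and it uses exact enumeration rather than variational inference. Each site is described by a hypothetical pair (mutated copies, total copies), so that a CNA changes the expected variant allele frequency, and read counts are scored with a simple binomial likelihood. All function names, the error rate, and the toy counts are assumptions made for illustration.

```python
import math

def expected_vaf(mutated_copies, total_copies, error_rate=0.01):
    """Expected variant allele frequency at a site given its copy number state.

    Assumption: VAF = mutated copies / total copies, clamped away from 0 and 1
    by a fixed error rate so that log-likelihood terms stay finite.
    """
    if total_copies == 0:
        return error_rate
    vaf = mutated_copies / total_copies
    return min(max(vaf, error_rate), 1.0 - error_rate)

def log_binomial(alt, depth, p):
    """Log binomial probability of observing `alt` variant reads out of `depth`."""
    log_coeff = (math.lgamma(depth + 1) - math.lgamma(alt + 1)
                 - math.lgamma(depth - alt + 1))
    return log_coeff + alt * math.log(p) + (depth - alt) * math.log(1.0 - p)

def clone_posterior(alt_counts, depths, clone_profiles, prior=None):
    """Posterior probability that one cell belongs to each clone.

    clone_profiles[k][j] is a (mutated_copies, total_copies) pair for
    clone k at site j. Sites with zero coverage in the cell are skipped.
    """
    n_clones = len(clone_profiles)
    if prior is None:
        prior = [1.0 / n_clones] * n_clones  # uniform prior over clones
    log_post = []
    for k, profile in enumerate(clone_profiles):
        total = math.log(prior[k])
        for alt, depth, (mutated, copies) in zip(alt_counts, depths, profile):
            if depth == 0:
                continue  # no reads covering this site, hence no information
            total += log_binomial(alt, depth, expected_vaf(mutated, copies))
        log_post.append(total)
    # Normalise in log space for numerical stability
    max_log = max(log_post)
    weights = [math.exp(x - max_log) for x in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]

# Toy example: two sites, two clones.
# Clone 0: site 0 unmutated (0 of 2 copies), site 1 heterozygous (1 of 2).
# Clone 1: site 0 heterozygous (1 of 2), site 1 with an amplified mutant
#          allele (2 of 3 copies), which raises the expected VAF to 2/3.
clone_profiles = [
    [(0, 2), (1, 2)],
    [(1, 2), (2, 3)],
]
alt_counts = [0, 3]   # variant reads observed in the cell at each site
depths = [5, 6]       # total reads observed in the cell at each site
posterior = clone_posterior(alt_counts, depths, clone_profiles)
```

In this toy setting the cell shows no variant reads at the first site and a balanced variant fraction at the second, so the posterior favours clone 0. The sketch treats the copy number state as fixed per clone and site; a model of the kind described above would also place uncertainty on whether, and how, a CNA affects each SNA.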