Recent developments in stem cell biology have enabled studies of cell fate decisions in early human development that are impossible in vivo. However, we lack an understanding of how development varies across individuals and how common genetic variants influence development.
Here, we studied population variation of endoderm differentiation using full-transcript single-cell RNA-sequencing (Smart-seq2). Using a novel pooled experimental design that scales to large numbers of donors and ameliorates batch effects, we assayed cells at four time-points across 72-hour differentiations of human induced pluripotent stem cells (iPSCs) to definitive endoderm. Our experiments yielded data from 36,044 differentiating cells from 125 genetically distinct donors.
We ordered individual cells in pseudo-time and grouped cells into iPSC, mesendoderm and definitive endoderm stages. Using linear mixed models, we identified 1,833 eGenes (genes with at least one expression quantitative trait locus, eQTL) in iPSCs, 1,702 eGenes in mesendoderm and 1,342 eGenes in definitive endoderm cells. Our time-course covers developmental stages that have never before been accessible to genetic analyses of molecular traits: 349 of our eQTL variants at the mesendoderm and definitive endoderm stages were not reported in a recent iPSC eQTL study or in the 49 tissues of the GTEx project. Further, we identified molecular markers that predict differentiation efficiency and hundreds of eQTLs that influence expression dynamically during differentiation.
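To make the idea of an eQTL test concrete, the following is a minimal sketch of a single variant-gene association test. It deliberately swaps the linear mixed model used in the study for ordinary least squares, so it ignores random effects, population structure and covariates. The function name `eqtl_ols`, the simulated effect size and the noise level are illustrative assumptions, not values from the study.

```python
import numpy as np

def eqtl_ols(genotype, expression):
    """Regress expression on genotype dosage (0/1/2).

    Simplified stand-in for a linear mixed model: no random effects,
    no covariates. Returns the genotype slope and its t-statistic.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(expression, dtype=float)
    X = np.column_stack([np.ones_like(g), g])      # intercept + dosage
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
    se = np.sqrt(cov[1, 1])                        # standard error of the slope
    return beta[1], beta[1] / se

# Simulated example (hypothetical data): one value per donor for one gene.
rng = np.random.default_rng(0)
n_donors = 125
dosage = rng.integers(0, 3, size=n_donors)         # alternate-allele count
expr = 1.0 * dosage + rng.normal(0.0, 1.0, size=n_donors)
slope, t_stat = eqtl_ols(dosage, expr)
print(f"estimated effect: {slope:.2f}, t-statistic: {t_stat:.2f}")
```

In a genome-wide analysis a test of this kind would be repeated for every gene and nearby variant, followed by multiple-testing correction; a gene with at least one significant variant would then count as an eGene.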
Our combined analyses uncovered 785 time-dynamic eQTLs with four distinct classes of pseudo-temporal behaviour. Finally, allele-specific expression analyses revealed interactions between genotype, gene expression and cell state for fundamental cellular processes such as cell cycle and respiration, demonstrating that the cellular environment modulates genetic effects on gene expression.
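One common way to formalise a time-dynamic eQTL is as a genotype-by-pseudo-time interaction: the genetic effect on expression is allowed to change along the differentiation trajectory. The sketch below illustrates this with an ordinary least-squares fit on simulated cells. It is not necessarily the model used in the study, and the function name `fit_dynamic_eqtl` and all simulated values are assumptions made for illustration.

```python
import numpy as np

def fit_dynamic_eqtl(genotype, pseudotime, expression):
    """Fit expression ~ 1 + genotype + pseudotime + genotype * pseudotime.

    Returns the coefficients and their standard errors; the last
    coefficient is the genotype-by-pseudo-time interaction term.
    """
    g = np.asarray(genotype, dtype=float)
    t = np.asarray(pseudotime, dtype=float)
    y = np.asarray(expression, dtype=float)
    X = np.column_stack([np.ones_like(g), g, t, g * t])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se

# Simulated example (hypothetical data): the genetic effect grows with pseudo-time.
rng = np.random.default_rng(1)
n_cells = 500
g = rng.integers(0, 3, size=n_cells)               # donor genotype per cell
t = rng.uniform(0.0, 1.0, size=n_cells)            # pseudo-time scaled to [0, 1]
y = 0.3 * g + 0.5 * t + 1.0 * g * t + rng.normal(0.0, 0.5, size=n_cells)
coef, se = fit_dynamic_eqtl(g, t, y)
print(f"interaction estimate: {coef[3]:.2f} (standard error {se[3]:.2f})")
```

An interaction coefficient that differs clearly from zero indicates that the variant's effect depends on where a cell sits along pseudo-time. Cells from the same donor are not independent, so a realistic analysis would need to account for that, for example with a donor-level random effect.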