Gene expression atlases, such as those described by FANTOM and other consortia, have transformed our understanding of the cellular and molecular components of human tissues. As we move from tissue-scale to cell-scale information, one challenge is adapting and integrating current knowledge frameworks to include new cell types or better-defined cell states. For most public-domain atlases, breadth of sampling comes at the cost of depth of replication. When different types of transcriptome data are integrated, technical variables such as platform or batch can overwhelm biological signal. This makes it difficult to benchmark new cell populations, isolation methods or platforms against existing knowledge commons.
Here, we demonstrate that it is possible to combine a large number of different profiling experiments, comprising thousands of samples from more than 40 platforms, drawn from dozens of laboratories and representing hundreds of donors, to create a molecular map of human stem cells encompassing pluripotency and hematopoiesis. We achieve robust and unbiased cell type clustering using a variance partitioning method that selects genes with low platform bias relative to biological variation. The method scales readily to incorporate new data, providing a resource for annotating and benchmarking data across laboratories, protocols, cell models and measurement methods.
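The gene-selection step can be illustrated with a minimal sketch. It assumes that "variance partitioning" means splitting each gene's variance into the share explained by platform and the share explained by cell type, with each share computed as a one-way between-group fraction (eta squared). The function names, the 0.2 cutoff and the rule "platform share below the cutoff and smaller than the cell-type share" are illustrative assumptions, not the published procedure, and the sketch applies no cross-platform transformation to the input values.

```python
import numpy as np


def between_group_fraction(expr, labels):
    """Per-gene fraction of total variance explained by group means (eta squared).

    expr: array of shape (n_genes, n_samples); labels: one group label per sample.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    grand_mean = expr.mean(axis=1)
    total_ss = ((expr - grand_mean[:, None]) ** 2).sum(axis=1)
    between_ss = np.zeros(expr.shape[0])
    for group in np.unique(labels):
        members = expr[:, labels == group]
        # Each group contributes n_group * (group mean - grand mean)^2 per gene.
        between_ss += members.shape[1] * (members.mean(axis=1) - grand_mean) ** 2
    # Constant genes (total_ss == 0) get a fraction of 0 instead of NaN.
    return np.divide(between_ss, total_ss, out=np.zeros_like(total_ss), where=total_ss > 0)


def select_low_platform_genes(expr, platform, cell_type, max_platform_frac=0.2):
    """Hypothetical filter: keep genes whose platform share of variance is small
    in absolute terms and smaller than their cell-type share."""
    platform_frac = between_group_fraction(expr, platform)
    biology_frac = between_group_fraction(expr, cell_type)
    keep = (platform_frac < max_platform_frac) & (biology_frac > platform_frac)
    return keep, platform_frac, biology_frac


# Toy data: 4 genes x 4 samples, two platforms crossed with two cell types.
platform = ["A", "A", "B", "B"]
cell_type = ["HSC", "MONO", "HSC", "MONO"]
expr = np.array([
    [1.0, 5.0, 1.0, 5.0],  # varies only with cell type
    [1.0, 1.0, 5.0, 5.0],  # varies only with platform
    [2.0, 2.0, 2.0, 2.0],  # constant
    [1.0, 5.0, 2.0, 6.0],  # mostly cell type, small platform shift
])

keep, platform_frac, biology_frac = select_low_platform_genes(expr, platform, cell_type)
print(keep.tolist())  # [True, False, False, True]
```

In this balanced toy design, the cell-type-driven genes are kept, the platform-driven gene is dropped, and the constant gene is dropped because it carries no biological signal. When platform and cell type are confounded across studies, fitting both factors jointly (for example with a linear mixed model) would be a more principled way to apportion variance than the separate one-way fractions used here.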
The resulting atlas provides a multiscale approach to visualizing and analyzing the relationships between gene sets and cell lineages. Projecting new data onto the atlas allows users to benchmark cell isolation or derivation methods and cell line models, and to assess new cell activation states. The atlas is available at https://www.stemformatics.org/atlas/blood
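One common way to project new samples onto a fixed reference is to fit principal component analysis (PCA) on the reference samples and then map new samples through the reference mean and loadings without refitting. Whether the Stemformatics atlas uses exactly this projection is an assumption here. `fit_pca` and `project` are hypothetical helpers, and in practice the atlas and the new data would first be restricted to the same selected genes and transformed in the same way.

```python
import numpy as np


def fit_pca(atlas, n_components=2):
    """Fit PCA on atlas samples (rows) x genes (columns) via SVD of centered data.

    Returns the per-gene mean and a (n_genes, n_components) loading matrix.
    """
    atlas = np.asarray(atlas, dtype=float)
    mean = atlas.mean(axis=0)
    _, _, vt = np.linalg.svd(atlas - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(samples, mean, loadings):
    """Place samples in the atlas coordinate system using the atlas mean and loadings."""
    return (np.asarray(samples, dtype=float) - mean) @ loadings


# Toy atlas: 4 samples x 3 genes (e.g. genes kept by a platform-bias filter).
atlas = np.array([
    [1.0, 5.0, 2.0],
    [1.0, 5.0, 3.0],
    [5.0, 1.0, 2.0],
    [5.0, 1.0, 3.0],
])
mean, loadings = fit_pca(atlas)
atlas_coords = project(atlas, mean, loadings)

# A new sample lying midway between the first two atlas samples.
new_coords = project([[1.0, 5.0, 2.5]], mean, loadings)
print(new_coords.round(3))
```

Because the mean and loadings are fixed by the atlas, projecting new samples does not move the existing atlas samples, which is what makes this kind of projection suitable for benchmarking against a stable reference.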