Poster Presentation 41st Lorne Genome Conference 2020

Identifying blocks free of recombinant haplotypes in human genomes using character-compatibility (#213)

Tim W McInerney¹, Simon Easteal¹, Hardip R Patel¹
  1. The Australian National University, Acton, ACT, Australia

Background
Meiotic recombination disrupts the shared pattern of inheritance among nucleotide sites, producing recombinant haplotypes and thousands of distinct genealogies across the genome. As a corollary, regions where recombination has not occurred form contiguous stretches of non-recombinant haplotypes (haploblocks), in which the relationships among samples are explained by a single genealogy. Identifying these haploblocks is important for studying human evolutionary and demographic history and for disease-association mapping. Linkage disequilibrium (LD) is the most common basis for haploblock identification; however, LD can persist across recombination points, producing erroneous haploblocks that contain recombinant haplotypes. Character-compatibility, by contrast, is a more conservative approach that tests whether sites can share a genealogy. To determine whether character-compatibility more accurately identifies haploblocks free of recombinant haplotypes in human autosomes, we took a clique-based haploblocking algorithm previously applied to LD matrices and applied it to character-compatibility matrices.
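As an illustration of what a character-compatibility matrix contains, the sketch below uses the four-gamete test, the standard pairwise compatibility criterion for biallelic sites: two sites can share a genealogy without recombination or recurrent mutation only if fewer than four allele combinations are observed. This is a minimal sketch under stated assumptions (phased haplotypes, alleles coded 0/1, no missing data); the function names are hypothetical and it is not necessarily the implementation used in this work.

```python
from itertools import combinations


def sites_compatible(site_a, site_b):
    """Four-gamete test for two biallelic sites.

    site_a and site_b hold one allele (0 or 1) per haplotype, in the same
    haplotype order. The sites are compatible if fewer than four distinct
    allele combinations (gametes) are observed.
    """
    gametes = set(zip(site_a, site_b))
    return len(gametes) < 4


def compatibility_matrix(haplotypes):
    """Build a symmetric sites-by-sites boolean compatibility matrix.

    haplotypes: list of equal-length sequences of 0/1 alleles,
    one sequence per phased haplotype (assumed input layout).
    """
    n_sites = len(haplotypes[0])
    # Transpose so that each entry is one site across all haplotypes.
    columns = [[hap[i] for hap in haplotypes] for i in range(n_sites)]
    matrix = [[True] * n_sites for _ in range(n_sites)]
    for i, j in combinations(range(n_sites), 2):
        compatible = sites_compatible(columns[i], columns[j])
        matrix[i][j] = matrix[j][i] = compatible
    return matrix


# Toy data: four haplotypes typed at three sites.
toy_haplotypes = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
]
toy_matrix = compatibility_matrix(toy_haplotypes)
```

In the toy data, sites 0 and 2 carry the same partition of haplotypes and are compatible, whereas site 1 shows all four gametes with each of them and is therefore incompatible with both.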

Results and Conclusions
On chromosome 22 of a Han Chinese population, we identified 4,899 haploblocks from character-compatibility matrices, 3,297 from D′ matrices, and 2,454 and 1,892 from r² matrices when the high-LD threshold was set to r² > 0.8 and r² > 0.5, respectively. Accordingly, haploblocks identified from character-compatibility matrices were shorter (mean 5,167 bp) than those identified from D′ and r² matrices (means of 8,186 bp for D′, 11,511 bp for r² > 0.8, and 15,965 bp for r² > 0.5). We tested each haploblock for the presence of recombinant haplotypes using the pairwise homoplasy index. Of the haploblocks identified using character-compatibility matrices, 73% showed no significant evidence of recombinant haplotypes, compared with 41% using D′, 55% using r² > 0.8, and 47% using r² > 0.5. Moreover, at larger block sizes, more character-compatibility haploblocks than LD haploblocks showed no significant evidence of recombination. These findings suggest that using character-compatibility matrices for haploblock identification is more accurate than the LD-based approaches compared here and will aid downstream analyses of human genomes.
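For readers less familiar with the two LD measures compared above, the following sketch computes |D′| and r² for a single pair of biallelic sites using their textbook definitions. It assumes phased haplotypes with alleles coded 0/1 and no missing data; the function name `ld_stats` is hypothetical, and the sketch does not reproduce the matrix construction or thresholds used in this study.

```python
def ld_stats(site_a, site_b):
    """Return (|D'|, r^2) for two biallelic sites.

    site_a and site_b hold one allele (0 or 1) per phased haplotype,
    in the same haplotype order.
    """
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")

    r2 = d * d / denom

    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return d_prime, r2


# Toy pair of sites across eight haplotypes; only three gametes are present.
toy_site_a = [0, 0, 0, 0, 1, 1, 1, 1]
toy_site_b = [0, 0, 0, 1, 1, 1, 1, 1]
toy_d_prime, toy_r2 = ld_stats(toy_site_a, toy_site_b)
```

For this toy pair, |D′| is 1 while r² is about 0.6, so the same pair would pass a D′-based high-LD criterion but fail an r² > 0.8 threshold, which is one reason the two measures yield different block counts.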

  1. Yoo, et al. (2015) Clique-Based Clustering of Correlated SNPs in a Gene Can Improve Performance of Gene-Based Multi-Bin Linear Combination Test. BioMed Research International 2015: 11. DOI: 10.1155/2015/852341
  2. Kim, et al. (2017) A new haplotype block detection method for dense genome sequencing data based on interval graph modeling of clusters of highly correlated SNPs. Bioinformatics 34(3): 388-397. DOI: 10.1093/bioinformatics/btx609
  3. Kim, et al. (2019) gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks. Bioinformatics 35(21): 4419-4421. DOI: 10.1093/bioinformatics/btz308
  4. Slatkin (2008) Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics 9(6): 477-485. DOI: 10.1038/nrg2361
  5. Wall and Pritchard (2003) Haplotype blocks and linkage disequilibrium in the human genome. Nature Reviews Genetics 4(8): 587-597. DOI: 10.1038/nrg1123