Background
Meiotic recombination disrupts the shared pattern of inheritance among nucleotide sites, producing recombinant haplotypes and thousands of distinct genealogies across the genome. As a corollary, regions where recombination has not occurred form contiguous stretches of non-recombinant haplotypes (haploblocks) in which the relationships among samples are explained by a single genealogy. Identifying these haploblocks is important for studying human evolutionary and demographic history, as well as for disease-association mapping. Linkage disequilibrium (LD) is the most common basis for haploblock identification; however, LD can persist across recombination points, leading to erroneous haploblocks that contain recombinant haplotypes. In contrast, character-compatibility is a more conservative criterion that tests whether sites can share a genealogy. To determine whether character-compatibility can more accurately identify haploblocks free of recombinant haplotypes in human autosomes, we took a clique-based haploblocking algorithm previously applied to LD matrices and applied it to character-compatibility matrices.
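As an illustration of what it means for two sites to be able to share a genealogy, the sketch below scores pairwise compatibility with the four-gamete test, a standard check for biallelic sites under the infinite-sites model. It is a minimal sketch and not necessarily the scoring used in this study: the function names, the 0/1 allele coding, and the toy haplotypes are all assumptions made for the example.

```python
from itertools import combinations


def compatible(site_a, site_b):
    """Four-gamete test: two biallelic sites are compatible when fewer
    than four distinct two-site haplotypes (gametes) are observed."""
    return len(set(zip(site_a, site_b))) < 4


def compatibility_matrix(haplotypes):
    """Build a symmetric site-by-site compatibility matrix.

    haplotypes: list of equal-length 0/1 sequences, one per sampled
    chromosome (hypothetical input format for this sketch).
    """
    n_sites = len(haplotypes[0])
    # Transpose so that each column holds the alleles at one site.
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    matrix = [[True] * n_sites for _ in range(n_sites)]
    for i, j in combinations(range(n_sites), 2):
        matrix[i][j] = matrix[j][i] = compatible(columns[i], columns[j])
    return matrix


# Toy data: sites 0 and 1 are compatible; site 2 conflicts with both.
haps = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]
matrix = compatibility_matrix(haps)
```

In a matrix like this, a run of mutually compatible sites corresponds to a clique, which is the kind of structure a clique-based haploblocking algorithm would search for.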
Results and Conclusions
On chromosome 22 of a Han Chinese population, we identified 4,899 haploblocks from character-compatibility matrices, 3,297 from D′ matrices, and 2,454 and 1,892 from r² matrices when the high-LD threshold was set to r²>0.8 and r²>0.5, respectively. Accordingly, haploblocks identified from character-compatibility matrices were shorter (mean 5,167 bp) than those identified from D′ and r² matrices (means of 8,186 bp for D′, 11,511 bp for r²>0.8, and 15,965 bp for r²>0.5). We tested each haploblock for the presence of recombinant haplotypes using the pairwise homoplasy index. Of the haploblocks identified using character-compatibility matrices, 73% showed no significant evidence of recombinant haplotypes, compared to 41% using D′, 55% using r²>0.8, and 47% using r²>0.5. Moreover, at larger haploblock sizes, more character-compatibility haploblocks than LD haploblocks showed no significant evidence of recombination. These findings suggest that using character-compatibility matrices for haploblock identification is more accurate than existing LD-based approaches and will aid downstream analyses of human genomes.
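For readers less familiar with the two LD measures compared above, the following sketch computes D′ and r² for a pair of biallelic sites from phased haplotypes using their standard textbook definitions. It is illustrative only: the function name, the 0/1 allele coding, and the toy data are assumptions, and the study's own LD pipeline is not described here.

```python
def ld_stats(site_a, site_b):
    """Return (D_prime, r_squared) for two biallelic sites.

    site_a, site_b: equal-length 0/1 allele lists, one entry per phased
    haplotype (hypothetical input format). Returns None if either site
    is monomorphic, since both statistics are then undefined.
    """
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a and b for a, b in zip(site_a, site_b)) / n  # 1-1 haplotype
    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None

    r_squared = d * d / denom
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r_squared


# Toy pair with only three of the four possible gametes observed.
d_prime, r_squared = ld_stats([1, 0, 0, 0], [1, 1, 0, 0])
```

In this toy pair D′ equals 1 while r² is about 0.33, so the same two sites would count as high LD under a D′ criterion but fall below both r² thresholds, which is one reason the measures can yield different haploblock counts.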