As more organizations adopt Whole Genome Sequencing (WGS) data for deeper analysis and insight, demand for WGS analysis is rising rapidly. It is already common to process hundreds of genomes in a week, and there are predictions that analyzing 1,000 genomes per week will become standard. However, analyzing that many genomes while maintaining the accuracy and fidelity of the analyses poses large computational, infrastructural, and cost-related challenges. These challenges must be overcome so that researchers can efficiently analyze trends in genomic data from entire populations, and so that hospitals and clinics can process large numbers of genomes every week in a reasonable time and at a reasonable cost.
On the computing front, the performance of traditional general-purpose microprocessors (CPUs) no longer scales according to Moore’s Law, so they will not be able to meet the computing demands of WGS. Many compute-intensive domains have adopted compute accelerators in the form of commodity General-Purpose Graphics Processing Units (GPGPUs) to scale performance at reasonable cost. GPUs are well suited to the parallel computation in genomic analysis and are already gaining a foothold in the field. Parabricks has developed a full suite of WGS secondary analysis software (GATK Best Practices, DeepVariant, CNVkit, etc.) that is optimized for GPUs. Our production results demonstrate 30 to 50 times faster processing with GPUs than with CPUs, and users can process a 30x whole genome in under an hour using 8 GPUs. Another major advantage of GPUs is that they are general purpose, so they can be used for other computing tasks, and they are readily available in the majority of computing centers.
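To put the figures above in perspective, the following is a rough back-of-the-envelope sketch of how many 8-GPU servers a target of 1,000 genomes per week might require. It takes the "under an hour per 30x genome on 8 GPUs" figure as an upper bound of one hour per genome. The function name `nodes_needed` and the `utilization` parameter are hypothetical, introduced only for illustration, and the estimate ignores data transfer, queuing, and storage overheads.

```python
import math

HOURS_PER_WEEK = 7 * 24  # 168 hours


def nodes_needed(genomes_per_week, hours_per_genome=1.0, utilization=1.0):
    """Estimate how many 8-GPU nodes are needed for a weekly genome target.

    Assumptions (not from the source): each node processes one genome at a
    time, and `utilization` is the fraction of the week a node spends on
    useful work.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if hours_per_genome <= 0:
        raise ValueError("hours_per_genome must be positive")
    # Genomes a single node can finish in one week.
    genomes_per_node = HOURS_PER_WEEK * utilization / hours_per_genome
    # Round up: a fraction of a node still means one more node.
    return math.ceil(genomes_per_week / genomes_per_node)


if __name__ == "__main__":
    print(nodes_needed(1000))                   # ideal, always-busy nodes
    print(nodes_needed(1000, utilization=0.8))  # nodes busy 80% of the time
```

Under these assumptions, a single always-busy 8-GPU node tops out at 168 genomes per week, so the 1,000-genome target calls for a handful of such nodes rather than a large cluster. Real deployments would need headroom beyond this idealized estimate.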