(From the March 2015 Issue of Research Now)
Next generation sequencing (NGS) is a modern technology that allows scientists to quickly and cheaply sequence DNA, accelerating biological and clinical research and enabling the discovery of genetic variants that cause common diseases. As data from NGS is generated at an ever-increasing rate, processing and analyzing this data efficiently has become a crucial priority in the research community. Now, investigators at Nationwide Children’s Hospital have developed Churchill, an analysis “pipeline” that fully automates the analysis and cuts the time it takes from weeks to hours. An article describing this ultra-fast, highly scalable software was published in the latest issue of Genome Biology.
“It took around 13 years and $3 billion to sequence the first human genome,” says Peter White, PhD, director of the Biomedical Genomics Core and principal investigator in the Center for Microbial Pathogenesis at The Research Institute at Nationwide Children’s. “Now, even the smallest research groups can complete genomic sequencing in a matter of days. However, once you’ve generated all that data, that’s the point where many groups hit a wall.”
According to Dr. White, after a genome is sequenced, scientists are then left with billions of data points to analyze before any truly useful information can be gleaned for use in research and clinical settings. To overcome the challenges of analyzing such a large amount of data, Dr. White and his research team developed Churchill, which uses novel computational techniques to efficiently analyze a whole genome sample in as little as 90 minutes.
“Churchill fully automates the analytical process required to take raw sequence data through a series of complex and computationally intensive processes, ultimately producing a list of genetic variants ready for clinical interpretation and tertiary analysis,” explains Dr. White, who is also an assistant professor of Pediatrics at The Ohio State University College of Medicine. “Each step in the process was optimized to significantly reduce analysis time, without sacrificing data integrity, resulting in an analysis method that is 100 percent reproducible.”
In comparison to other computational pipelines, validation using National Institute of Standards and Technology (NIST) benchmarks showed Churchill to have the highest sensitivity at 99.7 percent; the highest accuracy at 99.99 percent; and the highest overall diagnostic effectiveness at 99.66 percent. Additionally, by examining the computational resource use during the data analysis process, Dr. White’s team was able to demonstrate that Churchill was both highly efficient (>90 percent resource utilization) and scaled very effectively across many servers. Alternative approaches limit analysis to a single server and have resource utilization as low as 30 percent. This efficiency and capability to scale enables population-scale genomic analysis to be performed.
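For readers unfamiliar with how figures such as sensitivity and accuracy are derived, the short Python sketch below shows the standard textbook definitions, computed from counts of correct and incorrect variant calls against a benchmark. It is an illustration only: the class name, function names and counts are hypothetical and are not taken from the Churchill study, and the article does not state how "diagnostic effectiveness" was calculated, so that measure is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical tally of variant calls compared against a benchmark set."""
    true_positives: int   # real variants that the pipeline called
    false_negatives: int  # real variants that the pipeline missed
    false_positives: int  # calls made where no real variant exists
    true_negatives: int   # positions correctly left uncalled


def sensitivity(counts: CallCounts) -> float:
    """Fraction of real variants that were detected."""
    real_variants = counts.true_positives + counts.false_negatives
    return counts.true_positives / real_variants


def accuracy(counts: CallCounts) -> float:
    """Fraction of all evaluated positions that were classified correctly."""
    correct = counts.true_positives + counts.true_negatives
    total = correct + counts.false_positives + counts.false_negatives
    return correct / total


# Made-up counts chosen for easy arithmetic; NOT results from the study.
example = CallCounts(
    true_positives=90,
    false_negatives=10,
    false_positives=5,
    true_negatives=895,
)

print(f"sensitivity: {sensitivity(example):.1%}")
print(f"accuracy:    {accuracy(example):.1%}")
```

Because the vast majority of positions in a genome carry no variant, accuracy computed this way tends to sit very close to 100 percent, which is one reason sensitivity is usually reported alongside it.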
Of the results, Dr. White says, “Rapid diagnosis of monogenic disease can be critical in newborns, so our initial focus was to create an analysis pipeline that was extremely fast, but didn’t sacrifice clinical diagnostic standards of reproducibility and accuracy. Having achieved that, we discovered that a secondary benefit of Churchill was that it could be adapted for population-scale genomic analysis.”
After receiving an award from the Amazon Web Services (AWS) in Education Research Grants program to demonstrate Churchill’s capacity to perform population-scale analysis, Dr. White and his team successfully analyzed 1,088 whole genome samples in seven days and identified millions of new genetic variants using cloud-computing resources from AWS. The raw data analyzed came from phase one of the 1000 Genomes Project, an international collaboration to produce an extensive public catalog of human genetic variation, representing multiple populations from around the globe.
“At Nationwide Children’s, we have a strategic goal to introduce genomic medicine into multiple domains of pediatric research and healthcare,” says Dr. White. “Given that several population-scale genomic studies are underway, we believe that Churchill may be an optimal approach to tackle the data analysis challenges that these studies are presenting.”
The Churchill algorithm was licensed to Columbus-based GenomeNext LLC, which has built upon the Churchill technology to develop a secure and automated software-as-a-service platform that enables users to simply upload raw whole-genome, exome or targeted panel sequence data to the GenomeNext system. It then runs an analysis that not only identifies genetic variants but also generates fully annotated datasets, enabling filtering and identification of pathogenic variants. The company provides genomic data analysis solutions that simplify the process of data management and automate analysis of large-scale genomic studies. The system was also developed with the research and clinical market in mind, offering a standardized pipeline that is well suited to settings where customers have to meet regulatory requirements.
Kelly BJ, Fitch JR, Hu Y, Corsmeier DJ, Zhong H, Wetzel AN, Nordquist RD, Newsom DL, White P. Churchill: an ultra-fast, deterministic, highly scalable and highly balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biology 2015, 16:6. doi:10.1186/s13059-014-0577-x.