A genome every 12 minutes

Isaac Kohane of Harvard Medical School speaking at the PQG conference

Conference on whole genome sequencing addresses advances in technology, ‘fake diseases,’ ancient DNA, and broad opportunities for human disease research

November 22, 2016 — In 2006, in the early days of whole genome sequencing, the sequencing platform at the Broad Institute of Harvard and MIT was able to sequence just three genomes—for an elephant, a tick, and a rabbit.

This year, the team is on pace to generate about 75,000 whole human genomes and will bring the total number of human exomes sequenced (the exome being the protein-coding portion of the genome) to more than 250,000.

That’s a new genome every 12 minutes—a treasure of valuable data that is helping to shed light on genetic factors in human diseases.

The work at the Broad is part of an ongoing, large-scale national collaboration on whole genome sequencing. The Genome Sequencing Program (GSP) of the National Human Genome Research Institute (NHGRI) is sequencing 200,000 whole human genomes. The Trans-Omics for Precision Medicine Program (TOPMed) of the National Heart, Lung, and Blood Institute (NHLBI) is sequencing 100,000–150,000 whole human genomes.

“Such massive whole genome sequencing data present an unprecedented opportunity for genetic discovery in human diseases, with the most comprehensive capture of the human genome,” said Xihong Lin, chair of Harvard Chan School’s Department of Biostatistics and the contact principal investigator of the NHGRI GSP Harvard Analysis Center.

The huge upswing in sequencing activity over the past few years was one of the topics discussed at the recent PQG (Program in Quantitative Genomics) Conference sponsored by Harvard T.H. Chan School of Public Health on November 3-4. Focused on whole genome sequencing—a process that determines the complete DNA sequence of an organism—the event highlighted opportunities and challenges in managing and analyzing vast amounts of genetic data and in using that data to advance medicine and public health.

Genomic data explosion

The number of genomes sequenced at the Broad Institute has doubled every seven months over the past decade, according to Stacey Gabriel, director of the Broad’s genomics platform, who gave one of the conference’s keynote addresses. She said that the Broad’s sequencing capability is in constant demand, spurred by research aimed at pinpointing genes that may be linked with cancer, neuropsychiatric diseases, rare diseases, or other types of disease.
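To get a feel for what a seven-month doubling time implies, the short sketch below compounds it over ten years. It assumes perfectly steady exponential growth, which is a simplification the article does not claim, and the function name `growth_factor` is purely illustrative.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Fold increase after `months_elapsed`, assuming a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    months = 120          # ten years
    doubling_time = 7     # months, the figure quoted by Gabriel
    factor = growth_factor(months, doubling_time)
    print(f"{months / doubling_time:.1f} doublings, about {factor:,.0f}-fold growth")
```

Under that idealized assumption, a decade holds a little more than 17 doublings, which works out to growth on the order of 100,000-fold.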

Here’s how the process works: First, DNA samples—from participants in large epidemiological studies—are run through sequencing machines. Researchers analyze the resulting genetic data, looking for patterns that suggest which genes are associated with particular diseases. This information can then be used to predict who might be at risk for developing certain diseases, and to develop new prevention and treatment methods.
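The article does not say which statistical methods researchers use at the analysis step. As one textbook illustration of "looking for patterns," the sketch below runs a simple chi-square test that compares how often a variant allele appears in people with a disease versus people without it. The function name and the allele counts are made up for illustration and are not drawn from any study or from the Broad's pipeline.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are variant and reference allele counts.
    Returns (chi-square statistic, p-value).
    """
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    total = row_case + row_ctrl
    denom = row_case * row_ctrl * col_alt * col_ref
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = total * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: the variant allele is seen 30 times among 100 case
    # alleles and 10 times among 100 control alleles.
    chi2, p = allelic_chi_square(30, 70, 10, 90)
    print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

A small p-value only flags a variant as worth a closer look. Real studies test millions of variants at once, so they also need to correct for multiple comparisons and for differences in ancestry between the groups being compared.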

Given the explosion in demand for whole genome sequencing, the Broad is focusing on ways to sequence genomes more efficiently, figure out the best way to store and share the huge amounts of genetic data that are being generated every day, and reduce the costs of sequencing, Gabriel said.

Gabriel was one of 17 speakers at the conference, which was held at Harvard Medical School’s Joseph B. Martin Conference Center. The event drew 150 attendees, including statistical geneticists, computational biologists, genetic epidemiologists, population geneticists, and clinical scientists.

‘Fake’ diseases

Isaac Kohane, professor and chair of the Department of Biomedical Informatics at Harvard Medical School, offered a cautionary note about over-reliance on genetic data in diagnosing patients. While a single genetic variation may suggest the presence of a particular disease, doctors must also consider other genetic or environmental influences and clinical information, Kohane said.

“We’re currently in serious danger of creating fake diseases, so that, as a result, we may unwittingly do harm, or less good than we hoped,” Kohane said.

By ‘fake’ diseases, Kohane meant diagnoses that rest too heavily on genetic information or, conversely, too heavily on clinical observations. Such diagnoses can mislead doctors and patients and lead to poor outcomes.

Diagnosing autism, for example, is tricky—the disorder can manifest itself in many different ways. Studies suggest both genetic and environmental factors as potential causes. Whatever the cause, a diagnosis of autism can have serious ramifications. For instance, Kohane recalled meeting an autism rights activist who worried about whether a genetic indicator of autism in a fetus might lead some parents to terminate a pregnancy.

On the other hand, a diagnosis of autism might be considered a good thing—for example, by parents who want a school system to pay for their child’s special education plan.

Because categorical diagnoses can have serious implications, Kohane said that he’d prefer less emphasis on strictly labeling diseases—even in the presence of genetic indicators—and more emphasis on defining “classes” of symptoms.

The power of SNPs

Several speakers at the conference discussed how genetic data is used to find genetic variations called single nucleotide polymorphisms—SNPs for short—that are linked with various diseases, including inflammatory bowel diseases, such as Crohn’s and colitis; ‘Mendelian’ diseases (those related to a single gene), such as cystic fibrosis, sickle cell anemia, or Huntington’s disease; and schizophrenia.

Others explained how DNA from ancient humans can help scientists better understand the evolution and biology of modern humans. Still others discussed efforts to define the unique genetic architecture of particular populations, such as Ashkenazi Jews or people of African descent, who are prone to certain inherited disorders.

“It’s a really exciting time in the field of human genetics,” said Lin, coordinating director of the School’s Program in Quantitative Genomics and the principal investigator of the National Science Foundation conference grant funding this year’s PQG conference. “Just a few years ago, we had sequenced only a small number of people, because of the high cost of whole genome sequencing. But in order to really understand the genetic architecture of human diseases, we need to sequence the genomes of large numbers of people. Now, with new technology and lower costs, we are able to do so much more.”

Karen Feldscher

photos: Shaina Andelman