Predicting Unseen Variants in Human Genome
Friends, geneticists, computational biologists: Lend me your ears. A method once used to estimate how many more words Shakespeare knew, but did not write down, may now help genomics researchers more accurately predict the number of unseen genetic variants in the human genome.
The method can help scientists design better genome-wide and targeted sequencing studies to probe the contribution of rare genetic variants to health and disease, said first author Iuliana Ionita-Laza, an HSPH research associate. The findings were published in Proceedings of the National Academy of Sciences (PNAS).
The study comes at a time when genomics researchers are debating the next big strategic steps in genetic variation discovery projects. In the last few years, genome-wide association studies linking common diseases and genetic markers have identified important variants that provide insights into the underlying biological pathways.
But so far, the common loci identified have small effects and only explain a small percentage of the estimated heritability of disease. The method proposed by Ionita-Laza and her co-authors can be employed to find the less common genetic variations. “The question we address is how many unseen genetic variants are yet to be found,” she said.
As a bonus, the method works with the popular, less expensive approach of analyzing the genetic markers known as SNPs, and it supports a cost-effective study design that uses fewer individuals.
Another benefit: finding more markers, both common and rare, extends the reach of genome-wide association studies to more of the genome.
Ionita-Laza validated her method in three public datasets: 10 common genomic chunks from 48 unrelated DNA samples in what is known as the ENCODE project, the inflammatory response genes of 27 people in the SeattleSNPs database, and the sequences of 293 environmental response genes from 73 people. The sequence samples represented a cross-section of people of European, African, Hispanic, and Asian descent.
The method described in PNAS grew out of census studies in ecology, begun more than 50 years ago, that estimated the number of rare and common species in a geographic area.
“It’s the same kind of problem you have here,” said senior author and HSPH professor Nan Laird. “What you do is sample and re-sample. You don’t know what you’re looking for. So you look for overlaps. It’s the amount that is distinct that tells you how many more are out there that you don’t see in the individuals in hand.”
By the way, Shakespeare’s known works comprise 31,534 different words, about half of which he used only once. If a new work of equal size were discovered, the method found, it would contain about 11,460 new words the Bard had not previously used.
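For readers who want to see the arithmetic behind this kind of prediction, here is a minimal sketch of the classic Good-Toulmin estimator from the species-sampling literature. It is an illustration of the general idea, not the method published in PNAS, and the function name and toy data are invented for this example. The estimator counts how many items were seen exactly once, twice, three times and so on, then combines those counts with alternating signs to predict how many new items a further sample would reveal.

```python
from collections import Counter


def good_toulmin_new_items(observations, t=1.0):
    """Estimate how many previously unseen distinct items would appear
    in a further sample t times the size of the one in hand.

    Hypothetical helper for illustration. Uses the Good-Toulmin formula
        sum over x >= 1 of (-1)^(x+1) * t^x * n_x
    where n_x is the number of distinct items observed exactly x times.
    The formula is only well behaved for 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        raise ValueError("t must be between 0 and 1")
    # How many times each distinct item was observed
    item_counts = Counter(observations)
    # n_x: how many distinct items were observed exactly x times
    freq_of_freq = Counter(item_counts.values())
    return sum((-1) ** (x + 1) * (t ** x) * n_x
               for x, n_x in freq_of_freq.items())


# Toy data (made up): each letter is one sighting of a variant.
# "A" is seen 3 times, "B" twice, and "C", "D", "E" once each.
toy_sample = ["A", "A", "A", "B", "B", "C", "D", "E"]

# Predicted number of new variants in a second sample of equal size (t = 1)
print(good_toulmin_new_items(toy_sample))
```

With t = 1 the formula reduces to singletons minus doubletons plus tripletons, and so on, which is why items seen only once carry so much of the information about what has not yet been seen.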
-- Carol Cruzan Morton
HSPH NOW