SEARCH CONTACT HOME CREDITS


Statistics Key Weapon in Gene Hunts

Many people appreciate that advances in molecular genetics have played a key role in our ability to discover genes that contribute to human disease. But less widely appreciated is that advances in statistical methodology have been a sine qua non of this era of gene discovery. The recent finding linking alpha-2-macroglobulin (A2M), a gene on chromosome 12, to Alzheimer's disease is a prime illustration of this often-neglected statistical side of the gene hunt story.

Alzheimer's disease is a complex disease that doesn't follow simple Mendelian inheritance patterns. Researchers, by convention, group it into two categories: early-onset disease, which occurs before age 60, and late-onset disease. Late-onset Alzheimer's is that much more complex because multiple genes and mutations are involved, probably acting interdependently and in interaction with environmental factors. That makes things difficult for geneticists. Until the discovery of A2M, only one gene had been identified for late-onset Alzheimer's.

Though many genetic researchers look for associations between genes and disease in people drawn from the population at large, in recent years association studies in families have come on strong. The notion is that family-based association studies will more reliably find heritable factors and be less prone to pick up on spurious associations that really don't have anything to do with genes. These family-based association studies have required DNA data from both parents and affected children. Each parent has two copies (not necessarily exact copies, however) of each DNA sequence but passes on only one of those sequences to a child. The ability to determine which of each parent's two DNA sequences has been "transmitted" to affected children, and which has not been transmitted, forms the basis for the statistical tests in family-based association studies.
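To make the transmitted-versus-untransmitted idea concrete, here is a minimal sketch of one well-known test of this kind, the transmission/disequilibrium test. The article does not say which test the researchers used, so treat this as an illustration only; the function name and the counts are hypothetical. Among parents who carry one copy of the candidate variant and one copy of something else, chance alone predicts the variant is passed to an affected child about half the time, and the test asks whether the observed split departs from that.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """McNemar-style statistic for a transmission/disequilibrium test.

    transmitted:     heterozygous parents who passed the candidate
                     variant to an affected child
    not_transmitted: heterozygous parents who passed the other copy
    Returns (chi2, p_value), with chi2 referred to a chi-square
    distribution on 1 degree of freedom.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Upper tail of a 1-df chi-square equals a two-sided normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, invented for illustration only.
chi2, p_value = tdt_statistic(transmitted=60, not_transmitted=40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Note that every count in this calculation comes from a parent's genotype, which is exactly the information that is missing when the affected individuals are elderly.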

But needing parental DNA has been a major roadblock for late-onset Alzheimer's researchers. The subjects are over 60, so obtaining parental DNA is almost impossible. Indeed, Massachusetts General Hospital (MGH) researchers had been frustrated in their attempt to link A2M to Alzheimer's. They had good evidence implicating the A2M gene as a biochemical culprit in the disease process that leads to Alzheimer's. But the family-based data that they had helped assemble for the National Institute of Mental Health's (NIMH) Genetic Initiatives Study had virtually no parental DNA information, and the standard statistical tests found no evidence to suggest A2M played a role in Alzheimer's.

Then a wonderful bit of serendipity intervened. Marsha Wilcox, a postdoctoral student in the School's epidemiology department, was taking a course on genetic analysis last year and was using the NIMH genetic data set for her class project. Steven Horvath, a doctoral student in biostatistics and teaching assistant for that course, was working on developing a new method for family-based association analysis that depends on unaffected siblings instead of parents. The basic idea was to treat unaffected siblings as controls and affected siblings as cases, and then to compare the average number of copies of the candidate gene variant in the two groups within each family. If the variant shows up more often in the affected individuals, that gives us statistical evidence that the gene plays a role in the disease of interest. Working with Deborah Blacker, an assistant professor in the epidemiology department and a member of the MGH research team, me, and other MGH researchers, Marsha and Steve applied this unaffected sibling approach to the A2M study. Bingo! We found that Alzheimer's patients were three to four times as likely to have a mutant version of A2M as their unaffected siblings. The results were announced in July at an international Alzheimer's disease meeting and published in August in Nature Genetics.
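The sibling comparison described above can be sketched in a few lines. This is a simplified sign-test version written under stated assumptions, not necessarily the exact procedure applied to the A2M data: each person is reduced to a count of 0, 1, or 2 copies of the candidate variant, each family contributes one comparison of affected versus unaffected averages, and ties are set aside. The function name and the sample families are hypothetical.

```python
import math


def sibship_sign_test(sibships):
    """Compare candidate-variant counts in affected vs. unaffected siblings.

    sibships: list of (affected_counts, unaffected_counts) pairs, one per
              family; each count is 0, 1, or 2 copies of the variant.
    Returns (more_in_affected, more_in_unaffected, chi2, p_value).
    """
    more_in_affected = 0
    more_in_unaffected = 0
    for affected, unaffected in sibships:
        # Within-family difference in the average number of copies.
        diff = (sum(affected) / len(affected)
                - sum(unaffected) / len(unaffected))
        if diff > 0:
            more_in_affected += 1
        elif diff < 0:
            more_in_unaffected += 1
        # Families with equal averages carry no information and are skipped.
    informative = more_in_affected + more_in_unaffected
    if informative == 0:
        raise ValueError("no informative sibships")
    chi2 = (more_in_affected - more_in_unaffected) ** 2 / informative
    # 1-df chi-square upper tail; a rough approximation for few families.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return more_in_affected, more_in_unaffected, chi2, p_value


# Six made-up families, for illustration only.
example_sibships = [
    ([1, 2], [0]),
    ([1], [0, 1]),
    ([2], [1]),
    ([0], [1]),
    ([1, 1], [1]),
    ([1], [0]),
]
print(sibship_sign_test(example_sibships))
```

Comparing siblings within the same family, rather than pooling everyone, is what keeps the family-based character of the test: each affected person is measured against relatives who share the same parents.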

Once the human genome is mapped, we will know the sequence and location of all 80,000 to 100,000 human genes. Some people think this knowledge will put gene hunters and the statistical analyses on which they depend out of business. But as the A2M example shows, nothing could be further from the truth. The mapping of the genome will, in effect, produce thousands of candidate genes like A2M. If figuring out the role of a single candidate gene for Alzheimer's disease meant developing a new kind of analysis and statistical approach, think how much work awaits us when the genomic map is complete. This work will have to include further advances in statistical methodology if we are to transform the wealth of genomic data into true understanding of the complex role genes play in the development of human disease.

-- Nan Laird, chair of the Department of Biostatistics

The Harvard Public Health Review is published biannually by the Office of Development and Alumni Relations. To contact us with suggestions, comments, and questions, please e-mail: abenis@hsph.harvard.edu.
