Donna Spiegelman

Professor of Epidemiologic Methods

Department of Epidemiology

Department of Biostatistics

677 Huntington Avenue
Kresge Building Room 806
Boston, MA 02115
617.432.1050
stdls@channing.harvard.edu

Research

One of the most troublesome obstacles to sound epidemiologic research is measurement error and misclassification of the nutritional, occupational and environmental variables that are hypothesized to be determinants of important health endpoints, such as breast and lung cancer incidence and mortality. When ignored, these errors bias our point and interval estimates of effect and invalidate the p-values of hypothesis tests. Often, although not always, the bias is towards the null value, underestimating the true exposure-disease relationship, and there can be a substantial loss of power in hypothesis tests. The pervasiveness and extent of these exposure measurement and misclassification errors in epidemiologic research may explain many of the inconsistent and inconclusive results currently reported in the literature.
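The bias towards the null can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis from any of our studies: a continuous outcome, a linear model, and classical additive error (the measured exposure equals the true exposure plus independent noise). All names and parameter values below are hypothetical. Under these assumptions the slope estimated from the mismeasured exposure is expected to shrink by the factor sigma_x^2 / (sigma_x^2 + sigma_u^2).

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)          # fixed seed so the run is reproducible
n = 20000                       # hypothetical number of subjects
true_slope = 1.0                # assumed true exposure-outcome slope
sigma_x, sigma_u = 1.0, 1.0     # assumed SDs of true exposure and of error

x = [rng.gauss(0, sigma_x) for _ in range(n)]               # true exposure
y = [true_slope * xi + rng.gauss(0, 1.0) for xi in x]       # outcome
w = [xi + rng.gauss(0, sigma_u) for xi in x]                # measured exposure

slope_true_exposure = ols_slope(x, y)       # close to true_slope
slope_measured_exposure = ols_slope(w, y)   # attenuated towards zero
expected_attenuation = sigma_x**2 / (sigma_x**2 + sigma_u**2)

print(slope_true_exposure, slope_measured_exposure, expected_attenuation)
```

With equal variances for the true exposure and the error, as assumed here, the slope based on the measured exposure should come out near half of the true slope.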

Much of my research is devoted to developing statistical methods for the design and analysis of epidemiologic studies that can produce unbiased, efficient point and interval estimates of effect. When "gold standard" technology exists for accurate exposure assessment, cost-efficient studies can be designed in which the exposure variable is validated using this technology in a subsample of the data. We have developed a methodology for calculating the sample sizes of the main study and the validation study so as to minimize the overall study cost subject to fixed statistical power. When "gold standard" technology does not exist but the sample mean of replicate measurements validly estimates individual subjects' true values, we have investigated the relationship between the number of replicates per person taken in the reliability study and the power of the study to estimate the effect of interest, in several examples taken from the Framingham Heart Study.

Once such validation data, or reliability data, are available, statistical procedures can be developed to produce valid, efficient estimates. A measurement error and/or misclassification model can be fit to the data and maximum likelihood estimates can be obtained. These models often involve difficult integrals, and we have developed a numerical algorithm to calculate the required integral to a specified accuracy for an important special case which arises frequently in epidemiologic research. To avoid the need for the specialized, computationally intensive software usually required for maximum likelihood estimation of measurement error and misclassification models, we have developed simple one-step methods to correct for measurement error which are appropriate for several settings that arise frequently in chronic disease epidemiology. Some of these methods have been illustrated in publications by our group: in an investigation pooling data from the seven large cohort studies of the relationship between dietary fat intake and breast cancer incidence, the study-specific and pooled relative risk estimates and confidence intervals were corrected for bias due to measurement error in average daily fat intake; and the validity of competing methods of measuring obesity has been evaluated. To give investigators outside our group access to our methodology, we distribute user-friendly software which implements our methods.
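The idea behind a one-step correction can be sketched with regression calibration in a linear-model setting. This is an illustration under stated assumptions, not the implementation in our distributed software: the data are simulated, the error is classical and additive, the validation study is an internal subsample in which the true exposure is observed, and every name and value below is hypothetical. The naive slope from the main study is divided by the slope from regressing true exposure on measured exposure in the validation subsample.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)              # fixed seed so the run is reproducible
n_main, n_valid = 20000, 2000       # hypothetical study sizes
true_slope = 0.8                    # assumed true exposure-outcome slope

# Main study: only the error-prone exposure w and the outcome y are used.
x_main = [rng.gauss(0, 1.0) for _ in range(n_main)]
w_main = [xi + rng.gauss(0, 1.0) for xi in x_main]
y_main = [true_slope * xi + rng.gauss(0, 1.0) for xi in x_main]

# Internal validation subsample: true exposure x is also observed.
x_valid = x_main[:n_valid]
w_valid = w_main[:n_valid]

naive_slope = ols_slope(w_main, y_main)          # biased towards the null
calibration_slope = ols_slope(w_valid, x_valid)  # regress true x on measured w
corrected_slope = naive_slope / calibration_slope

print(naive_slope, calibration_slope, corrected_slope)
```

The sketch corrects only the point estimate. A valid confidence interval for the corrected slope also has to reflect the uncertainty in the calibration slope estimated from the validation subsample, which is not shown here.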

Education

Sc.D., 1989, Harvard School of Public Health

Web Links

PowerPoint presentation - SER longitudinal design 2007 

PowerPoint presentation - "How to Use Data to Get the Right Answer", Invited Talk, SER, 2005, Toronto

CV and software - Dr. Spiegelman's CV and software