Cutter Symposium 11/8/2013
Remarks by Donna Spiegelman on the Future of Epidemiology
I will address 3 topics: the role of adjusting for measurement error and misclassification in causal inference; implementation science; and big data, ‘omics, and the study of disease heterogeneity.
- Measurement error and misclassification: Of the 3 classic sources of bias – confounding, selection bias, and information bias – most epi methods development over the past 50 years has focused on adjustment for confounding, from Mantel-Haenszel methods and matching, to multivariate regression models including the Cox model and longitudinal models for repeated measures, to marginal structural models and g-causal models. Selection bias can also be adjusted for using these methods; missing data is a potential cause of selection bias. Information bias occurs when key variables are measured with error or misclassified. Relative to the amount of methodologic research that has gone into the control of confounding, much less has been done for measurement error; more to the point, the methods that do exist, although fewer, are quite well developed but relatively infrequently used. Measurement error and misclassification can cause large biases, and most epidemiologic studies have moderate to extensive measurement error and/or misclassification in the outcome, confounders, exposure, and/or mediators. It has been argued that measurement error is a bigger source of bias than confounding, at least in occupational and environmental epidemiology, but clearly we want to eliminate all sources of bias in order to achieve our goal – causal inference. Measurement error tends to lead to loss of statistical power, false negatives, and spurious heterogeneity between studies, but it can also lead to bias away from the null and false positives. Exposure measurement error has generally been of larger magnitude than outcome misclassification. This is true for cancer and other chronic disease outcomes, but less so when the outcome is, for example, a subtype defined by molecular and OMIC data.
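The attenuation described above – classical, nondifferential error in a continuous exposure biasing the estimated effect toward the null – can be illustrated with a small simulation, together with a regression-calibration-style correction. Everything here is hypothetical (simulated data, an assumed known error variance); it is a sketch of the mechanism, not of any particular study's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True exposure and outcome (simulated, hypothetical)
x = rng.normal(0.0, 1.0, n)            # true exposure, variance 1
beta = 0.5                              # true effect of x on y
y = beta * x + rng.normal(0.0, 1.0, n)

# Observed exposure with classical, nondifferential measurement error
sigma_u = 1.0                           # error SD (assumed known here)
w = x + rng.normal(0.0, sigma_u, n)

# Naive regression of y on the error-prone w is attenuated toward the null
naive = np.polyfit(w, y, 1)[0]

# Attenuation (reliability) factor: var(x) / (var(x) + var(u))
lam = 1.0 / (1.0 + sigma_u**2)          # = 0.5 in this setup

# Regression-calibration-style correction: divide out the reliability
corrected = naive / lam

print(f"true {beta:.2f}  naive {naive:.2f}  corrected {corrected:.2f}")
```

In practice the reliability factor is not known and must be estimated from a validation or reliability substudy, which is where most of the methodologic work lies; the arithmetic above only shows why the naive estimate (about half the true effect here) understates the association.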
I hope that in the next 10 to 15 years there will be increased utilization of measurement error & misclassification methods in routine epidemiologic analysis. I have chosen this area as a major focus because of the great need, and it seems that after 20 years of working on the problem there is still more to be done, particularly with regard to dissemination. Our department is the only epidemiology department I know of that offers a course like EPI515, Measurement error and misclassification, which I’ve taught for the 4th time this fall. I invite others to join me in elevating the profile of information bias and methods to eliminate it within the standard toolbox of epi methods. Without doing so, our inference will not be causal.
- Translational epidemiology: Our department has a long and distinguished history in etiologic research on the causes of diseases, particularly cancer, cardiovascular disease and diabetes, and especially the nutritional and environmental causes of these diseases. Many of us, myself included, are now interested in translating this large body of knowledge into practice, in order to use epidemiologic knowledge more directly to improve the public health. I have argued that as we go increasingly global, repeating similar etiologic epidemiologic research in every continent and country is not a good use of diminishing research resources. Examples of effect modification by genes, race and ethnicity are few and far between – cigarette smoking causes heart disease everywhere in the world, and everywhere in the world obesity causes diabetes. And so on. In contrast, the success of public health interventions is likely to be deeply rooted in the cultural, social and economic context in which they are applied, and international variation in the effectiveness of specific interventions is consequently quite likely. This brings us to the emerging fields which have been called implementation science, translational science, and comparative effectiveness research, in which the goal of the research is to assess the effectiveness, rather than the efficacy, of interventions in real-world settings in large populations. Methodologies that have not typically been included in our epidemiologic toolbox come into play in these assessments, including qualitative research methods, which are critical for the preliminary design of interventions that are most likely to succeed, and for understanding why interventions fail at the implementation phase. Health economists have brought their expertise to bear in computations for cost-effectiveness, a fundamental metric for program evaluation.
Neither qualitative methods nor methods for economic evaluation are taught in our department, nor is their mastery required of our students. Although etiologic research in chronic disease epidemiology has not yet approached its limits, from the knowledge already acquired, many promising prevention strategies are waiting to be evaluated. I suggest that the future of epidemiology lies, in part, in translating the vast body of knowledge we have already produced into implementable interventions whose effectiveness and cost-effectiveness can be evaluated as a matter of course around the world, as they are rolled out.
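The basic cost-effectiveness metric mentioned above can be sketched in a few lines. The standard quantity is the incremental cost-effectiveness ratio (ICER): the additional cost of an intervention relative to a comparator, divided by the additional health gained. All numbers below are hypothetical, chosen only to make the arithmetic concrete.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost per unit of health gained (e.g., per QALY)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical intervention vs. standard of care:
# costs in dollars per person, effects in quality-adjusted life years
ratio = icer(cost_new=1200.0, effect_new=8.1,
             cost_old=800.0, effect_old=8.0)
print(f"${ratio:,.0f} per QALY gained")  # $400 extra cost / 0.1 QALY
```

Whether such a ratio counts as "cost-effective" depends on a willingness-to-pay threshold that varies by country and payer, which is one reason economic evaluation is context-dependent in exactly the way the remarks above describe.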
- OMICs, BIG Data, molecular epidemiology, molecular pathological epidemiology, and the study of disease heterogeneity: Perhaps it is unfair to lump all of these topics together, and each one could easily merit 4 minutes worth of remarks in itself. OMICS include germline genomics, the microbiome, metabolomics, and tumor gene expression genomics. Perhaps there are others I haven’t mentioned here. So far, I see the first 3 – germline genomics, microbiomics and metabolomics – as exposures, and the last one, tumor gene expression genomics, as a high-dimensional potential outcome. After 20 to 30 years of follow-up among large study populations, our studies are already quite big, and running non-standard models can max out our existing computer hardware and software. Adding OMICS data into the mix increases the volume of the data by several more orders of magnitude. The analysis of these data requires methodologies in which most epidemiologists and most biostatisticians have not been well trained. I am concerned that the data scientists developing these methods, in collaboration with basic scientists, are out of touch with the underlying scientific questions of interest in epidemiology, and that substantial information may be lost as a result. For example, it is typical in pre-processing algorithms currently in use for the analysis of tumor gene expression data that genes whose expression is below a certain threshold are eliminated from further consideration. Perhaps it is the very lack of expression that is the actual cause of the growth of the tumor! A convenient tool for reducing the very high dimensionality of these data is the imposition of a priori pathways or groupings onto the data, perhaps allowing some uncertainty in this a priori structure.
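The pre-processing concern above can be made concrete with a minimal sketch of a mean-expression filter. The matrix, threshold, and dimensions are all hypothetical, and real pipelines use more elaborate criteria; the point is only that genes falling below the cutoff never reach the analysis stage.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expression matrix: 1,000 genes (rows) x 50 tumors (columns).
# The last 100 rows mimic genes that are essentially silenced.
expressed = rng.lognormal(mean=2.0, sigma=0.5, size=(900, 50))
silenced = rng.lognormal(mean=-3.0, sigma=0.5, size=(100, 50))
expr = np.vstack([expressed, silenced])

# A typical filter: drop genes whose mean expression falls below a
# threshold before any downstream modeling is ever run.
threshold = 1.0
keep = expr.mean(axis=1) >= threshold
filtered = expr[keep]

print(f"{(~keep).sum()} genes removed before analysis")
# The silenced genes are exactly the ones discarded -- yet lack of
# expression may itself be the biologically interesting signal.
```

A filter like this is invisible in the final results: the discarded genes simply never appear, which is why epidemiologists need to understand the assumptions embedded in pre-processing, not just in the models fit afterward.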
In addition to these approaches, I believe there is so much that is unknown that it is essential to take agnostic approaches as well, and let the data guide us to the patterns and structures that will reveal themselves through rigorous analytic methods based on assumptions that match the aims of the research. The integration of OMICS data into epidemiology is well underway, and there is tremendous knowledge to be gained as a result. Epidemiologists need to understand the assumptions made by the methods currently in use, and rise to equal partnership in the development of new methods and the appropriate application of existing methods. In the OMICS/Big Data world, to the extent that the tail is wagging the dog, we all stand to lose.