Try explaining to your extended family that you are considered an expert in causal inference. That’s why, when people ask, I just say that my job is to learn what works for the prevention and treatment of diseases.
- “Oh, so you are a medical doctor?”
- Yes, but more to the point, I am an epidemiologist. The idea is to save lives in batches rather than one at a time.
- “Epidemiologist? I see, so you study Covid?”
- Yes, I do, among other things. Epidemiologists study all sorts of health issues: cancer, cardiovascular disease, birth defects, suicide… In fact, most epidemiologists don’t study epidemics of infectious diseases.
- “Most epidemiologists don’t study epidemics?”
Here we go again. This paper of ours may help explain some components of my job.
Even when talking with professional researchers, we causal inferencers are up against some serious misunderstandings. On a few occasions, I have been invited to deliver lectures on “casual interference.” Less amusingly, a crucial problem is that many researchers are reluctant to acknowledge that their job involves asking causal questions, as if they were ashamed of using the c-word:
- Hernán MA. The C-word: Scientific euphemisms do not improve causal inference from observational data (with discussion). American Journal of Public Health 2018; 108(5):616-619.
- Hernán MA. The C-word: The more we discuss it, the less dirty it sounds. American Journal of Public Health 2018; 108(5):625-626.
Making valid causal inferences is challenging because it requires high-quality data and adequate statistical methods. My colleague Jamie Robins and I wrote a book that describes these methods and the conditions under which they yield valid estimates. Because those conditions cannot be verified from the data alone, investigators need subject-matter expertise to evaluate whether they are met. That is, when trying to make causal inferences from observational data, it isn’t enough to be a brilliant data analyst; you also need to be a subject-matter expert. We explain this here:
- Hernán MA, Hsu J, Healy B. Data science is science’s second chance to get causal inference right: A classification of data science tasks. Chance 2019; 32(1):42-49 (pdf here)
- Hernán MA. Spherical cows in a vacuum: Data analysis competitions for causal inference. Statistical Science 2019; 34(1):69-71 (pdf here)
Subject-matter knowledge is needed not only to answer causal questions, but also to ask them. A current debate concerns which causal questions can and cannot be asked. Some of us argue that certain causal questions often taken for granted (like “what is the effect of obesity on mortality?”) aren’t good scientific questions. We once explained this problem by telling the story of a king who wanted the best for his subjects. For a more comprehensive discussion of this topic (and many references), take a look at
- Hernán MA. Does water kill? A call for less casual causal inferences. Annals of Epidemiology 2016; 26: 674-680. PMCID: PMC5207342
If you read the above papers, you will notice a recurrent idea: causal inference from observational data can be viewed as an attempt to emulate a (hypothetical) randomized trial: the target trial. (For more on the history of this idea, see this). We wrote some non-technical papers that review the concept of the target trial and explain how it can be used to avoid some common biases in observational analyses:
- Hernán MA, Wang W, Leaf DE. Target Trial Emulation: a framework for causal inference from observational data. JAMA 2022.
- Hernán MA. Methods of Public Health Research — Strengthening causal inference from observational data. New England Journal of Medicine 2021; 385:1345-1348.
- Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology 2016; 183(8):758-764.
- Hernán MA. With great data comes great responsibility. Publishing comparative effectiveness research in Epidemiology [editorial]. Epidemiology 2011; 22(3):290-291.
Immortal time bias is one of the biases that can be eliminated by explicitly emulating a target trial. The first paper listed below shows how to prevent immortal time bias with a very simple example; the second explains the issues in more detail:
- Hernán MA. How to estimate the effect of treatment duration on survival using observational data. BMJ 2018; 360: k182.
- Hernán MA, Sauer BC, Hernández-Díaz S, Platt R, Shrier I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. Journal of Clinical Epidemiology 2016; 79: 70-75. PMCID: PMC5124536
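The logic of immortal time bias can be seen in a toy simulation (my own illustration, not taken from the papers above): treatment has no effect at all, but an “ever-treated” classification guarantees that treated people survived at least until treatment started, so they appear to live longer.

```python
import random

random.seed(0)

# Toy simulation (an illustration, not from the papers above): survival and
# treatment-initiation times are independent exponentials with mean 5 years,
# so by construction treatment has NO effect on survival.
n = 100_000
treated_deaths, untreated_deaths = [], []
for _ in range(n):
    death = random.expovariate(1 / 5)     # time from diagnosis to death
    tx_start = random.expovariate(1 / 5)  # scheduled treatment-initiation time

    # Naive "ever-treated" classification: a person counts as treated only
    # if they lived long enough to start treatment. The time between
    # diagnosis and treatment initiation is "immortal" for the treated group.
    if tx_start < death:
        treated_deaths.append(death)
    else:
        untreated_deaths.append(death)

mean_treated = sum(treated_deaths) / len(treated_deaths)
mean_untreated = sum(untreated_deaths) / len(untreated_deaths)
# The treated appear to survive far longer (roughly 7.5 vs 2.5 years here),
# purely because of the misclassified person-time.
print(f"treated mean survival:   {mean_treated:.1f} years")
print(f"untreated mean survival: {mean_untreated:.1f} years")
```

Aligning time zero with treatment initiation, as an explicit target trial emulation forces you to do, is what removes this spurious advantage.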
Another bias that is eliminated by emulating a target trial arises when data analysts set time zero of follow-up months or years after a treatment strategy was initiated, that is, when they use “prevalent users”. This bias, which we have documented in several settings (like here), is the most important bias you may have never heard of. For example, it played a key role in the debacle surrounding postmenopausal hormone therapy and heart disease. If you have ever heard me give a talk, chances are that I presented these findings:
- Hernán MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, Manson JE, Robins JM. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease (with discussion). Epidemiology 2008; 19(6):766-779. PMCID: PMC3731075
This article shows how incorrect specification of time zero can lead to misleading conclusions. The story behind the paper is also unusual in the scientific literature. Because its language was the result of complex negotiations among the co-authors over several years, nobody was completely happy with the final version (one co-author even dropped his name from the paper after the journal accepted it). Therefore, some of the co-authors took the unprecedented step of writing separate commentaries on their own article, some of them quite critical! Jamie Robins and I viewed this as an opportunity to explain the contributions of the paper in a more uncompromising way:
- Hernán MA, Robins JM. Observational studies analyzed like randomized experiments: Best of both worlds. Epidemiology 2008; 19(6):789-792.
A couple of years later, I revisited a related issue here:
- Hernán MA. The hazards of hazard ratios. Epidemiology 2010; 21:13-15.
(Along similar lines, my colleague Mats Stensrud and I later argued that testing for proportional hazards is pointless and should be abandoned:
- Stensrud MJ, Hernán MA. Why test for proportional hazards? JAMA 2020.
But I digress. Back to time zero.)
Other colleagues remained unconvinced about the virtues of setting time zero of follow-up at the start of the treatment strategy, which is what one would do when explicitly emulating a target trial. In my response to them, I explained why setting time zero at the start should be the default approach, and argued that deviations from this approach need to be carefully justified on a case-by-case basis:
- Hernán MA. Epidemiology to guide decision-making: moving away from practice-free research. American Journal of Epidemiology 2015; 182(10):834-839.
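To see why a late time zero can hide harm, here is another toy simulation (my own assumptions, not taken from the papers above): suppose treatment doubles the hazard of death during the first year after initiation and has no effect afterwards. A design that sets time zero at initiation detects the harm; a design that sets time zero two years later, among prevalent users who already survived the harmful period, does not.

```python
import random

random.seed(1)

BASE_RATE = 0.2  # assumed baseline hazard, in deaths per person-year

def survival_time(treated):
    # Assumed toy model: treatment doubles the hazard during the first year
    # after initiation and has no effect afterwards (early harm only).
    rate_year1 = 2 * BASE_RATE if treated else BASE_RATE
    t = random.expovariate(rate_year1)
    return t if t < 1 else 1 + random.expovariate(BASE_RATE)

n = 100_000
treated = [survival_time(True) for _ in range(n)]
untreated = [survival_time(False) for _ in range(n)]

def mortality(times, horizon, time_zero=0.0):
    # Risk of death within `horizon` years among those alive at `time_zero`
    at_risk = [t for t in times if t >= time_zero]
    return sum(t < time_zero + horizon for t in at_risk) / len(at_risk)

# New-user design: time zero at treatment initiation -> the harm is visible
new_t, new_u = mortality(treated, 5), mortality(untreated, 5)
# Prevalent-user design: time zero 2 years after initiation -> the early
# deaths among the treated are never counted, and the harm disappears
prev_t, prev_u = mortality(treated, 3, 2.0), mortality(untreated, 3, 2.0)

print(f"time zero at initiation, 5-year risk: treated {new_t:.2f} vs untreated {new_u:.2f}")
print(f"time zero 2 years later, 3-year risk: treated {prev_t:.2f} vs untreated {prev_u:.2f}")
```

Under these assumptions, the prevalent-user comparison is precisely the one that misses the early excess risk, which is why time zero at the start of the strategy should be the default.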
The target trial concept is also central to the Cochrane Collaboration’s ROBINS-I tool for assessing risk of bias in observational studies. Speaking of risk of bias, if you are interested in our research to classify biases according to their structure, click here.