Neyman Visiting Assistant Professor
Dept of Statistics, UC Berkeley
Thursday, March 8
EPI Library, Kresge 907
Overlap in Observational Studies with High-Dimensional Covariates
Causal inference in observational settings typically rests on a pair of identifying assumptions: (1) unconfoundedness and (2) covariate overlap, also known as positivity or common support. Investigators often argue that unconfoundedness is more plausible when many covariates are included in the analysis. Less often discussed is the fact that covariate overlap becomes more difficult to satisfy in this setting. In this talk, I will present some recent results on the implications of overlap in high-dimensional observational studies, arguing that this assumption is stronger than investigators likely realize. In particular, the results show that strict overlap bounds the discriminating information (e.g., the Kullback-Leibler divergence) between the covariate distributions in the treated and control populations. These information bounds imply explicit bounds on the average imbalance in covariate means under strict overlap and a range of assumptions on the covariate distributions. Importantly, these bounds grow tighter as the dimension grows large, and in some cases converge to zero. The results suggest that adjusting for high-dimensional covariates does not necessarily make causal identification more plausible. To close, I will discuss two current lines of methodological research motivated by this work: first, a set of procedures for statistically assessing the quality of population overlap in high-dimensional settings; and second, a set of procedures for reducing high-dimensional covariates in a way that improves overlap while maintaining unconfoundedness.
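The phenomenon the abstract describes can be seen in a toy simulation (not from the talk itself, and with all parameters chosen here for illustration): if treated and control covariates are Gaussian with a small per-coordinate mean shift, the true propensity scores concentrate near 0 and 1 as the dimension grows, so strict overlap fails even though each individual covariate is only weakly imbalanced.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.2  # small per-coordinate mean shift between treated and controls

def propensity(x, delta):
    # True propensity score when X | T=0 ~ N(0, I_d), X | T=1 ~ N(delta*1, I_d),
    # and P(T=1) = 1/2: the log-odds is delta * sum(x) - d * delta^2 / 2.
    d = x.shape[-1]
    logit = delta * x.sum(axis=-1) - d * delta**2 / 2
    return 1.0 / (1.0 + np.exp(-logit))

for d in [1, 10, 100, 1000]:
    n = 5000
    x0 = rng.normal(0.0, 1.0, size=(n, d))    # controls
    x1 = rng.normal(delta, 1.0, size=(n, d))  # treated
    e = propensity(np.vstack([x0, x1]), delta)
    # Share of units in a conventional strict-overlap region (0.1, 0.9):
    frac = np.mean((e > 0.1) & (e < 0.9))
    print(f"d={d:5d}  share with 0.1 < e(x) < 0.9: {frac:.2f}")
```

Because the Kullback-Leibler divergence between the two covariate distributions grows linearly in d here, the share of units with non-extreme propensity scores collapses toward zero as d increases, which is the sense in which strict overlap becomes a much stronger assumption in high dimensions.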