A huge congratulations to Assistant Professor of Biostatistics Rajarshi Mukherjee, who received the NSF CAREER Award, one of the most prominent awards in support of early-career faculty who have contributed significantly to the academic community through cutting-edge research, excellent teaching, and the integration of education and research within the context of the mission of their organizations. See the announcement here and Rajarshi’s answers to a couple of questions about the award below.
Can you tell us about your proposed research?
Modern observational studies require the development of theoretically rigorous methods to conduct causal inference with complex data structures. Although this is a highly active field of research and has witnessed seminal contributions from scholars across various disciplines, some fundamental questions regarding the optimal adjustment for observed high-dimensional confounders remain open. Specifically, the last decade has witnessed the immense popularity of model-agnostic causal inference through a marriage of ideas from semiparametric theory and machine learning methods. However, a comprehensive understanding of how to fit machine learning methods to optimize downstream causal inference in terms of statistical efficiency is still lacking. Whereas most research operates under a set of assumptions to achieve the above goal, the proposed research in this project aims to take a deeper dive into these very assumptions and provide a complete understanding of the necessary and sufficient conditions for statistically efficient causal inference, by tuning machine learning algorithms to optimize downstream inference instead of simply prediction performance. The specific aims of the project are directly motivated by specific methods used by the research community and are also designed to shed light on the optimal choice of methods under the assumptions being made.
What is the significance of the research for the field of Biostatistics?
The field of biostatistics regularly deals with the subtleties of conducting valid causal inference with observational data. In this regard, research questions arising from electronic health record data, large-scale genetic studies, and studies designed to explore the effect of environmental pollution on human health are among the many examples that resonate with the unique challenges that researchers face while conducting valid causal inference by adjusting for a potentially large number of confounders through the promise of recent machine learning toolboxes. Our project addresses some fundamental gaps in this literature and aims to paint a complete picture of optimal statistical causal inference in both nonparametric and high-dimensional settings. This in turn will allow researchers to choose efficient methods based on the specific data structures at hand and the assumptions they are willing to make for the specific problems they face.
How will the award support your academic pursuits?
In my academic journey, I have tried to learn about the various complexities of observational studies through the lens of high-dimensional inference, non- and semiparametric methods, and statistical learning theory. This award provides me with the unique opportunity to study the interplay between these areas under one overarching theme.