HDSI Causal Seminar: Mike Baiocchi, Stanford University – 11/2

HDSI Causal Seminar
Thursday, Nov 2 | 4:00-5:30pm

Mike Baiocchi
Associate Professor, Department of Epidemiology and Population Health
Stanford University

How to tell the difference between machine learning and (bio)statistics

This talk will discuss a couple of studies: (i) a randomized trial to evaluate a sexual assault prevention program in Nairobi, Kenya and (ii) a remote detection operation to find and disrupt labor trafficking in the Amazon rainforest. These are both “data science” projects, but they are wildly different in how they work. What makes them so different? For a long time in (bio)statistics we only had two fundamental ways of reasoning using data: warranted reasoning (e.g., randomized trials) and model reasoning (e.g., linear models). In the 1980s a new, extraordinarily productive way of reasoning about algorithms emerged: “outcome reasoning.” Outcome reasoning has come to dominate areas of data science, but it has been under-discussed and its impact under-appreciated. For example, it is the primary way we reason about “black box” algorithms.

In this talk we will discuss its current use (i.e., as “the common task framework”) and its limitations. We will show why a large class of prediction problems is inappropriate for this new type of reasoning. We will then discuss a way to extend this type of reasoning for use, where appropriate, in assessing algorithms for deployment (i.e., when using a predictive algorithm “in the real world”). We purposefully developed this new framework so that both technical and non-technical people can discuss and identify key features of their prediction problem.