Andrew Beam
Primary Faculty

Assistant Professor of Epidemiology


Other Positions

Assistant Professor of Biomedical Informatics

Biomedical Informatics

Harvard Medical School


Andrew Beam, PhD is an assistant professor in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health, with secondary appointments in the Department of Biomedical Informatics at Harvard Medical School and the Department of Newborn Medicine at Brigham and Women's Hospital. His research develops and applies machine-learning methods to extract meaningful insights from clinical and biological datasets, and he is the recipient of a Pioneer Award from the Robert Wood Johnson Foundation for his work on medical artificial intelligence.

Previously, he was a Senior Fellow at Flagship Pioneering and the founding head of machine learning at VL56, a Flagship-backed venture that seeks to use machine learning to improve our ability to engineer proteins.

He earned his PhD in 2014 from N.C. State University for work on Bayesian neural networks, and he holds degrees in computer science (BS), computer engineering (BS), electrical engineering (BS), and statistics (MS), also from N.C. State. He completed a postdoctoral fellowship in Biomedical Informatics at Harvard Medical School and then served as a junior faculty member.

Dr. Beam's group is principally concerned with improving, streamlining, and automating decision-making in healthcare through the use of quantitative, data-driven methods. He does this through rigorous methodological research coupled with deep partnerships with physicians and other members of the healthcare workforce. As part of this vision, he works to see these ideas translated into decision-making tools that doctors can use to better care for their patients.

For more information, please see his group's website at


We are developing deep learning models to equip neonatologists with modern predictive tools to help them better understand and care for their patients. Infants born prematurely (before 37 weeks of gestation) experience very high levels of morbidity and are among the most expensive patients in all of pediatrics. NICU infants generate a tremendous amount of high-signal, multimodal data as part of their clinical care, but this data is currently under-utilized to inform decision-making.

These modalities are ones where deep learning has had tremendous success to date (e.g. imaging, text), so there is an opportunity to create highly accurate predictive models for proactive decision-making. Specifically, we are interested in developing models in the following areas:

Convolutional models for NICU imaging data including x-rays, ROP screens, and ultrasounds.
Recurrent and transformer models for admission, progress, and discharge notes.
Recurrent and transformer models of real-time monitoring data.
Longitudinal disease trajectories built using large administrative databases.
We are extremely interested in developing new techniques that combine two or more of the above modalities to enable "pan-diagnostic" capabilities for NICU patients. Beyond model development, we are committed to translational research to better understand how these models can be implemented in clinical workflows in a natural, easy-to-use manner.


A large portion of the world's medical knowledge is in unstructured sources such as textbooks, websites, and biomedical journal articles. We are developing a large-scale natural language processing (NLP) and natural language understanding (NLU) system capable of extracting general medical and diagnostic principles from unstructured medical text. For this project, we have created a unique collection of biomedical texts consisting of 4.3 million articles, 50,000 pages of reference material, 15,000 flash cards, dozens of medical textbooks, and 20,000 multiple-choice medical questions.

Using this data, we are creating models that can perform a broad range of medical reasoning tasks, such as providing a differential diagnosis on the basis of a short textual description and answering complex medical questions posed in natural language. This work starts with current state-of-the-art NLP/NLU/QA models based on deep learning, but seeks to extend them with explicit forms of symbolic reasoning and other less traditional computational models that are not currently in vogue.


Deep learning has had tremendous success in medicine. However, despite these successes, deep learning models are in fact brittle, and there are classes of problems that are not solvable by deep learning, even in principle. Moreover, at least in its current framing, nearly all modern machine learning techniques are designed to give predictions, but what doctors often want are decisions. This necessitates moving beyond simple correlation-based models toward ones that have a richer understanding of the world and are capable of understanding the effects of interventions.

In collaboration with our colleagues in the causal inference group at HSPH, we are exploring the interface of machine learning and causal inference methods. This is a new, but very active, area of research, and we are excited about what new questions can be answered as machine learning models are imbued with a causal understanding of the world.



Op-ed: The future of artificial intelligence in medicine

A new artificial intelligence chatbot has the potential to transform the future of medical diagnosis, according to a February 13 op-ed in STAT co-authored by Andrew Beam, an assistant professor of epidemiology at Harvard Chan School.