Methods development to examine diet heterogeneity in diverse populations

These projects focus on methods development and applications tailored to populations identified as being at greatest risk for negative cardiometabolic outcomes. A major methodological challenge in dietary intake analysis is handling the large number of food components queried in a given dietary assessment, coupled with the large sample size of the target population. This is of particular concern in large, diverse populations, where dietary habits can differ by culture, geography, and access. By integrating and developing Bayesian nonparametric model-based clustering techniques, we aim to create methods that handle the high dimensionality of diet data and large population sizes, and that highlight features unique to subpopulations often understudied in population-based nutrition analysis.

With a focus on populations at greatest risk for cardiovascular disease (CVD), such as women, low-income individuals, and racial/ethnic minorities, we apply our methods primarily to widely used population-based studies in nutrition and cardiovascular disease research, to provide new insights into how dietary consumption behaviors may contribute to the development of CVD disparities.

  1. Stephenson B, Willett W. (2023). Racial and Ethnic Heterogeneity in Diet of Low-income Adult Females in the United States: Results from National Health and Nutrition Examination Surveys 2011-2018. American Journal of Clinical Nutrition, 117(3), 625-634.
  2. Stephenson BJK, et al. Robust clustering with subpopulation-specific deviations. JASA 2020; 115(530):521-537.
  3. Stephenson BJK, Willett WC. Racial/Ethnic Heterogeneity in Diet of Low-Income Adult Women in the United States: Results from National Health and Nutrition Examination Surveys 2011-2018. medRxiv: 2022.04.06.22273539v1
  4. De Vito R*, Stephenson B*, et al. Shared and ethnic background site-specific dietary patterns in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). medRxiv preprint 06.30.22277013v1; doi: 10.1101/2022.06.30.22277013.
  5. Stephenson BJK, Herring AH, Olshan AF. (2022). Derivation of maternal dietary patterns accounting for regional heterogeneity. Journal of the Royal Statistical Society. Series C, Applied Statistics, 71(5), 1957–1977.

Bayesian Model-Based Clustering for Nationally Representative Survey Data

Data from nationally representative surveys add complexity to statistical analyses. A single large demographic group can dominate the estimated patterns and obscure detail in smaller groups. Unequal probabilities of selection and response, inherent in the sample design, make standard statistical methods inadequate for smaller demographic subgroups. Bayesian nonparametric methods offer an efficient way to manage high-dimensional exposures and very large sample sizes while preserving model stability. We are developing extensions to commonly used Bayesian supervised and unsupervised methods that account for survey design and sampling variability. This integration of survey methodology and Bayesian statistics will give researchers tools to improve population-based inference from representative survey data.
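To illustrate why unequal selection probabilities matter, here is a minimal sketch of a survey-weighted proportion compared with its unweighted counterpart. It is not the Bayesian method described above; it only shows the basic weighting idea. The toy sample, the subgroup labels, and the function name `weighted_proportion` are hypothetical and chosen purely for illustration.

```python
def weighted_proportion(indicators, weights):
    """Weighted (ratio-type) estimate of a population proportion.

    Each respondent's 0/1 indicator is weighted by a sampling weight,
    taken here to be the inverse of the selection probability.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * x for x, w in zip(indicators, weights)) / total_weight


# Hypothetical toy sample: (subgroup, indicator, sampling weight).
# Subgroup A is oversampled, so each A respondent represents fewer people
# in the population than each B respondent does.
sample = [
    ("A", 1, 100.0), ("A", 1, 100.0), ("A", 1, 100.0),
    ("A", 1, 100.0), ("A", 0, 100.0), ("A", 0, 100.0),
    ("B", 0, 600.0), ("B", 0, 600.0), ("B", 0, 600.0), ("B", 1, 600.0),
]

indicators = [x for _, x, _ in sample]
weights = [w for _, _, w in sample]

# Unweighted estimate treats every respondent as equally representative.
unweighted = sum(indicators) / len(indicators)
# Weighted estimate corrects for the oversampling of subgroup A.
weighted = weighted_proportion(indicators, weights)

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy sample the oversampled subgroup pulls the unweighted estimate toward its own consumption pattern, which is the kind of distortion that design-aware methods are meant to correct.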

We are currently applying these methods to two commonly used nationally representative survey cohorts: (1) the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), a multi-center epidemiologic study of Hispanic/Latino populations residing in the United States, and (2) the National Health and Nutrition Examination Survey (NHANES), a population-based survey designed to assess the health and nutritional status of adults and children in the United States.

  1. Stephenson BJK, Dominici F. Identifying dietary consumption patterns from survey data: A Bayesian nonparametric latent class model. medRxiv 2021.11.18.21266543

Addressing Reproducibility in Model-based Clustering

Dimension reduction methods are frequently applied to high-dimensional data of varying types. However, these techniques tend to yield results specific to the study population. Under a Bayesian framework, clustering models are sensitive to the choice of hyperparameters and the estimation technique employed, which can greatly affect results in latent variable models. My research continues to explore ways to improve the sampling and identifiability of these models for practical application to population-based biostatistical problems.
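As one concrete illustration of hyperparameter sensitivity, consider a Dirichlet process mixture, a common Bayesian nonparametric clustering model. The text above does not name a specific prior, so the Dirichlet process is an assumption made here for illustration only. Under its Chinese restaurant process representation, the prior expected number of occupied clusters among n observations is the sum of alpha / (alpha + i) for i = 0, ..., n - 1, where alpha is the concentration hyperparameter. The sketch below, with the hypothetical function name `expected_num_clusters`, computes this quantity for several values of alpha.

```python
def expected_num_clusters(alpha, n):
    """Prior expected number of occupied clusters among n observations
    under a Chinese restaurant process with concentration alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Observation i + 1 opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


# Illustrative values only: the same sample size under three concentrations.
n = 1000
for alpha in (0.1, 1.0, 10.0):
    print(f"alpha={alpha}: E[K] = {expected_num_clusters(alpha, n):.2f}")
```

Because the expected cluster count grows with alpha, two analyses of the same data that differ only in this hyperparameter can start from quite different prior beliefs about the number of patterns, which is one reason results may not reproduce across studies.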

A multifactorial examination of health disparities in cancer patients

Racial disparities have long been reported in cancer patients, with a disproportionate impact on low-income and racial minority subpopulations. The causes of these disparities are multifactorial, including biological, social, and systemic factors. This collection of projects aims to examine how each of these factors contributes, separately and collectively, to the widening disparities among cancer patients. The research draws on multiple data sources, including microRNA, census tract-level socioeconomic status, and cancer registry databases, to unpack who receives optimal care, where, and why. Through the implementation of model-based clustering and classification techniques, we aim to provide greater insight into the systemic barriers that contribute to poorer outcomes among those at greatest risk. This collaborative research is currently being applied to better address the needs of pancreatic, endometrial, and ovarian cancer patients.

Impact of Changing Restaurant Advertising on Weight Gain and Disparities

Evidence suggests that the food environment is associated with obesity risk, which affects one-third of adults. Because restaurants are an important component of the food environment, this collaborative work leverages national data from the 100 top revenue-generating U.S. restaurant chains from 2012 to 2016 to provide a much-needed understanding of how exposure to changes in local restaurant advertising affects adult weight gain and disparities.