Personal Website

I am interested in developing statistical methods for multivariate and/or high-dimensional biomedical data from a wide range of applications:

1. Multivariate statistical methods for microbiome data. One primary focus of my research program is the development of statistical methods for microbiome studies. One research goal is to develop spatial point pattern analysis methods for understanding the spatial organization of microbes using spectral imaging data. Another is to develop comprehensive multivariate methods for microbiome sequencing count data. These methods differ from most commonly used techniques in that they analyze the spatial or count distributions of all microbial types as a joint endpoint distribution, rather than analyzing the univariate distribution of each type separately (taxon-by-taxon analysis). The overarching goal is to provide more robust and valid quantitative analysis tools to scientists in microbiology and bioinformatics.

2. Semi-competing risks framework for multivariate survival data. Semi-competing risks refers to the setting where interest lies in a nonterminal event (e.g., hospital readmission), the occurrence of which is subject to a terminal event (e.g., death). Although less well known than competing risks, the semi-competing risks problem arises in a broad range of public health applications. I have developed a novel hierarchical modeling framework for the analysis of clustered semi-competing risks survival data. The framework permits parametric or nonparametric specifications for a range of model components, including baseline hazard functions and distributions for key random effects, giving analysts substantial flexibility as they consider their own analyses. I am currently extending the method to various types of study designs to further expand the scope of scientific inquiry in clinical and public health science.

3. Survival analysis with high-dimensional genomic covariates. Developing a predictive model that relates a time-to-event outcome to high-dimensional genomic data is challenging because of i) the high dimensionality of the genomic variables, the number of which often far exceeds the number of subjects, ii) the structured grouping of genes, and iii) censored outcomes. I am developing statistical methods for correlated and structured high-dimensional genomic data with survival outcomes in the context of penalized regression models.
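
To make the semi-competing risks setting in item 2 concrete, the sketch below shows one common way the observed data are formed from latent event times. It is only an illustration of the data structure, not the hierarchical modeling framework itself. The function name `observe` and the variables `t1` (time to the nonterminal event), `t2` (time to the terminal event), and `c` (censoring time) are hypothetical names chosen for this example.

```python
def observe(t1, t2, c):
    """Map latent times to observed semi-competing risks data.

    t1: latent time to the nonterminal event (e.g., readmission)
    t2: latent time to the terminal event (e.g., death)
    c:  censoring time (e.g., end of follow-up)

    Returns (y1, delta1, y2, delta2), where each delta is 1 if the
    corresponding event was observed and 0 otherwise.
    """
    # The terminal event is censored only by c.
    y2 = min(t2, c)
    delta2 = int(t2 <= c)
    # The nonterminal event is cut off by whichever comes first:
    # the terminal event or censoring.
    y1 = min(t1, y2)
    delta1 = int(t1 <= y2)
    return y1, delta1, y2, delta2


# Hypothetical subjects (times in arbitrary units):
readmitted_then_died = observe(2.0, 5.0, 10.0)
died_without_readmission = observe(7.0, 5.0, 10.0)
readmitted_then_censored = observe(2.0, 12.0, 10.0)
censored_before_any_event = observe(11.0, 12.0, 10.0)
```

The asymmetry is the defining feature: the terminal event can prevent the nonterminal event from being observed, but not the other way around, so a subject can contribute both event times when the nonterminal event occurs first.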