I am broadly interested in improving the reproducibility and robustness of conclusions drawn from high-dimensional genomic data, with a focus on translational relevance. My methodological research toward this goal can be categorized into several main strategies:
- Utilization of formalin-fixed, paraffin-embedded tissue (FFPET) in genomic assays. Upwards of 20 million tissue specimens per year are archived as FFPET. It is the preservation method used for virtually all routine clinical pathology tests, and it is the only material available for patients with long-term clinical follow-up and prospective history; however, extensive RNA degradation makes profiling these tissues extremely challenging. I have done extensive work assessing new technologies for microRNA and mRNA profiling of FFPET and developing quality-control methods for the resulting data. I was on the organizing committee for Emerging Technologies for Translational Bioinformatics: A Symposium on Gene Expression Profiling for Archival Tissues.
- The use of high-throughput transcriptome assays to develop predictive models of drug response and clinical outcome for cancer patients is now more than a decade old, and thousands of such datasets are publicly available through databases such as the Gene Expression Omnibus (GEO). For any prediction problem of importance, particularly in cancer research, multiple groups have undertaken independent studies with similar goals. Individual studies suffer from the Curse of Dimensionality (the p >> n problem) and can also suffer from systematic errors known as batch effects. I am interested in methods for combining multiple independent high-dimensional datasets with the specific goal of prediction in mind. The focus on prediction makes possible approaches that avoid the batch effects that can arise from directly pooling data from independent laboratories.
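One simple instance of this prediction-focused strategy is to standardize each study against its own internal statistics, so that a model trained in one study can be applied to another without ever pooling raw data across laboratories. The sketch below is illustrative only: the data are simulated, and the per-study z-scoring is one of several possible within-study transformations, not a description of any specific published method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate two "studies" measuring the same 50 genes: study B has a
# per-gene additive shift and scale change, mimicking a batch effect.
n, p = 100, 50
beta = np.zeros(p)
beta[:5] = 2.0  # five informative genes


def simulate(shift, scale):
    X = rng.normal(size=(n, p)) * scale + shift
    logits = (X - shift) / scale @ beta  # biology lives on the common scale
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y


X_a, y_a = simulate(shift=0.0, scale=1.0)
X_b, y_b = simulate(shift=3.0, scale=2.0)


def zscore(X):
    # Each study is standardized against its own means and SDs,
    # so no cross-study normalization (and no data merge) is needed.
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Train in study A, validate in fully independent study B.
model = LogisticRegression(max_iter=1000).fit(zscore(X_a), y_a)
acc = model.score(zscore(X_b), y_b)
```

Because each study is transformed using only its own data, the batch shift between the two simulated laboratories never enters the model, yet the trained classifier transfers across studies.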
- Improving the application of penalized regression methods to genomic data. The Ridge penalty, the LASSO penalty, and, more recently, the Elastic Net improve the performance of traditional regression for high-dimensional and collinear predictors. I have used simulation studies to address unresolved questions around optimal tuning of the Elastic Net and to optimize the steps of developing a penalized regression survival model, including the univariate pre-filter and repeated optimization of the cross-validated partial log-likelihood.