Quantitative Issues in Cancer Research Working Seminar
October 3 @ 1:00 pm - 1:50 pm
Doctoral Student, Department of Biostatistics, Harvard University
“Bayesian approaches to multi-study matrix decompositions of heterogeneous genomics datasets”
ABSTRACT: Analyzing multiple studies allows leveraging data from a range of sources and populations, but until recently, few methods were available for the joint unsupervised analysis of multiple high-dimensional studies. Recent methods can identify signals shared across all datasets, as well as signals specific to particular groups. However, especially as the number of datasets grows, we expect signals with more complex sharing patterns. We propose two flexible Bayesian multi-study latent feature models to address this problem. The first is a combinatorial multi-study factor analysis method, which identifies latent factors that can be shared by any combination of studies. We model the subsets of studies that share latent factors with an Indian Buffet Process, and demonstrate our method’s utility not only in dimension reduction but also in covariance estimation. The second is an extension of this approach to multi-study non-negative matrix factorization, specialized to the characterization of mutational signatures from tumor genomes. We develop both fully unsupervised and semi-supervised approaches, which allow novel signatures to be discovered and known signatures to be recovered. Finally, we incorporate tumor-level covariates into the model to estimate their associations with signatures, using a non-local spike-and-slab prior to enforce biologically plausible sparsity. We demonstrate the two approaches by integrating multiple datasets from breast and colorectal cancer, respectively.
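As a rough illustration of how an Indian Buffet Process can encode which studies share which latent factors, the sketch below draws a binary study-by-factor matrix from the standard IBP generative scheme. It is not the speaker's model or code: the function name `sample_ibp`, the choice to treat studies as the "customers" and factors as the "dishes", and the values of `alpha` and the seed are all assumptions made for illustration.

```python
import numpy as np


def sample_ibp(num_studies, alpha, rng):
    """Draw a binary (study x factor) matrix from the standard IBP.

    Hypothetical helper: rows are studies, columns are latent factors,
    and Z[s, k] == 1 means study s uses factor k.
    """
    counts = []  # counts[k] = number of earlier studies using factor k
    rows = []
    for i in range(1, num_studies + 1):
        # Study i reuses an existing factor k with probability counts[k] / i.
        row = [int(rng.random() < m / i) for m in counts]
        # Study i also introduces Poisson(alpha / i) brand-new factors.
        num_new = int(rng.poisson(alpha / i))
        row.extend([1] * num_new)
        counts = [m + z for m, z in zip(counts + [0] * num_new, row)]
        rows.append(row)

    # Pad earlier rows with zeros for factors introduced by later studies.
    Z = np.zeros((num_studies, len(counts)), dtype=int)
    for s, row in enumerate(rows):
        Z[s, :len(row)] = row
    return Z


# Example draw for five studies (alpha and the seed are arbitrary choices).
Z = sample_ibp(num_studies=5, alpha=2.0, rng=np.random.default_rng(0))
print(Z)
```

Each column's pattern of ones is the subset of studies sharing that factor: a column of all ones corresponds to a factor common to every study, a column with a single one to a study-specific factor, and anything in between to a partially shared factor. The number of columns is not fixed in advance, which is what lets the number of factors be learned rather than specified.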