DBMI Open Insights – 3/8

Department of Biomedical Informatics – Open Insights Seminar Series
March 8 | 12:30pm | Lahey Room (5th Floor) | Countway Library

Shilpa Kobren
Ph.D. Candidate in Computer Science
Princeton University
Data-Driven Approaches for Uncovering Functional Variation in Protein Interactions

Proteins carry out a dazzling multitude of functions by interacting with DNA, other proteins, and various other molecules within our cells. Together, these interactions comprise complex networks that differ naturally across cells within an organism, across individuals in a population, and across species. Although such variation is critical for normal organismal functioning, mutations affecting protein interactions are also known to underlie a wide range of human diseases. In my talk, I will present novel computational approaches that explore the extent to which specific protein interactions vary across species, across healthy individuals, and across individuals with cancer.
To start, I will focus on interaction variation across species. We developed and applied a comparative genomics framework to systematically quantify changes in protein-DNA interactions across closely related species. This work demonstrates that, contrary to conventional wisdom, functional gene regulatory divergence can stem from changes in non-duplicated DNA-binding proteins; such changes were previously believed to be largely detrimental.
Next, I will turn my attention to interaction variation across individuals. First, to comprehensively identify interaction sites in human proteins, we combine large-scale sequence, domain, and structure information to provide a biologically relevant assessment of per-position binding potential across protein sequences. This enables us to pinpoint sites involved in binding DNA, RNA, peptides, ions, metabolites, or other small molecules in 60% of human genes, representing the largest resource of this type to date. We show that whereas inferred interaction sites are significantly depleted of natural variants across ~60,000 healthy individuals, these same sites are significantly enriched for cancer mutations across ~11,000 tumor samples.
In the last part of my talk, I will show how we can exploit these opposing trends to uncover genes whose interaction interfaces are significantly altered in tumors. To this end, we develop a novel analytical framework that integrates our domain binding potentials with additional sources of data. Our method recapitulates known cancer driver genes with high precision and also discovers perturbed molecular mechanisms in relatively rarely mutated genes, thereby enabling valuable insights that may help guide personalized cancer treatments.