Corneliu Bodea
Brigham & Women’s Hospital, Harvard Medical School, Merck

Predicting functional noncoding variation: an unsupervised phenotype-informed approach

Corneliu Bodea, Adele Mitchell, Heiko Runz, Shamil Sunyaev

The functional annotation of human genetic variants plays a critical role in distinguishing truly functional variants at a locus of interest from the vast background of genetic variation. Tools such as PolyPhen or Spliceman can evaluate the pathogenicity of newly identified variants that directly disrupt the coding sequence or splicing. However, our understanding of noncoding regions remains incomplete, which makes a comparable evaluation of the functional impact of noncoding variants more difficult. Large-scale efforts such as the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project have generated data from a wide range of assays across numerous cell types. Optimally integrating these different annotations is challenging: each variant is characterized by a multitude of often highly correlated features, and little is known about which of these features are most predictive of a particular variant's functional consequences. When a specific phenotype is being studied, an additional challenge is how to incorporate prior biological knowledge (such as relevant cell types) into the analysis. Here we present a novel computational framework, called Phenotype-Informed Noncoding Element Scoring (PINES), to evaluate the functional impact of individual noncoding variants by integrating diverse annotations from high-throughput experimental datasets together with prior phenotype-specific information. We show that PINES identifies functional noncoding variation better than methods such as CADD, GWAVA, or Eigen, which do not incorporate phenotype-specific knowledge. In particular, PINES performs well in fine-mapping scenarios, allowing users to dissect pathogenic loci while avoiding the resource-intensive setup of traditional fine-mapping studies. Together, these features make PINES a powerful tool for phenotype-centered prioritization of noncoding variants.