Wancen Mu

PhD Candidate, University of North Carolina – Chapel Hill

 

Machine learning based methods for predicting guide RNA effects in CRISPR epigenomic experiments

Cis-regulatory elements (CREs) play an important role in governing biological processes. CRISPR-Cas9-related systems have proven to be powerful tools for genetic and epigenetic perturbation, but their success relies heavily on the selection of efficient guide RNAs (gRNAs). We therefore aimed to improve the prediction of functionally relevant gRNAs in epigenome editing experiments. We leveraged XGBoost and convolutional neural networks (CNNs) to predict gRNA impact in three tasks: (1) cell fitness in a whole-genome CRISPR-dCas9-based epigenomic regulatory element screening (wgCERES) involving >1 million gRNAs in K562 cells; (2) expression of genes across multiple cell lines; and (3) wild-type abundance, which indicates whether there is sufficient power to detect a gRNA effect. Ours is the first attempt to predict gRNA effects in CRISPR epigenetic experiments. We used both gRNA sequence and functional annotations as input features. Using separate models for task (1), we achieved a highest AUC of 0.82 for promoter regions and 0.74 for enhancer regions; for task (3), we achieved a Spearman correlation of 0.774, and an AUC of 0.835 when the counts were binarised. Ranking feature importance in our models with SHapley Additive exPlanations (SHAP) values, we found that functional annotations (H3K27ac, H3K4me3 and gRNA-DNA hybridization free energy) and sequence information were critical for promoter and enhancer regions, respectively.
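
As a rough illustration of what combining gRNA sequence with functional annotations as input features could look like, the following minimal sketch one-hot encodes a gRNA and appends annotation values. It is not the authors' pipeline: the function names, the 20-nt example sequence, the annotation keys and all numeric values are hypothetical and chosen only for illustration.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(grna):
    """Return a flat one-hot vector (4 values per base) for a gRNA sequence."""
    encoded = []
    for base in grna.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return encoded


def build_features(grna, annotations, annotation_names):
    """Concatenate the sequence encoding with annotation values in a fixed order."""
    return one_hot_encode(grna) + [float(annotations[name]) for name in annotation_names]


# Hypothetical annotation names and values, for illustration only.
ANNOTATION_NAMES = ["H3K27ac", "H3K4me3", "hybridization_free_energy"]

example = build_features(
    "GACGTTACCGGATCAAGCTA",
    {"H3K27ac": 3.2, "H3K4me3": 0.4, "hybridization_free_energy": -21.5},
    ANNOTATION_NAMES,
)
print(len(example))  # 20 bases * 4 + 3 annotations = 83
```

A flat vector of this kind is the sort of input a tree-based model such as XGBoost accepts, whereas a CNN would more typically take the sequence encoding as a 20 x 4 matrix with the annotations supplied separately.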