Harvard Chan Bioinformatics Core: 2018 Year-End Update

Happy holidays from the Harvard Chan Bioinformatics Core! It has been a busy and productive year at the Core. We were delighted to celebrate the promotions of Drs. Radhika Khetani, Rory Kirchner and Meeta Mistry to Research Scientists, and Kayleigh Rutherford to Bioinformatician II. We also welcomed Yufei Lin for the summer as a bioinformatics intern.

The Core provides bioinformatics consulting services to investigators at the Harvard Chan School and within the wider Harvard community through individual collaborations and ongoing relationships with Harvard Catalyst, the Harvard Chan-NIEHS Center for Environmental Health, the Harvard Center for AIDS Research (CFAR), the Harvard Stem Cell Institute (HSCI), Harvard Medical School Tools and Technology (TnT) and the Harvard Fibrosis Network. This year our consultants supported 59 grant applications and 221 consult requests. We collaborated on 14 publications, including manuscripts in journals such as Cell, Nature Methods, Nature Cell Biology, and PNAS.

We continued to offer our popular short workshops and in-depth courses designed to empower researchers to perform their own NGS analyses. We also continued providing assistance at the Chan School’s FAS RC helpdesk.

Topic | Duration | Times offered | Sponsor / Partner
Introduction to R | 1.5 – 2 days | 5 | HSCI and TnT
Introduction to RNA-seq and high-performance computing | 3 days | 3 | HSCI and TnT
Introduction to ChIP-seq and high-performance computing | 3 days | 2 | HSCI and TnT
Differential Gene Expression Analysis and R | 1.5 – 3 days | 3 | HSCI and TnT
In-depth NGS data analysis course (long course; RNA-seq, ChIP-seq, Variant Calling) | 2 days | 1 | HSCI and TnT
Differential Gene Expression Analysis and R | 3 days | 1 | PQG (HSPH, Biostats)
RNA-seq data analysis with R and high-performance computing | 6 days | 1 | Harvard Catalyst
Differential Gene Expression Analysis and R | 3 days | 1 | Harvard Catalyst
Introduction to RNA-seq and high-performance computing | 2 days | 1 | Genentech, Inc. (San Francisco, CA)
Best practices in RNA-seq experimental design and analysis | 4 hours | 1 | Open Bioinformatics Foundation conference
Introduction to Omics (online course) | 20 weeks | 1 | Harvard Catalyst

Other exciting news includes the launch of a DataCamp course on RNA-seq differential expression analysis developed by Dr. Mary Piper, and a Harvard-wide Research (Computing) Trainers group co-founded by Dr. Radhika Khetani.

We continued development of the bcbio platform with Harvard and industry partners. bcbio is a Python toolkit that provides best-practice pipelines for fully automated high-throughput sequence analysis. bcbio highlights from this year include:

  • significant improvements to our single-cell analysis pipeline, which enabled us to handle millions of single cells
  • expansion of bcbio to handle digital gene expression (DGE) data
  • extensions to our small RNA-seq pipeline to support UMI filtering of PCR artifacts, and integration of miRge for miRNA quantification after removal of cross-mapping events
  • multiple methods for iterative calling of large germline populations, including the Broad Institute’s GATK4 and Illumina’s Strelka2
  • validation of Google’s DeepVariant neural network caller for in-depth analysis of germline samples
  • implementation of methods to deconvolute complicated tumor samples, including analysis of heterogeneity, tumor purity, ploidy, loss of heterozygosity, and allele-based copy number changes

Our researchers also developed three new R packages that provide quality control, plotting tools and analysis of bulk RNA-seq (bcbioRNASeq, published in F1000), single-cell RNA-seq (bcbioSingleCell) and small RNA-seq (bcbioSmallRna) data, as well as tools to unify miRNA outputs (mirGFF3) and describe isomiRs detected from small RNA-seq pipelines (mirtop). In these efforts, we continued to work with the community to develop best practices, benchmarks and interoperable workflow tools through the GA4GH working groups, Open Bioinformatics Codefest events, NIH Data Commons Hackathons, and the miRNA transcriptome open project co-led by Dr. Lorena Pantano.

Finally, to boost reproducibility and efficiency among our team members and to share with the bioinformatics community, our team began developing the Core’s Knowledgebase, a central resource unifying our guides, reports, scripts and tools. We also began sharing data with our collaborators through the HSCI Stem Cell Commons platform, an ongoing collaboration with Drs. Peter Park and Nils Gehlenborg at HMS.

We thank all our fellow researchers for making this an exciting and productive year. All the best for 2019!