Harvard Chan Bioinformatics Core – 2019 Update

Happy holidays from the Harvard Chan Bioinformatics Core, now located on the 3rd floor of the Landmark Center in Fenway!

We were thrilled to welcome seven new members to our team. Drs. Zhu Zhuo, James Billingsley, Sergey Naumenko, Joon Yoon, Jihe Liu and Preetida Bhetariya were recruited as Research Associates, with diverse experience in bulk and single cell transcriptomics, long read sequencing and variant analysis in numerous biological contexts, including cancer genomics and HIV research. Maria Simoneau also joined us as our new Project Manager, a welcome addition to help us all stay on track. Though we were sad to say goodbye, we were also excited for our team members who moved on to pursue new opportunities. Dr. Brad Chapman joined Ginkgo Bioworks as Principal Data Science Architect and Dr. Lorena Pantano is now a Senior Computational Biologist at eGenesis. Brad and Lorena made many important contributions to research, infrastructure development, and mentoring during their years at the Core. Kayleigh Rutherford, an excellent summer intern whom we later recruited as a bioinformatics analyst, now works as a Computational Biologist at Memorial Sloan Kettering Cancer Center. We wish all of them the best in their new positions. This year also marked the fifth anniversaries at the Harvard Chan School of Drs. Meeta Mistry, Mary Piper and Radhika Khetani, who together developed and implemented the Core’s highly successful bioinformatics training program. They have made outstanding contributions to education and research in the Harvard community. Last but not least, we congratulated Drs. John Hutchinson and James Billingsley on their promotions to Senior Research Scientist and Research Scientist, respectively.

Our team, now comprising 15 individuals, provides bioinformatics consulting and training to investigators at the Harvard Chan School and across the wider Harvard community, with a focus on applications of next-generation sequencing (NGS). Through individual collaborations and ongoing relationships with Harvard Catalyst, the Harvard Chan-NIEHS Center for Environmental Health, the Harvard Center for AIDS Research (CFAR), the Harvard Stem Cell Institute (HSCI), Harvard Medical School (HMS), and the Harvard Fibrosis Network, we supported 58 grant applications and 227 consult requests this year. We explored analysis methods for complex single cell transcriptomics studies, harmonizing data across multiple technologies, replicates, batches and conditions, and evaluated approaches for differential expression analysis. An increase in demand for ATAC-seq and ChIP-seq analysis led to improvements to our workflows, and we began thinking about extensions to accommodate single cell ATAC-seq data. We implemented a whole-genome bisulfite-seq pipeline based on bismark2 for DNA methylation analysis with the Illumina TruSeq Methyl Capture platform to support collaborations within the Harvard Chan School, and worked with clinical researchers to interpret their exome-seq variant studies. We co-authored 15 publications and were acknowledged in nine manuscripts, including papers published in Cell, Nature, Nature Biotechnology, Nature Neuroscience, Cell Reports, Circulation, Cancer Discovery, and PNAS.

We were delighted to congratulate Dr. Rory Kirchner on receiving a Chan Zuckerberg Initiative Essential Open Source Software for Science grant to support the development of the Core’s bcbio-nextgen platform. Bcbio-nextgen is an open source Python toolkit that provides best-practice pipelines for fully automated high throughput sequence analysis. These funds will be used to maintain and develop bcbio’s variant calling and single cell functionality, as well as to improve documentation and further engage the bcbio community of users. Additional infrastructure highlights from this year include updating the code base to Python 3, supporting background inputs for copy number variant (CNV) calling to allow for a pre-computed panel of normals for tumor-only or single sample variant calling, implementing tumor-only variant calling with duplex barcodes, and making it multiple orders of magnitude faster to set up runs with thousands of inputs. For bulk transcriptomics data, we added support for gene fusion calling with Arriba, set up the hg38 reference in STAR, and enabled 2-pass STAR alignment, which performs sequential alignment, genome indexing and re-alignment to improve quantification of novel splice junctions. Notably, we also resolved more than 500 issues on the bcbio GitHub!

Our bioinformatics training program provided 30 workshops focused on basic data analysis skills and NGS analysis, spanning 42 training days and training over 900 researchers. For the third year, members of the training team were involved in the development of the online “Introduction to Omics Research” course offered by Harvard Catalyst. We continued to collaborate with Harvard’s FAS Research Computing (RC) group to make high performance computing more accessible to School researchers, and launched a monthly Bioinformatics Breakfast community event. We also explored new models for training and collaborating, including embedding Core bioinformaticians within research labs to train lab members in private group settings and to oversee their analyses, and mentoring students from our training program to perform bioinformatics consulting within the community.

This year has been filled with new opportunities, change, and compelling science! We thank all of our fellow researchers for an exciting and productive year. All the best for 2020!