Professor of Biostatistics
Department of Biostatistics
Xihong Lin’s Group Website | Program in Quantitative Genomics’ Website
Xihong Lin is Professor and Former Chair of the Department of Biostatistics and Coordinating Director of the Program in Quantitative Genomics at the Harvard T. H. Chan School of Public Health, Professor in the Department of Statistics at the Faculty of Arts and Sciences of Harvard University, and Associate Member of the Broad Institute of MIT and Harvard.
Dr. Lin’s research interests lie in the development and application of scalable statistical and machine learning methods for the analysis of massive data from the genome, exposome and phenome, including big and complex genetic, genomic, epidemiological and health data. Examples of her current research include analytic methods and applications for large-scale Whole Genome Sequencing studies, biobanks and Electronic Health Records; techniques and tools for whole genome variant functional annotation; analysis of the interplay of genes and environment; multiple phenotype analysis; and polygenic risk prediction and heritability estimation. Additional examples include integrative analysis of different types of data, Mendelian Randomization, causal mediation analysis and causal inference, federated and transfer learning, single cell genomics, analysis of epidemiological and complex observational studies, and analysis of COVID-19 epidemic data. Dr. Lin’s theoretical and computational statistical research includes statistical methods for testing a large number of complex hypotheses, causal inference, statistical and ML methods for large matrices, prediction models using high-dimensional data, federated and transfer learning, cloud-based statistical computing, mixed models, nonparametric and semiparametric regression, and statistical methods for epidemiological studies.
Dr. Lin’s statistical methodological research has been supported by the MERIT Award (R37) (2007-2015), the Outstanding Investigator Award (OIA) (R35) (2015-2029) from the National Cancer Institute (NCI), and an R01 grant from the National Heart, Lung, and Blood Institute. She is a multiple PI of a Predictive Modeling Center of the Impact of Genomic Variation on Function (IGVF) Program of the National Human Genome Research Institute (NHGRI), and a multiple PI of the U19 grant on Integrative Analysis of Lung Cancer Etiology and Risk from NCI. She is also the contact PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. She is the former contact PI of the Program Project (P01) on Statistical Informatics in Cancer Research from NCI, and the former contact PI of the Harvard Analysis Center (U19) of the Genome Sequencing Program of NHGRI.
Dr. Lin was active in the early phase of the COVID-19 pandemic. She is a corresponding author of the JAMA and Nature papers analyzing the Wuhan COVID-19 data on transmission, public health interventions and epidemiological characteristics. She is the senior author of the 2021 Journal of the American Statistical Association Discussion paper on modeling COVID-19 transmission dynamics in the US. In Spring 2020, Dr. Lin served on the State of Massachusetts COVID-19 Task Force and testified before the UK Parliament’s Science and Technology Committee on COVID-19 responses.
Dr. Lin is an elected member of the National Academy of Medicine. She received the 2002 Mortimer Spiegelman Award from the American Public Health Association, the 2006 Committee of Presidents of Statistical Societies (COPSS) Presidents’ Award, the 2008 Janet L. Norwood Award for Outstanding Achievement of a Woman in Statistics, the 2017 COPSS F. N. David Award, the 2022 National Institute of Statistical Sciences Jerome Sacks Award for Outstanding Cross-Disciplinary Research, and the 2022 Marvin Zelen Leadership Award. She is an elected fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics, and the International Statistical Institute.
Dr. Lin is the former Chair of COPSS (2010-2012) and a former member of the Committee on Applied and Theoretical Statistics (CATS) of the National Academy of Sciences. She is the founding chair of the US Biostatistics Department Chair Group and the founding co-chair of the Young Researcher Workshop of the Eastern North American Region (ENAR) of the International Biometric Society. She co-launched the Section on Statistical Genetics and Genomics of the American Statistical Association and served as section chair. She is the former Coordinating Editor of Biometrics and the founding co-editor of Statistics in Biosciences. She has served on many committees of statistical societies and on numerous NIH and NSF review panels.
[Full list of Google Scholar articles]
- Li, X., Quick, C., Zhou, H., Gaynor, S., Liu, Y., Chen, H., Selvaraj, M., Sun, R., Dey, R., …, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Rotter, J., Natarajan, P., Peloso, G., Li, Z., Lin, X. (2023) Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nature Genetics 55(1), 154-164.
- Zhou, H., Arapoglou, T., Li, X., Li, Z., Zheng, X., Moore, J., Asok, A., Kumar, S., Blue, E.E., Buyske, S., Cox, N., Felsenfeld, A., Gerstein, M., Kenny, E., Li, B., Matise, T., Philippakis, A., Rehm, H.L., Sofia, H.J., Snyder, G., NHGRI Genome Sequencing Program Variant Functional Annotation Working Group, Weng, Z., Neale, B., Sunyaev, S.R., Lin, X. (2023). FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research 51(D1), D1300-D1311
- Li, Z., Li, X., Zhou, H., Gaynor, S.M., Selvaraj, M., Arapoglou, T., Quick, C., Liu, Y., Chen, H., Sun, R., Dey, R., …, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Rotter, J., Willer, C., Natarajan, P., Peloso, G., Lin, X. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611
- Li, Z., Liu, Y., Lin, X. (2022). Simultaneous detection of signal regions using quadratic scan statistics with applications in whole genome association studies. Journal of the American Statistical Association 117(538), 823-834
- Liu, Y., Li, Z., Lin, X. (2022). A minimax optimal ridge-type set test for global hypothesis with applications in whole genome sequencing association studies. Journal of the American Statistical Association 117(538), 897-908
- Liu, Z., Shen, J., Barfield, R., Schwartz, J., Baccarelli, A., Lin, X. (2022). Large-Scale Hypothesis Testing for Causal Mediation Effects with Applications in Genome-wide Epigenetic Studies. Journal of the American Statistical Association 117(537), 67-81
- Dey, R., Zhou, W., Kiiskinen, T., Havulinna, A., Elliott, A., Karjalainen, J., Kurki, M., Qin, A., FinnGen, Lee, S., Palotie, A., Neale, B., Daly, M., Lin, X. (2022). Efficient and accurate frailty model approach for genome-wide survival association analysis in large-scale biobanks. Nature Communications, 13(1), 5437
- Hong, D., Dey, R., Lin, X.*, Cleary, B.*, Dobriban, E.* (2022). Group testing via hypergraph factorization applied to COVID-19. Nature Communications, https://doi.org/10.1038/s41467-022-29389-z. (* co-corresponding authors)
- Lin, X. (2022). Lessons Learned from the COVID-19 Pandemic: A Statistician’s Reflection. Statistical Science 37(2), 278-283
- Quick, C., Dey, R., and Lin, X. (2021). Regression Models for Understanding COVID-19 Epidemic Dynamics With Incomplete Data (with Discussions). Journal of the American Statistical Association 116(536), 1561-1577
- Ke, Z., Ma, Y., and Lin, X. (2021) Estimating the rank of a spiked covariance matrix by spectral quantile matching. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2021.1933497.
- Wang, X., Lin, X., Johnson, B., Christiani, D. C. (2021) Smoking history as a potential predictor of immune checkpoint inhibitor (ICI) efficacy in metastatic non-small cell lung cancer (NSCLC). Journal of the National Cancer Institute, https://doi.org/10.1093/jnci/djab116.
- Dayan, I., Roth, H., Zhong, Z., Harouni, A., … Lin, X., Wen, Y., Gilbert, F. J., Flores, M. G., Li, Q. (2021). Federated Learning used for predicting outcomes in SARS-COV-2 patients. Nature Medicine, 27, 1735–1743.
- Li, X., Li, Z., Zhou, H, Gaynor, S, …, Rotter, J., Willer, C. J., Peloso, G. M., Natarajan, P., Lin, X (2020). Dynamic incorporation of multiple in-silico functional annotations empowers rare variant association analysis of large whole genome sequencing studies at scale. Nature Genetics, 52(9), pp.969-983.
- Pan, A., Liu, L., Wang, C., Guo, H., Hao, X., Wang, Q., Huang, J., He, N., Yu, H., Lin, X.*, Wei, S.*, Wu, T.* (2020) Association of Public Health Interventions With the Epidemiology of the COVID-19 Outbreak in Wuhan, China. Journal of the American Medical Association 323(19), 1915-1923 (* co-corresponding authors)
- Segal, E., Zhang, F., Lin, X., King, G., Shalem, O., Shilo, S., Allen, W., Grad, Y., Greene, C., Alquaddoomi, F., Anders, S., Balicer, R., Bauman, T., Bonilla, X., Booman, G., Chan, A., Cohen, O., Coletti, S., Davidson, N., Dor, Y., Drew, D., Elemento, O., Evans, G., Ewels, P., Gale, J., Gavrieli, A., Geiger, G., Hajirasouliha, I., Jerala, R., Kahles, A., Kallioniemi, O., Keshet, A., Kocarev, L., Landua, G., Meir, T., Muller, A., Nguyen, A., Oresic, M., Ovchinnikova, S., Peterson, D., Prodanova, J., Rajagopal, J., Rätsch, G., Rossman, H., Rung, J., Sboner, A., Sigaras, A., Spector, T., Steinherz, R., Stevens, I., Vilo, J., Wilmes, P. (2020) Building an International Consortium for Tracking Coronavirus Health Status. Nature Medicine 26(8), 1161-1165
- Sun, R. and Lin, X. (2020). Genetic Variant Set-Based Tests Using the Generalized Berk-Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer. Journal of the American Statistical Association 115(531), 1079-1091
- Hao, X., Cheng, S., Wu, D., Wu, T.*, Lin, X.* and Wang, C.* (2020) Reconstruction of the full transmission dynamics of COVID-19 in Wuhan, Nature, 584(7821), pp.420-424. (* co-corresponding authors)
- He, X., and Lin, X. (2020). Challenges and Opportunities in Statistics and Data Science: Ten Research Areas (with Discussions). Harvard Data Science Review. 2(3), https://doi.org/10.1162/99608f92.95388fcb
- Taliun, D., Harris, D.N., Kessler, M.D., Carlson, J., Szpiech, Z.A., Torres, R., Taliun, S.A.G., Corvelo, A., Gogarten, S.M., Kang, H.M., Pitsillides, A.N., …, Lin, X., …, Abecasis, G. (2021). Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 590(7845), pp.290-299.