Georg Hahn
Research Associate, Harvard T.H. Chan School of Public Health

Genome‐wide association analysis of COVID‐19 mortality risk

SARS-CoV-2 mortality has been extensively studied in relation to host susceptibility, but how sequence variations in the SARS-CoV-2 genome affect pathogenicity is poorly understood. Starting in October 2020, we used the methodology of genome-wide association studies (GWAS) to examine the association between whole-genome sequencing (WGS) data of the virus and COVID-19 mortality, as a potential method for the early identification of highly pathogenic strains to target for containment. While continuously updating our analysis, in December 2020 we analyzed 7,548 single-stranded SARS-CoV-2 genomes of COVID-19 patients in the GISAID database and associated variants with mortality using logistic regression. Of the 29,891 sequenced loci of the viral genome evaluated for association with patient/host mortality, two loci, at 12,053bp and 25,088bp, achieved genome-wide significance (p-values of 4.09e-09 and 4.41e-23, respectively), though only 25,088bp remained significant in follow-up analyses. The locus at 25,088bp later (April 2021) became one of the distinguishing loci (specifically, the substitution V1176F) of the Brazilian P.1 strain as defined by the Centers for Disease Control and Prevention (UCSC Genome Browser, 2021). The mutations at 25,088bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry into target host cells. Since these mutations alter the encoded amino acid sequence, they may impose structural changes that could enhance viral infectivity and symptom severity.
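A per-locus association scan of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each locus is encoded as 1 if a genome differs from the reference there and 0 otherwise, no covariates are included, each locus is tested with a likelihood-ratio test on the logistic model `died ~ 1 + variant`, and "genome-wide significance" is assumed to mean a Bonferroni threshold of 0.05 divided by the number of loci (about 1.7e-06 for 29,891 loci). The functions `locus_lrt_pvalue` and `gwas_scan` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def locus_lrt_pvalue(variant: Sequence[int], died: Sequence[int]) -> float:
    """Likelihood-ratio test p-value for the logistic model died ~ 1 + variant.

    With a single binary covariate the fitted logistic model reproduces the
    2x2 table of variant vs. outcome, so its deviance difference against the
    intercept-only model equals the G statistic of that table.
    """
    counts = [[0, 0], [0, 0]]  # counts[variant][died]
    for v, d in zip(variant, died):
        counts[v][d] += 1
    row = [counts[v][0] + counts[v][1] for v in (0, 1)]
    col = [counts[0][d] + counts[1][d] for d in (0, 1)]
    n = row[0] + row[1]
    g = 0.0
    for v in (0, 1):
        for d in (0, 1):
            observed = counts[v][d]
            if observed > 0:  # empty cells contribute 0 * log(0) = 0
                expected = row[v] * col[d] / n
                g += 2.0 * observed * math.log(observed / expected)
    # Upper tail of a chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def gwas_scan(
    genotypes: List[List[int]], died: List[int], alpha: float = 0.05
) -> Tuple[List[float], List[int]]:
    """Test every locus; genotypes[i][j] is 1 if genome i differs from the
    reference at locus j (assumed binary encoding), else 0."""
    n_loci = len(genotypes[0])
    threshold = alpha / n_loci  # Bonferroni correction (assumed threshold)
    pvalues = [
        locus_lrt_pvalue([genome[j] for genome in genotypes], died)
        for j in range(n_loci)
    ]
    hits = [j for j, p in enumerate(pvalues) if p < threshold]
    return pvalues, hits


# Hypothetical toy data: 8 genomes (4 fatal cases), 3 loci.
died = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = [
    [1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 0, 0],
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0],
]
pvalues, hits = gwas_scan(genotypes, died)
print(hits)  # [0]
```

The likelihood-ratio form is used here rather than a Wald test because it stays finite when a variant perfectly separates fatal from non-fatal cases, as locus 0 does in the toy data. Adjusting for patient covariates such as age would require a numerically fitted regression instead of this closed-form table shortcut.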
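The link between the genome coordinate 25,088bp and the spike substitution V1176F can be checked with codon arithmetic. The sketch below assumes coordinates on the Wuhan-Hu-1 reference genome (GenBank NC_045512.2), on which the spike open reading frame spans nucleotides 21,563 to 25,384 (including the stop codon), and assumes the S1/S2 boundary lies between residues 685 and 686; the function name `spike_codon` is hypothetical.

```python
from typing import Tuple

# Assumed reference coordinates (Wuhan-Hu-1, NC_045512.2).
SPIKE_ORF_START = 21_563
SPIKE_ORF_END = 25_384
S1_LAST_RESIDUE = 685  # assumed: S1 ends at residue 685, S2 begins at 686


def spike_codon(genome_pos: int) -> Tuple[int, int]:
    """Map a genome coordinate to (spike residue number, position in codon 1-3)."""
    if not SPIKE_ORF_START <= genome_pos <= SPIKE_ORF_END:
        raise ValueError("position lies outside the spike open reading frame")
    offset = genome_pos - SPIKE_ORF_START
    return offset // 3 + 1, offset % 3 + 1


residue, codon_pos = spike_codon(25_088)
subunit = "S2" if residue > S1_LAST_RESIDUE else "S1"
print(residue, codon_pos, subunit)  # 1176 1 S2
```

Under these assumptions, 25,088bp is the first nucleotide of spike codon 1176, inside the S2 subunit, which is consistent with the V1176F label in the abstract.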