(September 2015) The Eagle algorithm (Loh et al. bioRxiv) estimates haplotype phase from diploid genotype data in large population cohorts (tens to hundreds of thousands of samples) typed on genotyping arrays. In contrast to conventional HMM-based methods for statistical phasing, Eagle uses long-range phasing to rapidly phase segments of the genome shared identical-by-descent (IBD) among closely or distantly related individuals within the sample; in very large samples, such IBD is pervasive. To phase segments lacking IBD, Eagle then runs two iterations of fast approximate Viterbi decoding using a simple diploid analog of the Li-Stephens HMM. The Eagle software can be downloaded here.
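For readers unfamiliar with Viterbi decoding under a Li-Stephens copying model, here is a minimal sketch of the basic haploid version. A target haplotype is modeled as a mosaic of reference haplotypes, and Viterbi decoding recovers the most likely sequence of copied haplotypes. This is not Eagle's diploid, approximate decoder. The function name and the `switch_prob`/`error_prob` parameters are hypothetical, and it assumes 0/1 alleles and a uniform jump to any reference haplotype.

```python
import math


def li_stephens_viterbi(target, panel, switch_prob=0.01, error_prob=0.01):
    """Most likely sequence of copied reference haplotypes for one target haplotype.

    target: list of 0/1 alleles at M sites.
    panel: list of K reference haplotypes, each a list of 0/1 alleles at the same M sites.
    Returns a list of M panel indices (the copied haplotype at each site).
    """
    k_states = len(panel)
    log_match = math.log(1 - error_prob)
    log_mismatch = math.log(error_prob)
    # Transition: keep copying the same haplotype, or jump to a uniformly chosen one.
    log_stay = math.log(1 - switch_prob + switch_prob / k_states)
    log_jump = math.log(switch_prob / k_states)

    def log_emit(k, m):
        return log_match if panel[k][m] == target[m] else log_mismatch

    # score[k]: best log-probability of any state path ending in state k at the current site.
    score = [-math.log(k_states) + log_emit(k, 0) for k in range(k_states)]
    backpointers = []
    for m in range(1, len(target)):
        # Every jump has the same cost, so only the best previous state can win a jump.
        best_prev = max(range(k_states), key=score.__getitem__)
        new_score, pointers = [], []
        for k in range(k_states):
            stay = score[k] + log_stay
            jump = score[best_prev] + log_jump
            if stay >= jump:
                new_score.append(stay + log_emit(k, m))
                pointers.append(k)
            else:
                new_score.append(jump + log_emit(k, m))
                pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(k_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Because the jump probability is the same for every destination, each site costs O(K) rather than O(K^2). That structure is what makes Li-Stephens-style decoding cheap enough for large reference sets.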
(July 2015) The haploSNP algorithm (Bhatia et al. bioRxiv) constructs a set of haplotype polymorphisms (haploSNPs) from phased genotype data. haploSNPs are haplotypes of adjacent SNPs, excluding a subset of masked sites that arise from skipped mismatches. A mismatch is skipped only if it could be explained as a mutation on a shared haplotype background. This is tested using a four-gamete test between the haploSNP being extended and the mismatch SNP: if all four possible allelic combinations are observed, the mismatch cannot be explained as a mutation on a shared background, and the haploSNP is terminated.
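The four-gamete test described above can be sketched in a few lines. This sketch assumes each phased chromosome is summarized by a 0/1 flag for whether it carries the haploSNP being extended, plus its 0/1 allele at the candidate mismatch SNP. The function name and input layout are hypothetical, not the haploSNP software's interface.

```python
def passes_four_gamete_test(haplotype_carrier, mismatch_alleles):
    """Return True if the mismatch could be a mutation on a shared background.

    haplotype_carrier: per-chromosome 0/1 flags (1 = carries the haploSNP being extended).
    mismatch_alleles: per-chromosome 0/1 alleles at the candidate mismatch SNP.
    """
    # Collect the distinct (carrier, allele) combinations observed across chromosomes.
    gametes = set(zip(haplotype_carrier, mismatch_alleles))
    # Seeing all four of (0,0), (0,1), (1,0), (1,1) rules out a single mutation on a
    # shared background (absent recombination or recurrent mutation): stop extending.
    return len(gametes) < 4
```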
Individuals are considered to carry 0, 1, or 2 copies of a haploSNP if none, one, or both of their chromosomes match the haplotype at all unmasked sites. Because haploSNPs are biallelic, they can be used in downstream analyses such as heritability estimation and association.
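The copy-count rule can be illustrated with a minimal sketch. It assumes the haploSNP is stored as an allele pattern over its span plus a set of masked positions. The function name and arguments are hypothetical.

```python
def haplosnp_dosage(hap_a, hap_b, pattern, masked):
    """Count copies (0, 1, or 2) of a haploSNP carried by one diploid individual.

    hap_a, hap_b: the individual's two phased haplotypes (0/1 alleles) over the haploSNP's span.
    pattern: the haploSNP's alleles over the same span.
    masked: set of positions (indices into the span) excluded from matching.
    """
    def matches(hap):
        # A chromosome carries the haploSNP only if it agrees at every unmasked site.
        return all(h == p for i, (h, p) in enumerate(zip(hap, pattern)) if i not in masked)

    return int(matches(hap_a)) + int(matches(hap_b))
```

The resulting 0/1/2 values behave like ordinary biallelic SNP genotypes, which is what allows them to be fed into standard heritability and association tools.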
The haploSNP software can be downloaded here.
Efficient PCGC Regression
(June 2015) PCGC regression (Golan et al. 2014 PNAS) is designed to avoid the biases of REML heritability estimation in ascertained case-control studies. We have released an efficient implementation of the PCGC regression method. Subject to the restriction that all GRMs be computed over identical lists of individuals (all *.grm.id files must be identical), this implementation eliminates in-memory storage of N x N matrices by accumulating dot products among regressors on the fly (i.e., streaming the GRM inputs), speeds up jackknife computations by storing partition results on the fly, and eliminates storage of “cleaned” GRMs (i.e., with PCs projected out) by projecting out PCs on the fly. The Efficient PCGC Regression software is used in Loh et al. bioRxiv and Bhatia et al. bioRxiv, and can be downloaded here.
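The streaming idea can be sketched as follows. PCGC regresses products of standardized phenotypes y_i*y_j on GRM entries over pairs i < j. The sufficient statistics X'X and X'y can be accumulated pair by pair, so no N x N matrix is ever held in memory. This sketch makes several assumptions. It fits the regression without an intercept. It takes pairs as (i, j, [K1_ij, K2_ij, ...]) tuples. It omits PCGC's conversion of the slope to the liability scale, as well as the PC projection and the jackknife. The function name and input layout are hypothetical, not the released software's interface.

```python
def streaming_pcgc_normal_equations(y, grm_entries):
    """Accumulate X'X and X'y for regressing y_i*y_j on GRM entries, one pair at a time.

    y: standardized phenotypes, indexed by individual.
    grm_entries: iterable of (i, j, [K1_ij, K2_ij, ...]) over pairs i < j, e.g. read
        line by line from several GRM files that share the same list of individuals.
    Memory use is O(C^2) for C GRMs, independent of the number of individuals.
    """
    xtx, xty = None, None
    for i, j, ks in grm_entries:
        if xtx is None:
            c = len(ks)
            xtx = [[0.0] * c for _ in range(c)]
            xty = [0.0] * c
        target = y[i] * y[j]  # regression response for this pair
        for a, ka in enumerate(ks):
            xty[a] += ka * target
            for b, kb in enumerate(ks):
                xtx[a][b] += ka * kb
    return xtx, xty
```

With a single GRM, the slope is `xty[0] / xtx[0][0]`. With C GRMs, the coefficients come from solving the small C x C system `xtx @ beta = xty`. This is also why every GRM must cover the same individuals in the same order: the entries for one pair must be read together.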
BOLT-LMM and BOLT-REML
(April 2015) The BOLT-LMM algorithm (Loh et al. 2015) rapidly computes statistics for association between phenotype and genotypes using a linear mixed model (LMM). The BOLT-REML algorithm partitions SNP-heritability and estimates genetic correlations using a Monte Carlo algorithm for fast multi-component, multi-trait modeling (Loh et al. bioRxiv). By default, BOLT-LMM association analysis assumes a Bayesian mixture-of-normals prior for the random effect attributed to SNPs other than the one being tested. This model generalizes the standard “infinitesimal” mixed model used by existing mixed model association methods, providing an opportunity for increased power to detect associations while controlling false positives. Both algorithms are implemented in the BOLT-LMM v2.2 software package; see link for update log.
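To show what a mixture-of-normals prior means for a single SNP effect, here is a minimal sketch of its density. With probability p an effect is drawn from a wider normal, and otherwise from a narrower one. This gives heavier tails than a single normal with the same total variance. The parameter names are hypothetical, and BOLT-LMM's own parameterization of the mixture may differ.

```python
import math


def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_prior_pdf(beta, p, var_large, var_small):
    """Density of a two-component mixture-of-normals prior on a single SNP effect.

    With probability p the effect is drawn from N(0, var_large), otherwise from N(0, var_small).
    Setting var_large == var_small recovers the infinitesimal (single-normal) prior.
    """
    return p * normal_pdf(beta, var_large) + (1 - p) * normal_pdf(beta, var_small)
```

The heavier tails let a minority of SNPs carry larger effects without forcing all effects to be shrunk equally. That is the source of the potential power gain over the infinitesimal model.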
(March 2015) The LDpred software can be downloaded here. LDpred (Vilhjalmsson et al. bioRxiv) is a method for computing polygenic risk scores from summary association statistics while accounting for LD between markers. The method infers the posterior mean causal effect size of each marker using a non-infinitesimal prior distribution on effect sizes and LD information from an external reference panel.
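As an illustration of a posterior mean under a non-infinitesimal prior, consider the simplest case of a single marker not in LD with any other. Assume a point-normal prior: with probability p the effect is drawn from N(0, h2/(M p)), and otherwise it is zero. Assume also that the marginal estimate has sampling variance about 1/N. The posterior mean then has a closed form. This sketch covers only that no-LD case. It does not handle LD, which LDpred addresses using a reference panel and Gibbs sampling. The function and argument names are hypothetical.

```python
import math


def log_normal_pdf(x, var):
    return -0.5 * math.log(2 * math.pi * var) - x * x / (2 * var)


def point_normal_posterior_mean(beta_hat, n, m, h2, p):
    """Posterior mean effect of one marker under a point-normal prior, ignoring LD.

    beta_hat: marginal effect estimate on the standardized-genotype scale.
    n: GWAS sample size; m: number of markers; h2: heritability; p: fraction of causal markers.
    """
    prior_var = h2 / (m * p)     # variance of a causal marker's effect
    noise_var = 1.0 / n          # approximate sampling variance of beta_hat
    log_causal = math.log(p) + log_normal_pdf(beta_hat, prior_var + noise_var)
    log_null = math.log(1 - p) + log_normal_pdf(beta_hat, noise_var)
    d = log_causal - log_null
    # Posterior probability that the marker is causal, via a numerically stable logistic.
    if d >= 0:
        p_causal = 1.0 / (1.0 + math.exp(-d))
    else:
        p_causal = math.exp(d) / (1.0 + math.exp(d))
    shrink = prior_var / (prior_var + noise_var)
    return p_causal * shrink * beta_hat
```

Strong signals are shrunk only slightly, by the factor `shrink`. Weak signals are additionally down-weighted by the posterior probability that the marker is causal at all.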
(February 2015) The ldsc software can be downloaded here. LD Score regression (Bulik-Sullivan et al. 2015) is a method for distinguishing confounding from polygenicity in genome-wide association studies. Stratified LD Score regression (Finucane et al. bioRxiv; functional annotations here) is a method for partitioning heritability by functional category using GWAS summary statistics. Cross-trait LD Score regression (Bulik-Sullivan et al. bioRxiv) is a method for estimating genetic correlations using GWAS summary statistics.
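The core of LD Score regression is the relation E[chi2_j] = 1 + N*a + (N*h2/M)*l_j. Here l_j is the LD score of SNP j, and the intercept term N*a captures confounding such as population stratification. Polygenicity raises chi-square statistics in proportion to LD score, whereas confounding raises them uniformly. A minimal unweighted least-squares sketch of that fit follows. ldsc itself uses regression weights and a block jackknife for standard errors, and this sketch reproduces neither. The function name is hypothetical.

```python
def ld_score_regression(chi2, ld_scores, n, m):
    """Fit E[chi2_j] = 1 + N*a + (N*h2/M) * l_j by ordinary least squares.

    Returns (intercept, h2_estimate). An intercept well above 1 suggests confounding.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx                     # estimates N*h2/M
    intercept = mean_c - slope * mean_l   # estimates 1 + N*a
    return intercept, slope * m / n
```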
(December 2014): EIGENSOFT version 6.0.1 is now available for download. The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
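The EIGENSTRAT correction can be sketched with NumPy as follows. The top k axes of variation are computed from the centered genotype matrix and projected out of every SNP and out of the phenotype. Each SNP is then scored as (N - k - 1) times the squared correlation of the adjusted genotype and phenotype. Two caveats apply. EIGENSTRAT also normalizes each SNP by its allele frequency before PCA, which this sketch skips. The function name is hypothetical, not part of EIGENSOFT.

```python
import numpy as np


def eigenstrat_chi2(genotypes, phenotype, k):
    """EIGENSTRAT-style association statistics after removing the top k axes of variation.

    genotypes: (n_individuals, n_snps) array of 0/1/2 counts.
    phenotype: length-n array (0/1 case-control status or a quantitative trait).
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = g.shape[0]
    g = g - g.mean(axis=0)  # center each SNP
    # Top-k left singular vectors of the centered genotypes are the top-k axes of variation.
    u, _, _ = np.linalg.svd(g, full_matrices=False)
    axes = u[:, :k]
    # Project the axes out of every SNP and out of the centered phenotype.
    g_adj = g - axes @ (axes.T @ g)
    y_c = y - y.mean()
    y_adj = y_c - axes @ (axes.T @ y_c)
    # Squared correlation between adjusted genotype and adjusted phenotype, per SNP.
    num = (g_adj * y_adj[:, None]).sum(axis=0) ** 2
    den = (g_adj ** 2).sum(axis=0) * (y_adj ** 2).sum()
    r2 = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return (n - k - 1) * r2
```

Because the same axes are removed from each SNP, a marker whose frequency varies strongly along an ancestry axis loses most of its spurious signal. A marker unrelated to ancestry is left nearly unchanged.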
Source code, documentation and executables for using EIGENSOFT 6.0.1 on a Linux platform can be downloaded here. New features of EIGENSOFT 6.0.1 include the fastmode option, which implements a very fast PCA approximation (Galinsky et al. bioRxiv), support for multi-threading, a bug fix for the ldregress option, and another minor bug fix to version 6.0beta. Our previous release, version 5.0.2, can be downloaded here.
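Fast PCA approximations typically avoid a full eigendecomposition by working in a small random subspace. The sketch below is a generic randomized subspace-iteration method for the top k principal components. It is shown only to convey the idea and is not necessarily the algorithm behind the fastmode option. The function name and the `oversample`, `n_iter` and `seed` parameters are hypothetical.

```python
import numpy as np


def randomized_top_pcs(x, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k left singular vectors and singular values of x."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    l = min(k + oversample, n, m)
    # Random projection captures the dominant column space of x.
    q, _ = np.linalg.qr(x @ rng.standard_normal((m, l)))
    for _ in range(n_iter):
        # Power iterations sharpen the separation between leading and trailing components.
        q, _ = np.linalg.qr(x.T @ q)
        q, _ = np.linalg.qr(x @ q)
    # Exact SVD of the small l x m matrix, mapped back to n dimensions.
    u_small, s, _ = np.linalg.svd(q.T @ x, full_matrices=False)
    return (q @ u_small)[:, :k], s[:k]
```

The cost is dominated by a few multiplications of the genotype matrix by thin matrices. For a cohort of N individuals, this scales much better than forming and decomposing an N x N matrix.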
The EIGENSOFT FAQ (Frequently Asked Questions) is available here. For further questions about the EIGENSOFT software, please write to Samuela Pollack (firstname.lastname@example.org).
SOFTWARE REGISTRATION: we encourage EIGENSOFT users to register here. Registration is voluntary, but will allow us to send information about software updates.
(October 2014): LTSOFT version 3.0 can be downloaded here. Version 3.0 adds LTMLM, new software implementing multivariate liability threshold mixed linear model association statistics, which further increase power in case-control studies of disease (Hayeck et al. 2014 bioRxiv). LTSOFT is a software suite designed to leverage more powerfully clinical covariates such as age, BMI, smoking status, and gender, as well as genetic covariates such as known associated variants, when conducting case-control association studies. Including these covariates in standard regression models is not only suboptimal but can in many instances reduce power. LTSOFT employs a liability threshold model approach that takes advantage of known epidemiological results to better model the covariates’ relationship to the phenotype of interest (Zaitlen et al. 2012 PLoS Genet; Zaitlen et al. 2012 Bioinformatics; Hayeck et al. 2014 bioRxiv).
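A basic computation in liability threshold approaches is the expected liability of an individual given case or control status. Liability is modeled as standard normal, and an individual is a case if liability exceeds a threshold. The minimal sketch below assumes that covariates enter only through a per-individual disease probability, for example taken from epidemiological risk models. That probability fixes the threshold, and the posterior mean is the mean of a truncated normal. This is an illustration of the model, not LTSOFT's code, and the function name is hypothetical.

```python
from statistics import NormalDist


def posterior_mean_liability(is_case, disease_prob):
    """Posterior mean of a standard-normal liability given case/control status.

    disease_prob: covariate-specific probability of disease (hypothetical input),
        which fixes the threshold t with P(liability > t) = disease_prob.
    """
    std = NormalDist()
    t = std.inv_cdf(1 - disease_prob)
    if is_case:
        return std.pdf(t) / (1 - std.cdf(t))  # mean of N(0,1) truncated to (t, inf)
    return -std.pdf(t) / std.cdf(t)           # mean of N(0,1) truncated to (-inf, t)
```

A case whose covariates predict low risk receives a higher expected liability than a case predicted to be at high risk. This is how covariate information can sharpen, rather than dilute, the genetic signal.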
(May 2014): SNPweights version 2.1 can be downloaded here. SNPweights is a software package for inferring genome-wide genetic ancestry using SNP weights precomputed from large external reference panels (Chen et al. 2013 Bioinformatics). Changes in version 2.0 include new SNP weights for Native American reference samples, a new format for SNP weights files, and new software for users to derive SNP weights using their own reference samples. Version 2.1 incorporates a bug fix in the inferanc program, which now works with all snpwt files. SNP weights for European and West African ancestral populations can be downloaded here. SNP weights for European, West African and East Asian ancestral populations can be downloaded here. SNP weights for European, West African, East Asian and Native American ancestral populations can be downloaded here. SNP weights for NW, SE and AJ ancestral populations of European Americans can be downloaded here.
(March 2014): Functional annotations of SNPs and regions from our functional heritability paper “Regulatory variants explain much more heritability than coding variants across 11 common diseases” (Gusev et al. 2014) can be downloaded here.
(July 2013): ImpG-Summary version 1.0 can be downloaded here. ImpG-Summary is a software package for Gaussian imputation from summary association statistics, as described in our paper “Fast and accurate imputation from summary association statistics” (Pasaniuc et al. 2014).
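Gaussian imputation from summary statistics treats the z-scores of nearby SNPs as jointly multivariate normal, with correlations given by LD estimated from a reference panel. Unobserved z-scores are then imputed as the conditional mean given the observed ones. The minimal sketch below makes this concrete. The small `ridge` term added to the diagonal is an assumption of this sketch, included to keep the linear solve well-conditioned when LD estimates are noisy. The function name is hypothetical, not ImpG-Summary's interface.

```python
import numpy as np


def impute_z_scores(z_obs, ld_obs_obs, ld_unobs_obs, ridge=0.1):
    """Conditional-mean imputation of unobserved z-scores under a multivariate normal model.

    z_obs: z-scores at observed SNPs.
    ld_obs_obs: LD (correlation) matrix among observed SNPs.
    ld_unobs_obs: LD between each unobserved SNP (rows) and each observed SNP (columns).
    """
    sigma_oo = np.asarray(ld_obs_obs, dtype=float)
    sigma_uo = np.asarray(ld_unobs_obs, dtype=float)
    reg = sigma_oo + ridge * np.eye(sigma_oo.shape[0])
    # weights = Sigma_uo (Sigma_oo + ridge * I)^-1, computed with a solve instead of an inverse.
    weights = np.linalg.solve(reg, sigma_uo.T).T
    return weights @ np.asarray(z_obs, dtype=float)
```

With a single observed SNP in perfect LD-free isolation except for correlation r with the target, the imputed z-score is simply r times the observed one.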
(July 2012): MIXSCORE version 1.3 can be downloaded here. MIXSCORE is a method for combining SNP association and admixture association statistics to increase power in GWAS in admixed populations. For details, see the MIXSCORE paper (Pasaniuc et al. 2011 PLoS Genet, “Enhanced statistical tests for GWAS in admixed populations: assessment using African Americans from CARe and a breast cancer consortium”).
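One simple way to combine an admixture association statistic with a SNP association statistic conditional on local ancestry is to add them. This assumes both are independent 1-df chi-square statistics under the null, so that their sum is a 2-df chi-square. The sketch below shows only that arithmetic. It is not a description of MIXSCORE's full set of statistics, and the function name is hypothetical.

```python
import math


def combined_two_df_pvalue(chi2_admixture, chi2_snp_given_ancestry):
    """Sum two independent 1-df chi-square statistics and return (statistic, 2-df p-value)."""
    total = chi2_admixture + chi2_snp_given_ancestry
    # A chi-square distribution with 2 degrees of freedom has survival function exp(-x/2).
    return total, math.exp(-total / 2)
```

The combined test can gain power when both sources of signal point in the same direction, at the cost of an extra degree of freedom.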
(April 2012): TreeSelect version 1.1 can be downloaded here. TreeSelect is a software package for inferring natural selection from unusual population differentiation between closely related populations. For details, see our Africa selection paper (Bhatia et al. 2011 AJHG, “Genome-wide comparison of African-ancestry populations from CARe and other cohorts reveals signals of natural selection.”)
(March 2011): HAPMIX version 1.2 can be downloaded here. Improvements in version 1.2 include an explicit check for discordance between admixed and reference population allele frequencies, and a script to interpolate estimates of local ancestry to a superset of SNPs. HAPMIX is an application for accurately inferring chromosomal segments of distinct continental ancestry in admixed populations, using dense genetic data. For details, see the HAPMIX paper (Price et al. 2009).
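Interpolating local ancestry estimates to a superset of SNPs can be done as in the minimal sketch below. It assumes linear interpolation along sorted (genetic or physical) positions, holding the nearest endpoint value constant outside the typed range. The HAPMIX script may use a different scheme, and the function name is hypothetical.

```python
from bisect import bisect_left


def interpolate_local_ancestry(known_positions, known_estimates, query_positions):
    """Linearly interpolate per-SNP local ancestry estimates to additional positions.

    known_positions: sorted positions of SNPs with estimates.
    known_estimates: estimated ancestry at those SNPs (e.g. expected copies of one ancestry, 0-2).
    Positions outside the known range take the nearest endpoint value.
    """
    out = []
    for x in query_positions:
        i = bisect_left(known_positions, x)
        if i == 0:
            out.append(known_estimates[0])
        elif i == len(known_positions):
            out.append(known_estimates[-1])
        else:
            # known_positions[i - 1] < x <= known_positions[i], so the gap is nonzero.
            x0, x1 = known_positions[i - 1], known_positions[i]
            y0, y1 = known_estimates[i - 1], known_estimates[i]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out
```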
GENE EXPRESSION HERITABILITY
(January 2011): Source code and gene-by-gene results from our gene expression heritability paper “Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals” (Price et al. 2011) can be downloaded here.