SKAT (SNP Set/Sequence Association Test)

SKAT is an R package for performing:

(1) Association tests between a set of common and rare SNPs and continuous and dichotomous (case-control) phenotypes using kernel machine methods for data from GWAS and genome-wide sequencing association studies

(2) Sample size and power calculations for sequencing association studies.
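To make the kernel machine idea in item (1) concrete, the following is a minimal Python sketch of a SKAT-style score statistic under a weighted linear kernel. It is an illustration only, not the package's implementation: the function name skat_q, the intercept-only null model, and the toy data are assumptions made here, and the sketch omits covariate adjustment and the p-value computation.

```python
def skat_q(y, genotypes, weights):
    """Unadjusted SKAT-style score statistic Q = sum_j (w_j * S_j)^2.

    y         : list of n continuous phenotype values
    genotypes : n rows, each a list of m minor-allele counts (0, 1 or 2)
    weights   : list of m per-variant weights w_j
    """
    n = len(y)
    # Null model: intercept only (an assumption of this sketch).
    mu = sum(y) / n
    residuals = [yi - mu for yi in y]

    q = 0.0
    for j, w in enumerate(weights):
        # S_j is the score for variant j: genotype column times residuals.
        s_j = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += (w * s_j) ** 2
    return q


# Toy data (hypothetical): 4 subjects, 2 variants, flat weights.
y = [1.0, 2.0, 3.0, 4.0]
G = [[0, 1], [1, 0], [0, 0], [1, 1]]
print(skat_q(y, G, [1.0, 1.0]))
```

Larger weights on rarer variants make the statistic more sensitive to rare-variant signal; how the weights are chosen is left to the caller in this sketch.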


MetaSKAT (Meta-analysis for multiple markers)

MetaSKAT is an R package for multiple-marker meta-analysis across studies. It can carry out meta-analysis of SKAT, SKAT-O and burden tests with individual-level genotype data or gene-level summary statistics.
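As a rough illustration of working from gene-level summary statistics, the Python sketch below pools per-variant score statistics across studies and forms SKAT-type and burden-type statistics from the pooled scores. It assumes the variant effects are the same in every study and that all studies report scores for the same variants in the same order; the function name meta_statistics and the toy numbers are hypothetical, and no p-values are computed.

```python
def meta_statistics(study_scores, weights):
    """Pool per-variant score statistics and form two test statistics.

    study_scores : list of studies, each a list of m per-variant scores
    weights      : list of m per-variant weights w_j
    Returns (skat_type, burden_type).
    """
    m = len(weights)
    # Sum the score for each variant over all studies
    # (assumes homogeneous effects across studies).
    pooled = [sum(study[j] for study in study_scores) for j in range(m)]

    # SKAT-type: squares first, so opposite-sign effects do not cancel.
    skat_type = sum((w * s) ** 2 for w, s in zip(weights, pooled))
    # Burden-type: sums first, so it is strongest when effects share a sign.
    burden_type = sum(w * s for w, s in zip(weights, pooled)) ** 2
    return skat_type, burden_type


# Toy summary statistics (hypothetical): 2 studies, 2 variants.
scores = [[1.0, -2.0], [0.5, 1.0]]
print(meta_statistics(scores, [1.0, 1.0]))
```

The contrast between the two statistics shows why both are offered: the burden-type value shrinks when variants act in opposite directions, while the SKAT-type value does not.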


  • Lee, S., Teslovich, T.M., Boehnke, M. and Lin, X. (2013) General framework for meta-analysis of rare variants in sequencing association studies, American Journal of Human Genetics, in press.

CEPSKAT (Continuous Extreme Phenotype SKAT)

CEPSKAT extends the SKAT framework to the setting of continuous extreme phenotype samples. You can download the R package for CEPSKAT here. For Windows, download the compiled binary version instead. Consult the help files in the package for instructions and usage examples.


coxKM (Cox Kernel Machine)

coxKM is an R package for conducting SNP-set association tests for right-censored survival outcomes, based on the kernel machine Cox regression framework. It tests for association between a SNP set and a right-censored survival outcome and is meant for common genetic variants only. Software download, manual download.


  • Lin X, Cai T, Wu M, Zhou Q, Liu G, Christiani D and Lin X. 2011. Survival Kernel Machine SNP-set Analysis for Genome-wide Association Studies. Genetic Epidemiology 35:620-31. doi: 10.1002/gepi.20610
  • Cai T, Tonini G and Lin X. 2011. Kernel machine approach to testing the significance of multiple genetic markers for risk prediction. Biometrics, 67:975-86. doi:10.1111/j.1541-0420.2010.01544.x

gSKAT (family based association test)

gskat is an R package that implements a family-based association test via a GEE kernel machine (KM) score test. It has functions to perform both the burden test and the SKAT test with family members as well as unrelated individuals. The package allows for both continuous and discrete traits in the association test. Software download

User group: feel free to join the group to ask questions about, discuss, or comment on the package in the forum.


  • Wang X, Lee S, Zhu X, Redline S, and Lin X. (2013) GEE-Based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genet Epidemiol. 37:778-86.

GMMAT (Generalized linear Mixed Model Association Test)

GMMAT is an R package for performing genetic association tests for outcomes whose distribution belongs to the exponential family (e.g. binary outcomes), based on the generalized linear mixed model. It can be used to analyze genetic data from individuals with population structure and relatedness. GMMAT fits a generalized linear mixed model under the null hypothesis of no genetic association, and then performs a score test for each individual genetic variant.
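The fit-the-null-once, then score-each-variant workflow can be sketched in Python as follows. This is a deliberately simplified illustration, not GMMAT itself: it assumes a binary outcome and an intercept-only logistic null model with no covariates and no random effects, so it does not account for population structure or relatedness. The function name score_test and the toy data are hypothetical.

```python
import math


def score_test(y, g):
    """Single-variant score test under an intercept-only logistic null.

    y : list of 0/1 outcomes
    g : list of genotype values for one variant (same length as y)
    Returns (statistic, p_value); the statistic is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    n = len(y)
    mu = sum(y) / n      # fitted null probability, shared by all variants
    g_bar = sum(g) / n

    # Score: genotype times null-model residual, summed over subjects.
    u = sum(gi * (yi - mu) for gi, yi in zip(g, y))
    # Variance of the score after adjusting the genotype for the intercept.
    var_u = mu * (1.0 - mu) * sum((gi - g_bar) ** 2 for gi in g)

    stat = u * u / var_u
    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy data (hypothetical): 4 subjects, 1 variant.
print(score_test([1, 1, 0, 0], [2, 1, 0, 1]))
```

Because the null model does not depend on the variant being tested, it only needs to be fitted once, which is what makes a score test attractive for genome-wide scans.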


  • Breslow NE and Clayton DG. 1993. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88: 9-25.
  • Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedon JC, Redline SS, Papanicolaou GJ, Thornton TA, Laurie CC, Rice K and Lin X. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies Using Logistic Mixed Models. Submitted.

SMAT (Scaled Multiple-phenotype Association Test)

SMAT is an R package for performing the Scaled Multiple-phenotype Association Test in cohort or case-control designs to assess the common effect of a single nucleotide polymorphism (SNP) on multiple (positively correlated) continuous outcomes measuring the same underlying trait.

The current version of the R package is 0.98. Please download the source .tar.gz file or the .zip file for installation. Please download the manual PDF here. Some example files are also available for download.


  • Schifano, E.D., Li, L., Christiani, D.C., and Lin, X. (2012) Genome-wide Association Analysis for Multiple Continuous Secondary Phenotypes. (in revision)
  • Roy, J., Lin, X., and Ryan, L. (2003). Scaled Marginal Models For Multiple Continuous Outcomes. Biostatistics, 4, 371-384.

Sparse PCA

R functions for sparse PCA and some examples.


  • Lee, S., Epstein, M.P., Duncan, R. and Lin, X. (2012) Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies. Genetic Epidemiology, 36(4), 293-302.

Pathway Analysis

sLDA Pathway Test

An R function for testing for differential expression of a gene set/pathway based on the sparse linear discriminant analysis approach.

Logistic Kernel Machine

A SAS Macro for estimating and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. A SAS Macro for doing semiparametric regression of multi-dimensional genetic pathway data, using least squares kernel machines and linear mixed models.


  • Wu, M.C., Zhang, L., Wang, Z., Christiani, D.C. and Lin, X. Sparse linear discriminant analysis for simultaneous gene set/pathway significance test and gene selection. Bioinformatics, 25, 1145-1151.
  • Liu, D., Ghosh, D. and Lin, X. (2008) Estimation and Testing for the Effect of a Genetic Pathway on a Disease Outcome Using Logistic Kernel Machine Regression via Logistic Mixed Models. BMC Bioinformatics, 9, 292.
  • Liu, D., Lin, X. and Ghosh, D. (2007) Semiparametric Regression of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines and Linear Mixed Models. Biometrics, 63, 1079-1088.

Nonparametric Regression


A SAS Macro to fit smoothing splines mixed models, with documentation.

SAS Macro Spline_Mixed

A SAS Macro for calculating a cubic smoothing spline using PROC MIXED.


A SAS Macro to fit generalized additive mixed models using smoothing splines.


  • Zhang D., Lin X., Raz J., and Sowers M. (1998). Semiparametric stochastic mixed models for longitudinal data, Journal of the American Statistical Association, 93, 710-719.
  • Lin X. and Zhang D. (1999). Inference in generalized additive mixed models using smoothing splines, Journal of the Royal Statistical Society, Series B, 61, 381-400.
  • Zhang D., Lin X. and Sowers M. (2000). Periodic semiparametric regression for longitudinal hormone data from multiple menstrual cycles. Biometrics, 56, 31-39.
Copyright © Xihong Lin, 2010-2012