Software and Resources
The Computing Core has three aims, which are to:
- make accessible the analyses and research of the Statistical Informatics for Cancer Research group;
- create and distribute novel software that will aid statistical analysis of cancer monitoring; and
- provide a comprehensive data management structure for the research group, and make public data sets more easily accessible to other researchers.
This computing work has produced several resources. For R programmers interested in building packages, there is a collection of original tutorial materials, along with references on additional R programming topics. A data page is also available with information about the data sets used by the Statistical Informatics for Cancer Research group.
Part of open research is open software. Below are packages and R functions created as part of the PO1 Grant.
- glmmGS: Computationally-efficient fitting of GLMMs via PQL. Michele Morara, Louise Ryan, Subharup Guha, and Christopher Paciorek. A package for fitting generalized linear mixed models (GLMMs), with an emphasis on models with spatial structure, using penalized quasi-likelihood (PQL). Spatial dependence can be specified through the covariance or through the precision, and can be represented in sparse matrix format. The package uses computationally efficient Gauss-Seidel optimization to update blocks of parameters, exploiting the blocked matrix structure present in some GLMMs. It can also exploit sparse structure, when present, in both the random effects design matrix and the spatial precision matrix of the random effects. Spatially correlated random intercepts and spatially correlated random slopes are both supported.
- SKAT: SNP-set Kernel Association Test. Seunggeun Lee, Larisa Miropolsky, and Michael Wu. SKAT is an R package for performing association tests between a set of common and rare SNPs and continuous or dichotomous (case-control) phenotypes, using kernel machine methods for GWAS and sequencing data.
- sLDA: Sparse Linear Discriminant Analysis. Michael C. Wu, Lingsong Zhang, Zhaoxi Wang, David C. Christiani, and Xihong Lin. An R function for testing for differential expression of a gene set/pathway based on the sparse linear discriminant analysis approach.
- Weighted Cumulative Geographic Residual Test. Andrea Cook (view publication). R functions used to run area-level weighted cumulative geographic residual tests.
- Cumulative geographic residual for repeated measures. Andrea Cook (view publication). R functions used to run a repeated measures cumulative geographic residual test.
- Generalized Linear Mixed Model with Semiparametric Random Effects. Subharup Guha (view publication). R code to fit a generalized linear mixed model with semiparametric random effects.
- voronoi: Methods and applications related to Voronoi tessellations. Christopher D. Barr, Travis A. Gerke, and David M. Diez. Tools for simulating and summarizing point patterns through Voronoi tessellations and their related estimators. Currently available on CRAN.
- areaglmm: Area-level spatial generalized linear mixed models. This package contains functions for fitting area-level spatial and spatio-temporal models in R, using the spatial correlation structure described in papers submitted for publication as part of a dissertation. The package's regression function operates on a new class of areal data object, which can be created using functions in the package. Plotting functions are also provided for mapping the area-level relative risks and continuous underlying spatial residuals.
- BEAU: Bayesian effect Estimation accounting for Adjustment Uncertainty. Chi Wang, Giovanni Parmigiani, and Francesca Dominici. A Bayesian approach to estimating the effect of an exposure of interest on the outcome while accounting for uncertainty in the confounding adjustment.
- GQTE: Generalized quantile treatment effect. Fully Bayesian estimation of the generalized quantile treatment effect (GQTE), which is defined as the difference between a (known) function of the quantiles under the two treatment conditions. Package release in fall 2011.
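In symbols, the GQTE definition above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the package: Q_1(p) and Q_0(p) denote the p-th quantiles of the outcome under the two treatment conditions, and g is the known function.

```latex
\Delta_g(p) = g\bigl(Q_1(p)\bigr) - g\bigl(Q_0(p)\bigr), \qquad 0 < p < 1
```

When g is the identity, this reduces to the ordinary quantile treatment effect Q_1(p) - Q_0(p). The description speaks of a function of "the quantiles", so g may take several quantiles at once; the single-quantile form shown here is only the simplest case.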
Additional software released as part of the PO1 grant is listed below.
- Logistic Kernel Machine (SAS Macro). View publication. A SAS macro for estimating and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models.
- Least Square Kernel Machine (SAS Macro). View publication. A SAS macro for semiparametric regression of multi-dimensional genetic pathway data, using least squares kernel machines and linear mixed models.
- Analyzing Array-Based CGH Data Using Bayesian HMM (Matlab demo). View publication. This example shows how to use a Bayesian hidden Markov model (HMM) technique to identify copy number alteration in array-based comparative genomic hybridization (CGH) data... Read more at mathworks.com.
We support open research in our own group and in the larger statistics community. Below are slides and links to help you become familiar with writing R packages. Original materials are released under Creative Commons (CC) licenses: PDFs marked CC are released under the Creative Commons Attribution-NonCommercial-NoDerivs license, and source documents marked CC are released under the Creative Commons Attribution-NonCommercial-ShareAlike license.
- R packages talk (CC); source (CC)
- Videos for building R packages: Part 1, Part 2, Part 3 (YouTube)
- Coding advice for new R users (CC)
- Creating R Packages: A Tutorial (CRAN)
Advanced R users often call code written in other languages from within R. Useful references are listed below.
- An Introduction to the .C Interface to R (UCLA)
- Software for Data Analysis (CRAN, information for this book)
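As a minimal sketch of what the .C interface involves: a C routine called through .C() must return void and receive every argument as a pointer, because R hands over its vectors by address. The function name scale_vector and the file name scale.c are hypothetical, chosen only for illustration.

```c
/* scale.c: a hypothetical example of a C routine callable from R via .C().
 *
 * Build and call from R (assuming this file is saved as scale.c):
 *   R CMD SHLIB scale.c
 *   dyn.load("scale.so")     # "scale.dll" on Windows
 *   out <- .C("scale_vector", x = as.double(1:3), n = 3L, factor = as.double(2))
 *   out$x
 */

/* Multiply each of the n elements of x by factor, in place.
 * .C() requires a void return type and pointer arguments. */
void scale_vector(double *x, int *n, double *factor)
{
    for (int i = 0; i < *n; i++) {
        x[i] *= *factor;
    }
}
```

.C() returns a list of its arguments after the call, so the scaled vector is read from out$x. The explicit as.double() and integer (3L) conversions matter because the C side trusts the declared pointer types without checking them.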
Other useful resources for learning the basics of R, LaTeX, and Sweave can be found below.