Fitting Spatial Surfaces to Noisy Observations

Home

GIS Research in Longwood Medical Area

GIS at Harvard University

FAQs

Scripts

Spatial Statistics

Links

Contributed by Chris Paciorek (paciorek AT hsph.harvard.edu), Ph.D, Department of Biostatistics, Harvard School of Public Health

There are many options for fitting spatial surfaces to noisy observations. Here three readily usable options are described, but see the software links for additional approaches, particularly ones implemented in R. These three options are often appropriate for continuous data. The methods are not appropriate for data that are counts at each location (although they may be reasonable when all the counts are fairly large and none near zero) or 0/1 binary indicators. The first option is the SemiPar package for the statistical software S-plus. The second option is kriging within the ArcGIS software. The third option is the gam function in the mgcv library for the statistical software R. My goal here is to describe how to use these methods and outline situations in which one or the other may be more appropriate. For applications for which these options do not seem appropriate, one will probably want to consult with a statistician familiar with spatial modelling techniques.

SemiPar with S-plus: SemiPar is open source code written in the S statistical language and executed in the S-plus software package. S-plus is a commercial statistical computing environment that is available for both Windows and Unix systems. While the SemiPar code is free, S-plus is not. However, the authors of SemiPar are working on a version for the statistical language R, which is an open-source version of the S language that underlies S-plus. If necessary, it should be possible to convert the S-plus code to R fairly easily. SemiPar fits a spatial surface within the framework of linear mixed models, using a thin-plate spline basis to represent the surface. The model can include additional covariates, such as age or sex, as either linear or smooth (nonparametric) components. For data with a strong spatial directionality, the SemiPar software may not be ideal, because it assumes that the variability of the spatial surface is similar in all directions. Also for data for which the variability of the spatial surface is very different in different parts of the space (called nonstationarity), such as the topography of Colorado, with flat plains in the east and mountains in the west, the fit will not be ideal, because the model assumes similar variability throughout the space. Maps can be created within S-plus using SemiPar or using other S functions. S-plus graphics are customizable, but the extensive layers available in GIS software are not readily available. The SemiPar models are run and results plotted using a command line interface.

Detailed guide to using SemiPar

Kriging within ArcGIS: ArcGIS is commercial GIS software, available primarily (exclusively?) for Windows systems. Spatial surface fitting can be performed by the statistical method of kriging using either the Spatial Analyst or Geostatistical Analyst extensions. The underlying code by which the kriging is done is not accessible, and it can take some effort to understand exactly what is being done based on the documentation. Kriging is a standard statistical method, but the implementation in ArcGIS is done in a somewhat ad hoc manner, in part for computational efficiency. ArcGIS cannot include additional covariates; it only fits the spatial surface without accounting for other predictors. ArcGIS can account for spatial directionality. However, like SemiPar, it is not ideal for nonstationary data for which the variability of the spatial surface is very different in different parts of the space. Sophisticated maps can be created within the GIS software, although one is limited to the mapping options provided by the creators of the software. Models are fit and maps created using a graphical user interface.

Detailed guide to using ArcGIS for Kriging with ArcGIS

mgcv in R: Finally, the mgcv library in R (written by Simon Wood) has a gam() (generalized additive model) function that allows for thin plate spline-based surface fits within a gam model in which other covariates (both as linear and smooth terms) can be readily included. Be careful: the gam() function in S-plus does not implement the same method and cannot be used for flexible surface fitting. Here is example code for fitting such a model. Note that the R gam() function allows for both linear covariates (e.g., covar1 in the example code) and smoothing terms (e.g., covar2, fit with a cubic regression spline), in addition to the spatial effect (e.g., x and y, fit with a thin-plate regression spline). The degree of smoothing is estimated within the model using the generalized cross-validation (GCV) criteria.

Recommendations: In general, I would recommend using the SemiPar code or the R mgcv code rather than kriging within ArcGIS. The R mgcv library has the nice feature of providing accessible estimates of uncertainty for both the spatial surface and the other covariates. Neither the SemiPar nor mgcv statistical models rely heavily on ad hoc fitting methods used in the ArcGIS kriging routines, and they can account for important additional factors that may affect the response. The degree of smoothness in the surface estimated by SemiPar and mgcv is controlled by a single parameter, rather than a number of parameters in the kriging approach. While having multiple parameters might seem to allow for a more flexible fit to the data, with the kind of smoothing done here, there is effectively only one parameter that controls the degree of smoothing, so the kriging approach introduces additional complexity with little change in the ability to fit a surface. The exception to this comment is the ability of the kriging methodology to account for directionality. Also note that if one wants to use the GIS software to create maps, the results from SemiPar/S-plus and mgcv/R can be imported into the GIS software.

References:
For more information on the statistical methods underlying the SemiPar software, a starting point is:
Kammann, E.E., and M.P. Wand. 2003. Geoadditive models. Applied Statistics 52:1-18.

Information on the methods of Simon Wood implemented in the R library mgcv can be found in:
Wood, S.N. 2000. Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society, Series B 62:413-428.
Wood, S.N. 2003. Thin plate regression splines. Journal of the Royal Statistical Society, Series B 65:95-114.

Information on exactly how kriging is done in ArcGIS is a bit difficult to come by. The online documentation has limited technical detail. More extensive information is available in:
Johnston, K. et al. 2001. Using ArcGIS geostatistical analyst. Redlands, CA: Environmental Systems Research Institute.
McCoy, J. et al. 2001. Using ArcGIS spatial analyst. Redlands, CA: Environmental Systems Research Institute.
Both are available in Harvard’s Pusey Library Map Collection, for purchase from ESRI and can be downloaded as pdf files from the Harvard site license downloads page (click on the "books and samples" file, but note it's a huge file).
The ESRI website has some technical reports, most of which are somewhat tangential to the methods implemented in ArcGIS, but a couple of which provide further details on the fitting methods.