The Public Health Disparities Geocoding Project Monograph

Geocoding and Monitoring US Socioeconomic Inequalities in Health:
An introduction to using area-based socioeconomic measures

Types of SAS Merging

Although there are many ways in which data can be combined using SAS commands, one of the following five methods is used most frequently: (1) concatenation, (2) interleaving, (3) one-to-one merge, (4) matched merge, and (5) updating [1].

Concatenation is when SAS data sets are stacked on top of each other. This is most commonly used when additional observations are added to an existing dataset. Typically the concatenated data sets have the same variables, although they may have some or none of the variables in common. Concatenation is done using the SET statement.
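As a minimal sketch of concatenation, the example below stacks two small data sets with a single SET statement. The data set names (deaths_1999, deaths_2000), the variable names, and the values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input data sets; names and values are illustrative only. */
data deaths_1999;
    input areakey $ deaths;
    datalines;
A001 12
A002 7
;
run;

data deaths_2000;
    input areakey $ deaths;
    datalines;
A001 15
A003 4
;
run;

/* Concatenation: every observation from deaths_1999 is read first,
   followed by every observation from deaths_2000. */
data deaths_all;
    set deaths_1999 deaths_2000;
run;

proc print data=deaths_all;
run;
```

The output data set has four observations. If the two inputs did not share all variables, the variables absent from one input would be missing for the observations contributed by that input.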

Interleaving is similar to concatenation, but is used when observations with similar values for key variables should appear consecutively in the output dataset. Interleaving uses the same SET statement as concatenation, with the addition of a BY statement naming the key variable(s) whose matching values should place observations next to each other. Prior to interleaving, both data sets should be sorted by the key variable(s) using PROC SORT.
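Interleaving can be sketched as follows, assuming two hypothetical data sets that share the key variable AREAKEY. The only difference from concatenation is the sort step and the BY statement; all names and values are illustrative.

```sas
/* Hypothetical input data sets; names and values are illustrative only. */
data deaths_1999;
    input areakey $ year deaths;
    datalines;
A002 1999 7
A001 1999 12
;
run;

data deaths_2000;
    input areakey $ year deaths;
    datalines;
A003 2000 4
A001 2000 15
;
run;

/* Both inputs must be sorted by the key variable before interleaving. */
proc sort data=deaths_1999; by areakey; run;
proc sort data=deaths_2000; by areakey; run;

/* Interleaving: SET with BY keeps observations that share an AREAKEY
   value next to each other in the output. */
data deaths_interleaved;
    set deaths_1999 deaths_2000;
    by areakey;
run;

proc print data=deaths_interleaved;
run;
```

Here the two A001 observations appear consecutively (the 1999 record first, because deaths_1999 is listed first on the SET statement), followed by A002 and A003.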

One-to-one merge is done to place datasets with an equal number of observations side by side when there is no variable that can be used to match observations. This method should be avoided, and is appropriate only if observations are in the same order in both data sets and there are no common variables that can be used to merge the datasets using a matched merge.
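For illustration only, a one-to-one merge might look like the sketch below. The data sets numer and denom are hypothetical, and the example assumes their observations are already in the same order, which is exactly the condition that makes this method risky.

```sas
/* Hypothetical data sets with no common variable; the rows are assumed
   to be in the same order in both. */
data numer;
    input deaths;
    datalines;
12
7
4
;
run;

data denom;
    input pop;
    datalines;
1000
2500
800
;
run;

/* One-to-one merge: MERGE without a BY statement pairs observation 1
   with observation 1, observation 2 with observation 2, and so on. */
data combined;
    merge numer denom;
run;

proc print data=combined;
run;
```

Because nothing checks that the paired rows actually belong together, a matched merge on an identifying variable is preferable whenever such a variable exists.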

Matched-merge is similar to the one-to-one merge, but data from observations in each dataset are combined based on a common identifying variable. This is most commonly used when new variables for the same observations are to be added to an existing dataset.
If a value of the identifying variable is present in only one dataset, the variables that come from the other dataset are set to missing for that observation. Before merging, both datasets must be sorted by the common identifying variable.
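A matched merge can be sketched as below, assuming a hypothetical rates data set and a hypothetical absm data set that share the identifier AREAKEY. The variable PCT_POVERTY and all values are invented for illustration.

```sas
/* Hypothetical data sets sharing the identifier AREAKEY. */
data rates;
    input areakey $ deaths;
    datalines;
A002 7
A001 12
A003 4
;
run;

data absm;
    input areakey $ pct_poverty;
    datalines;
A001 12.5
A002 31.0
A004 8.2
;
run;

/* Both data sets must be sorted by the identifying variable. */
proc sort data=rates; by areakey; run;
proc sort data=absm;  by areakey; run;

/* Matched merge: observations are combined where AREAKEY matches. */
data rates_absm;
    merge rates absm;
    by areakey;
run;

proc print data=rates_absm;
run;
```

In this sketch A003 appears only in rates, so its pct_poverty is missing, and A004 appears only in absm, so its deaths value is missing.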

Updating is similar to a matched-merge, but non-missing data from the second dataset overwrite the values for variables that are common to both datasets.
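The following sketch shows updating with the UPDATE statement. The data set names (master, trans) and their contents are hypothetical. UPDATE requires a BY statement, and each BY value should appear only once in the first (master) data set.

```sas
/* Hypothetical master data set; a period denotes a missing value. */
data master;
    input areakey $ pop pct_poverty;
    datalines;
A001 1000 12.5
A002 2000 .
;
run;

/* Hypothetical second data set holding the newer values. */
data trans;
    input areakey $ pop pct_poverty;
    datalines;
A001 1100 .
A002 . 20.1
;
run;

proc sort data=master; by areakey; run;
proc sort data=trans;  by areakey; run;

/* Updating: non-missing values in trans overwrite those in master;
   missing values in trans leave the master values unchanged. */
data master_updated;
    update master trans;
    by areakey;
run;

proc print data=master_updated;
run;
```

With these inputs, A001 ends up with pop 1100 and pct_poverty 12.5, and A002 ends up with pop 2000 and pct_poverty 20.1.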

When merging the numerators and denominators by AGECAT and AREAKEY in step 3 of the case example, it is the matched-merge type of combining data that is used. The merge of these data with the ABSM data in step 4 and the merge with the year 2000 standard million five-category age distribution in step 5 are also examples of a matched-merge.
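A merge of numerators and denominators on two key variables could be sketched roughly as follows. This is not the code from the case example: the data set names (numer, denom, rates), the variables DEATHS and POP, the IN= flags, and the zero-fill rule are all assumptions made for illustration. Only the key variables AGECAT and AREAKEY come from the text.

```sas
/* Hypothetical numerator data: death counts by area and age category. */
data numer;
    input areakey $ agecat deaths;
    datalines;
A001 1 3
A001 2 5
A002 1 2
;
run;

/* Hypothetical denominator data: population by area and age category. */
data denom;
    input areakey $ agecat pop;
    datalines;
A001 1 1200
A001 2 900
A002 1 1500
A002 2 1100
;
run;

/* Sort both data sets by the same key variables, in the same order. */
proc sort data=numer; by areakey agecat; run;
proc sort data=denom; by areakey agecat; run;

data rates;
    merge numer (in=in_numer) denom (in=in_denom);
    by areakey agecat;
    /* Assumption: a stratum with population but no numerator record
       had zero deaths. */
    if in_denom and not in_numer then deaths = 0;
    /* Keep only strata that have a denominator. */
    if in_denom;
run;

proc print data=rates;
run;
```

The IN= flags are one way to see which data set contributed to each merged observation, which helps catch strata that fail to match before rates are calculated.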

1. DiIorio, Frank. (1991). SAS Applications Programming: A Gentle Introduction. Duxbury Press, Pacific Grove, CA.

This work was funded by the National Institutes of Health (1RO1HD36865-01) via the National Institute of Child Health & Human Development (NICHD) and the Office of Behavioral & Social Science Research (OBSSR).
Copyright © 2004 by the President and Fellows of Harvard College - The Public Health Disparities Geocoding Project.