The Public Health Disparities
Geocoding Project Monograph

Geocoding and Monitoring US Socioeconomic Inequalities in Health:
An introduction to using area-based socioeconomic measures

STEP BY STEP COMPARISON
A step by step comparison of each task of the Case Example, the relevant section of Analytic Methods, and sample SAS code

Step 1: Aggregate the numerator data.
CASE EXAMPLE
The file rawcase.csv is a comma-delimited file containing all deaths that occurred in Suffolk County, Massachusetts, between 1989 and 1992. Each person who died is represented by one line in the data file. The variable “AGE” gives the age at death. The variable “AREAKEY” gives the census tract geocode.

ANALYTIC METHODS
Data from public health databases are typically formatted so that each record represents one person (or case report). Once these data have been geocoded, they need to be aggregated before they can be linked to denominator and ABSM data. Before aggregating, however, one should exclude all records that are not geocoded, do not meet the case definition, or are missing data on the important covariates (e.g., age, in the case of simple age-standardized analyses; age, sex, and race/ethnicity in the case of more complex stratified analyses).

One can think of the basic unit of aggregation as a cell, defined by age and other covariates, within an area/geocode. The cell contains a count of all cases within that area that meet the specified age and other covariate criteria. Once aggregated, this cell can be linked to the relevant population denominator. Since our goal is eventually to create rates, we call this count of cases the “numerator.”

SAS PROGRAMMING
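/* Read the comma-delimited death records; the first row holds the variable names. */
PROC IMPORT OUT=rawcase
    DATAFILE="G:\monograph\example\rawcase.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
    DATAROW=2;
RUN;

/* Assign each death to one of five age categories. */
/* Records with a missing AGE are left with a missing AGECAT. */
DATA Step1a;
    SET rawcase;
    IF 0<=AGE<15 THEN AGECAT=1;
    ELSE IF 15<=AGE<25 THEN AGECAT=2;
    ELSE IF 25<=AGE<45 THEN AGECAT=3;
    ELSE IF 45<=AGE<65 THEN AGECAT=4;
    ELSE IF AGE>=65 THEN AGECAT=5;
RUN;

/* Count the deaths in each census tract by age category cell. */
/* Step1b has one row per AREAKEY*AGECAT cell; COUNT is the numerator. */
PROC FREQ DATA=Step1a NOPRINT;
    TABLES AREAKEY*AGECAT / OUT=Step1b;
RUN;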
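
The sample program does not show the exclusion step described under Analytic Methods. The following is a minimal sketch of how it might be added, not part of the original program. It assumes that ungeocoded records have a missing AREAKEY and that age is the only covariate required, as in the case example. It does not address the case definition, which depends on the analysis. The data set names Step1a_clean and Step1b_full are hypothetical.

```sas
/* Sketch only: assumes ungeocoded records carry a missing AREAKEY. */
/* Step1a_clean and Step1b_full are illustrative data set names.    */
DATA Step1a_clean;
    SET Step1a;
    /* Drop records that cannot be placed in a tract-by-age cell. */
    IF MISSING(AREAKEY) OR MISSING(AGECAT) THEN DELETE;
RUN;

/* SPARSE also writes cells with a count of zero to the OUT= data set, */
/* for every combination of the AREAKEY and AGECAT values in the data. */
PROC FREQ DATA=Step1a_clean NOPRINT;
    TABLES AREAKEY*AGECAT / OUT=Step1b_full SPARSE;
RUN;
```

Without SPARSE, a tract with no deaths in a given age category has no row for that cell. With SPARSE it has a row with COUNT equal to 0, although a tract with no deaths at all still does not appear. Whether the zero-count rows are needed depends on how the numerators are later linked to the denominator data.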

This work was funded by the National Institutes of Health (1RO1HD36865-01) via the National Institute of Child Health & Human Development (NICHD) and the Office of Behavioral & Social Science Research (OBSSR).
Copyright © 2004 by the President and Fellows of Harvard College - The Public Health Disparities Geocoding Project.