The Public Health Disparities
Geocoding Project Monograph

Geocoding and Monitoring US Socioeconomic Inequalities in Health:
An introduction to using area-based socioeconomic measures
STEP BY STEP COMPARISON
A step by step comparison of each task of the Case Example, the relevant section of Analytic Methods, and sample SAS code

Step 2:
Aggregate the denominator data.
 
CASE EXAMPLE

The file rawdenom.csv is a comma-delimited file containing the estimated population count in 31 age categories for the 189 census tracts in Suffolk County, from the 1990 U.S. Census. Each census tract is represented by one line in the data file, with the 31 age categories arrayed horizontally.

a. Aggregate the population counts into the five broad age categories listed above.

b. Transpose the structure of the data, so that there is one record for each age stratum within a census tract, with a corresponding categorical age variable and population count. You should end up with 5 records for each census tract, with each record represented by one line of your output dataset.

c. Multiply the population count by 3, to yield a person-time denominator for three years' worth of death data.
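
To make the target layout of tasks b and c concrete, the following is a minimal SAS sketch of what the five output records for a single tract might look like. The identifier TRACT_ID, its value, and all counts are invented for illustration and are not taken from rawdenom.csv; each DENOM is a made-up population count (90, 80, 40, 50, 50) already multiplied by 3.

```sas
/* Hypothetical illustration only: TRACT_ID and all counts are made up. */
DATA layout_example ;
  INPUT TRACT_ID $ AGECAT DENOM ;
  DATALINES ;
TRACT_A 1 270
TRACT_A 2 240
TRACT_A 3 120
TRACT_A 4 150
TRACT_A 5 150
;
RUN ;

/* List the five records: one line per age stratum within the tract. */
PROC PRINT DATA=layout_example NOOBS ;
RUN ;
```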

ANALYTIC METHODS

Denominator data at the census tract level typically come from the decennial census, which gives population counts in 31 age categories (<1, 1-2, 3-4, 5, 6, 7-9, 10-11, 12-13, 14, 15, 16, 17, 18, 19, 20, 21, 22-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-61, 62-64, 65-69, 70-74, 75-79, 80-84, 85+). For the purposes of age standardization, these age categories need to be re-aggregated to match the age categories used for categorizing case data (numerators, above) and the age categories from the standard million reference population. Additionally, when using case data from multiple years, in order to calculate an average annual incidence rate, one needs to use a person-time denominator (population count multiplied by number of years of case data).

SAS PROGRAMMING

PROC IMPORT OUT= rawdenom
DATAFILE= "G:\monograph\example\rawdenom.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;

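DATA Step2a ;
SET rawdenom ;

/* (a) Collapse the 31 census age categories into five broad groups.   */
/* Age ranges assume P0130001-P0130031 follow the census list order.   */
AGECAT1= SUM(OF P0130001-P0130009) ;  /* ages 0-14  */
AGECAT2= SUM(OF P0130010-P0130017) ;  /* ages 15-24 */
AGECAT3= SUM(OF P0130018-P0130021) ;  /* ages 25-44 */
AGECAT4= SUM(OF P0130022-P0130026) ;  /* ages 45-64 */
AGECAT5= SUM(OF P0130027-P0130031) ;  /* ages 65+   */

LENGTH AGECAT 3 ;

ARRAY AGES [5] AGECAT1-AGECAT5 ;

/* (b) Write one record per age stratum: five records per tract. */
DO I=1 TO 5 ;
AGECAT=I ;
/* (c) Person-time denominator for three years of death data. */
DENOM=3*AGES[I] ;
OUTPUT ;
END ;

/* Drop the loop index and the wide-format counts. */
DROP I AGECAT1-AGECAT5 P0130001-P0130031 ;
RUN ;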
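
One way to check the result is sketched below. It assumes that Step2a was created by the program above and that no tract has missing population counts. PROC MEANS reports the number of records and the total person-time for each age stratum.

```sas
/* Sanity check (a sketch): record count and total person-time per stratum. */
PROC MEANS DATA=Step2a N SUM ;
  CLASS AGECAT ;
  VAR DENOM ;
RUN ;
```

Because rawdenom.csv holds one line for each of the 189 census tracts, each of the five AGECAT levels should show N = 189 under these assumptions, and the SUM column gives the three-year person-time for that age stratum across the county.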

This work was funded by the National Institutes of Health (1RO1HD36865-01) via the National Institute of Child Health & Human Development (NICHD) and the Office of Behavioral & Social Science Research (OBSSR).
Copyright © 2004 by the President and Fellows of Harvard College - The Public Health Disparities Geocoding Project.