The Public Health Disparities
Geocoding Project Monograph

Geocoding and Monitoring US Socioeconomic Inequalities in Health:
An introduction to using area-based socioeconomic measures

STEP BY STEP COMPARISON
A step by step comparison of each task of the Case Example, the relevant section of Analytic Methods, and sample SAS code

Step 5:
For each category of CT poverty, calculate the age-standardized incidence rate,
using the year 2000 standard million.
 

In order to do this:

a. Aggregate the numerator and denominator within each age X CT poverty stratum, across all census tracts.

b. Exclude cases and denominator where CT poverty is missing.

c. Merge with the year 2000 standard million in five age categories.

d. Calculate the incidence rate age-standardized to the year 2000 standard million, and the corresponding “gamma” confidence intervals for the directly standardized rates.

1. Age-standardized incidence rates
The standard practice of public health departments in reporting population rates of mortality and disease incidence is to calculate age-standardized rates, which facilitates comparisons between regions or subgroups of interest. The age-standardized rate is interpretable as the rate that would be observed in a population if that population had the same age distribution as a given reference population. Direct standardized rates are obtained by applying the age-specific incidence rates observed in the area or subgroup of interest to a standard age distribution, such as the year 2000 standard million.1

For our project, we used five broad age categories to age standardize, in order to obtain more stable rates in each age stratum, particularly for outcomes with sparse data.

If $cases_j$ represents the number of cases in age group $j$ of the group or region of interest and $pop_j$ represents the population associated with that age group, then the standardized rate $IR_{st}$ for the group or region is

$$IR_{st} = \frac{\sum_j w_j \, (cases_j / pop_j)}{\sum_j w_j}$$

 *************************************************************
CREATE DATASET WITH STANDARD MILLION FOR AGE STANDARDIZATION
(IN FIVE CATEGORIES)
0-14
15-24
25-44
45-64
65+
*************************************************************;
data stdrd ;
input agecat y1940 y1970 y1980 y1990 y2000 ;

cards ;
1 250416 284926 226401 215383 214700
2 181677 174405 187542 147860 138646
3 301303 236183 276838 324695 298186
4 198105 205746 196440 186446 222081
5 68499 98740 112779 125616 126387
;

RUN ;


PROC SORT DATA=Step4 ;
BY AGECAT CINDPOV ;
run ;

DATA Step5a ;
SET Step4 ;
WHERE CINDPOV^=. ;
BY AGECAT CINDPOV ;

retain NMR DNM ;

if first.CINDPOV then do ;

NMR=0 ;
DNM=0 ;
end ;

NMR=NMR+NUMER ;
DNM=DNM+DENOM ;

if last.CINDPOV then DO ;
output ;
END ;

DROP AREAKEY NUMER DENOM ;
RUN ;


DATA Step5b ;
MERGE Step5a (in=ina) stdrd (in=inb) ;
BY AGECAT ;
if ina and inb ;

w_i=y2000/1000000 ;

IR_i=NMR/DNM ;

varpy_i=NMR/DNM**2 ;

RUN ;


proc sort data=Step5b ;
BY CINDPOV ;
run ;

data Step5c ;
SET Step5b ;
by CINDPOV ;


************************************
IRW=weighted incidence rate
VARPY=part of person-time variance
VARPYW=weighted person-time variance
SUMWI=sum of weights
CRNUM=crude numerator
CRDEN=crude denominator
WMAX=maximum of weight/denominator across age strata (for gamma CI)
************************************;

retain IRW VARPY VARPYW SUMWI WMAX CRNUM CRDEN ;

if first.CINDPOV then do ;

IRW=0 ;
VARPYW=0 ;
VARPY=0 ;
SUMWI=0 ;
CRNUM=0 ;
CRDEN=0;
WMAX=0 ;
end ;

IRW=IRW + (W_I*IR_I) ;
VARPY=VARPY + ((W_I**2)*VARPY_I) ;
SUMWI=SUMWI + W_I ;
CRNUM=CRNUM+NMR ;
CRDEN=CRDEN+DNM ;
WMAX=MAX(WMAX,W_I/DNM) ;

if last.CINDPOV then do ;

VARPYW=VARPY/(SUMWI**2) ;

********************************
LOWER 95% GAMMA INTERVAL

LGAM gives the lower 95% gamma limit using the formula given by Fay and Feuer.
LGAM2 gives the lower 95% gamma limit using the formula given by Anderson and Rosenberg.

***NOTE: FOR LGAM2 AND UGAM2, HAVE NOW PROGRAMMED OPTIONS FOR IRW=0 VARPYW=0
I.E. USE INVERSE CHI SQUARE DISTRIBUTION CINV(0.975,2) AND DIVIDE BY DENOMINATOR
TO GET UPPER LIMIT ON RATE

References:
Fay MP, Feuer EJ. Confidence intervals for directly standardized rates:
a method based on the gamma distribution. Statistics in Medicine 1997,16:791-801.

Anderson RN, Rosenberg HM. Age standardization of death rates: implementation of the year 2000 standard.
National Vital Statistics Reports: Vol 47, No. 3. Hyattsville, MD:
National Center for Health Statistics, 1998.
********************************;

IF IRW=0 THEN LGAM=0 ;
ELSE LGAM=(VARPYW/(2*IRW)) * CINV(0.025,((2*(IRW**2))/VARPYW)) ;

IF IRW=0 AND VARPYW=0 THEN DO ;
LGAM2=0 ;
END ;
ELSE LGAM2=(VARPYW/IRW) * GAMINV(0.025,((IRW**2)/VARPYW)) ;

********************************
UPPER 95% GAMMA

UGAM gives the upper 95% gamma limit using the formula given by Fay and Feuer.
UGAM2 gives the upper 95% gamma limit using the formula given by Anderson and Rosenberg.

********************************;

UGAM=((VARPYW + (WMAX**2))/(2*(IRW+WMAX))) * CINV(0.975,((2*((IRW + WMAX)**2))/(VARPYW + (WMAX**2)))) ;

IF IRW=0 AND VARPYW=0 THEN DO ;
UGAM2=(0.5 * CINV(0.975,2))/CRDEN ;
END ;
ELSE UGAM2=(VARPYW/IRW) * GAMINV(0.975,(((IRW**2)/VARPYW) + 1)) ;

********************************
REGULAR CONFIDENCE LIMITS
******************************** ;

LO95 = IRW - (1.96*SQRT(VARPYW)) ;
HI95 = IRW + (1.96*SQRT(VARPYW)) ;

OUTPUT ;
end ;

RUN ;


proc print data=Step5c ;
var CINDPOV IRW LGAM2 UGAM2 ;
run ;
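The aggregation and exclusion tasks (a and b) are done above with a RETAIN loop in the Step5a data step. As a hedged alternative sketch, the same stratum totals could be produced with PROC SQL. The output name Step5a_sql is hypothetical and not part of the downloadable program. The input variable names are taken from the program above.

```sas
proc sql ;
   /* Sum numerator and denominator within each age x CT poverty stratum, */
   /* dropping records where CT poverty is missing.                       */
   create table Step5a_sql as
   select AGECAT,
          CINDPOV,
          sum(NUMER) as NMR,
          sum(DENOM) as DNM
   from Step4
   where CINDPOV is not missing
   group by AGECAT, CINDPOV
   order by AGECAT, CINDPOV ;
quit ;
```

One difference to keep in mind: the SQL sum() function skips missing values of NUMER or DENOM, whereas the running sums NMR=NMR+NUMER and DNM=DNM+DENOM in the data step become missing if any contributing value is missing.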

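In the standardized rate $IR_{st}$, $w_j$ is the weight associated with age category $j$ in the reference (standardizing) population (e.g. the population size or the proportion of the total population). The estimated variance of the standardized rate is given by:

$$\widehat{\mathrm{var}}(IR_{st}) = \frac{\sum_j w_j^2 \, cases_j / pop_j^2}{\left(\sum_j w_j\right)^2}$$

(When the $w_j$s are proportions, then $\sum_j w_j = 1$ and $\left(\sum_j w_j\right)^2 = 1$.)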
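A small worked illustration may help fix the arithmetic. The numbers are hypothetical and use only two age strata (the project itself uses five), with weights expressed as proportions: $w_1 = 0.6$, $w_2 = 0.4$, $cases_1 = 10$, $pop_1 = 10{,}000$, $cases_2 = 30$, $pop_2 = 5{,}000$.

```latex
\begin{align*}
IR_{st} &= \sum_j w_j \frac{cases_j}{pop_j}
         = 0.6 \cdot \frac{10}{10{,}000} + 0.4 \cdot \frac{30}{5{,}000}
         = 0.0006 + 0.0024 = 0.0030 \\
\widehat{\mathrm{var}}(IR_{st}) &= \sum_j w_j^2 \frac{cases_j}{pop_j^2}
         = 0.36 \cdot \frac{10}{10{,}000^2} + 0.16 \cdot \frac{30}{5{,}000^2}
         = 3.6 \times 10^{-8} + 1.92 \times 10^{-7}
         = 2.28 \times 10^{-7}
\end{align*}
```

The standardized rate is therefore 300 per 100,000, compared with a crude rate of 40/15,000, or about 267 per 100,000. The standard error is roughly $4.8 \times 10^{-4}$, so normal-based 95% limits would be about 0.0030 ± 0.0009.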
2. Confidence intervals for directly standardized rates
Traditional confidence limits for directly standardized rates are based on the normal distribution and require large cell counts. In our analyses, we found that they can also occasionally produce “impossible” lower limits that are less than zero. Because of this, we adopted an alternative method for calculating the confidence limits based on the inverse gamma function.2 This method treats the directly standardized rate as a linear combination of independent Poisson random variables. Assuming that this linear combination also follows a Poisson distribution, the age-standardized rate $X$, with $E(X) = x$, follows a gamma distribution as follows:

$$X \sim \Gamma(a, b), \qquad a = \frac{x^2}{v}, \quad b = \frac{v}{x}$$

where $x$ is the age-standardized rate ($IR_{st}$ as estimated above) and $v$ is its variance, as described above. Converting this to the gamma distribution in its standard form, i.e. where $b=1$, this yields

$$\frac{X}{b} \sim \Gamma\left(\frac{x^2}{v},\ 1\right)$$

which greatly simplifies calculations. Then the lower confidence limit for $X/b$ is given by

$$\Gamma^{-1}\left(0.025;\ \frac{x^2}{v}\right)$$

and the upper confidence limit is obtained from

$$\Gamma^{-1}\left(0.975;\ \frac{(x+k)^2}{v+k^2}\right), \quad \text{rescaled by } \frac{v+k^2}{x+k},$$

where $k$ is a continuity correction necessitated by using a continuous distribution to estimate confidence limits for a discrete random variable. Increasing the number of events by 1 in an age stratum $j$ results in a

$$k_j = \frac{w_j}{pop_j}$$

increase in the age-standardized rate. If $k_j$ is constant for all age intervals, then $k_j = k$. However, since the values for $w_j$ and $pop_j$ typically vary across age strata, it is unclear what value of $k$ to use. A very conservative upper limit can be obtained by using the maximum value of $k_j$, denoted $k_m$.

However, following the recommendation of the NCHS, we used a close approximation that alleviates the need to calculate $k_m$:

$$\Gamma^{-1}\left(0.975;\ \frac{x^2}{v} + 1\right)$$

To transform these intervals to obtain the desired confidence limits for $X$, we use

$$L = \frac{v}{x}\,\Gamma^{-1}\left(0.025;\ \frac{x^2}{v}\right) \quad \text{and} \quad U = \frac{v}{x}\,\Gamma^{-1}\left(0.975;\ \frac{x^2}{v} + 1\right).$$
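To see how close the NCHS approximation is to the more conservative form that uses the maximum continuity correction, the following minimal sketch evaluates both sets of limits for one hypothetical stratum. The inputs are illustrative values and not project data: a rate of 0.0030, a variance of 2.28e-7, and a largest weight-to-population ratio of 8e-5. The variable names are invented for this example. CINV and GAMINV are the SAS inverse chi-square and inverse gamma functions already used in the program above.

```sas
data _null_ ;
   /* Hypothetical inputs, not project data */
   x  = 0.0030 ;    /* age-standardized rate         */
   v  = 2.28e-7 ;   /* estimated variance of rate    */
   km = 8e-5 ;      /* largest w_j/pop_j over strata */

   /* Fay and Feuer form, using the maximum continuity correction */
   lo_ff = (v/(2*x)) * cinv(0.025, 2*x**2/v) ;
   hi_ff = ((v + km**2)/(2*(x + km)))
           * cinv(0.975, 2*(x + km)**2/(v + km**2)) ;

   /* NCHS approximation (Anderson and Rosenberg) */
   lo_nchs = (v/x) * gaminv(0.025, x**2/v) ;
   hi_nchs = (v/x) * gaminv(0.975, x**2/v + 1) ;

   /* Write the four limits to the log for comparison */
   put lo_ff= lo_nchs= hi_ff= hi_nchs= ;
run ;
```

Because a chi-square variable with d degrees of freedom is a gamma variable with shape d/2 and scale 2, the two lower limits are algebraically identical. For these inputs the upper limits use nearly the same parameters: a shape of about 40.47 in both cases, with a scale of about 7.61e-5 for the Fay and Feuer form and 7.60e-5 for the NCHS form.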

This work was funded by the National Institutes of Health (1RO1HD36865-01) via the National Institute of Child Health & Human Development (NICHD) and the Office of Behavioral & Social Science Research (OBSSR).
Copyright © 2004 by the President and Fellows of Harvard College - The Public Health Disparities Geocoding Project.