The Public Health Disparities Geocoding Project Monograph Geocoding and Monitoring US Socioeconomic Inequalities in Health: An introduction to using area-based socioeconomic measures
STEP BY STEP COMPARISON
A step-by-step comparison of each task of the Case Example, the relevant section of Analytic Methods, and sample SAS code.
Step 5: For each category of CT poverty, calculate the age-standardized incidence rate, using the year 2000 standard million.

CASE EXAMPLE
In order to do this:
a. Aggregate the numerator and denominator within each age × CT poverty stratum, across all census tracts.
b. Exclude cases and denominators where CT poverty is missing.
c. Merge with the year 2000 standard million in five age categories.
d. Calculate the age-standardized incidence rate standardized to the year 2000 standard million, and the corresponding "gamma" confidence intervals for the direct standardized rates.

ANALYTIC METHODS
1. Age-standardized incidence rates
The standard practice of public health departments in reporting population rates of mortality and disease incidence is to calculate age-standardized rates, which facilitates comparisons between regions or subgroups of interest. The age-standardized rate is interpretable as the rate that would be observed in a population if that population had the same age distribution as a given reference population. Direct standardized rates are obtained by applying the age-specific incidence rates observed in the area or subgroup of interest to a standard age distribution, such as the year 2000 standard million.¹ For our project, we used five broad age categories to age standardize, in order to obtain more stable rates in each age stratum, particularly for outcomes with sparse data.
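As a purely illustrative worked example of direct standardization (the case counts and populations are hypothetical, not project data), suppose each of the five age groups has a population of 100,000, with 10, 20, 50, 200 and 500 cases respectively, and that the weights are the year 2000 standard million expressed as proportions (0.214700, 0.138646, 0.298186, 0.222081, 0.126387). Applying the age-specific rates to the standard age distribution gives:

```latex
\begin{aligned}
IR_{st} &= \sum_{j=1}^{5} w_j \,\frac{\text{cases}_j}{\text{pop}_j} \\
        &= 0.214700\cdot\frac{10}{100{,}000}
         + 0.138646\cdot\frac{20}{100{,}000}
         + 0.298186\cdot\frac{50}{100{,}000}
         + 0.222081\cdot\frac{200}{100{,}000}
         + 0.126387\cdot\frac{500}{100{,}000} \\
        &\approx 127.4 \text{ per } 100{,}000
\end{aligned}
```

The crude rate in this hypothetical population is 780/500,000 = 156 per 100,000. The standardized rate is lower because the standard million gives less weight to the oldest age group (12.6%) than this population does (20%).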
If $\text{cases}_j$ represents the number of cases in age group $j$ of the group or region of interest and $\text{pop}_j$ represents the population associated with that age group, then the standardized rate $IR_{st}$ for the group or region is

$$IR_{st} = \frac{\sum_j w_j \, \dfrac{\text{cases}_j}{\text{pop}_j}}{\sum_j w_j}$$

where $w_j$ is the weight associated with category $j$ in the reference (standardizing) population (e.g. the population size or the proportion of the total population). The estimated variance of the standardized rate is given by:

$$\widehat{\mathrm{var}}(IR_{st}) = \frac{\sum_j w_j^2 \, \dfrac{\text{cases}_j}{\text{pop}_j^2}}{\left(\sum_j w_j\right)^2}$$

(When the $w_j$s are proportions, then $\sum_j w_j = 1$ and $\left(\sum_j w_j\right)^2 = 1$.)

2. Confidence intervals for directly standardized rates
Traditional confidence limits for the direct standardized rates are based on the normal distribution and require large cell counts. In our analyses, we found that they can also occasionally result in "impossible" lower limits that are less than zero. Because of this, we adopted an alternate method for calculating the confidence limits based on the inverse gamma function.² This method assumes that the direct standardized rate is a linear combination of independent Poisson random variables. Assuming that this linear combination also follows a Poisson distribution, the age-standardized rate $X$, with $E(X) = x$, follows a gamma distribution as follows:

$$X \sim \Gamma(a, b), \qquad a = \frac{x^2}{v}, \qquad b = \frac{v}{x}$$

where $x$ is the age-standardized rate ($IR_{st}$ as estimated above) and $v$ is its variance, as described above. Converting this to the gamma distribution in its standard form, i.e. where $b = 1$, this yields

$$Y = \frac{X}{b} = \frac{x}{v}\,X \sim \Gamma\!\left(\frac{x^2}{v},\, 1\right)$$

which greatly simplifies calculations. Writing $\Gamma^{-1}_{p}(a)$ for the $p$th quantile of the standard gamma distribution with shape $a$ (GAMINV(p, a) in SAS), the lower confidence limit for $Y$ is given by

$$L_Y = \Gamma^{-1}_{\alpha/2}\!\left(\frac{x^2}{v}\right)$$

and the upper confidence limit for $Y$ is given by

$$U_Y = \Gamma^{-1}_{1-\alpha/2}\!\left(\frac{(x+k)^2}{v+k^2}\right)$$

where $k$ is a continuity correction necessitated by using a continuous distribution to estimate confidence limits for a discrete random variable. Increasing the number of events by 1 in an age stratum $j$ results in a $k_j = w_j/\text{pop}_j$ increase in the age-standardized rate (with the weights expressed as proportions). If $k_j$ is constant for all age intervals, then $k_j = k$. However, since the values for $w_j$ and $\text{pop}_j$ typically vary across age strata, it is unclear what value of $k$ to use. A very conservative upper limit can be obtained by using the maximum value of $k_j$, $k_m$. However, following the recommendation of the NCHS, we used a close approximation that alleviates the need to calculate $k_m$:

$$U_Y = \Gamma^{-1}_{1-\alpha/2}\!\left(\frac{x^2}{v} + 1\right)$$

This is the value the general expression takes when $k_j$ is the same in every stratum, since then $x^2/v$ equals the total number of cases and $(x+k)^2/(v+k^2) = x^2/v + 1$. To transform these intervals to obtain the desired confidence limits for $X$, we use

$$L_X = \frac{v}{x}\,L_Y \qquad \text{and} \qquad U_X = \frac{v}{x}\,U_Y.$$

(For a general $k$, as in the Fay and Feuer formula computed as UGAM in the SAS program, the scale factor for the upper limit is $(v+k^2)/(x+k)$ rather than $v/x$.)

SAS PROGRAMMING

    *************************************************************
      CREATE DATASET WITH STANDARD MILLION FOR AGE STANDARDIZATION
      (IN FIVE CATEGORIES)
      0-14  15-24  25-44  45-64  65+
    *************************************************************;
    data stdrd ;
      input agecat y1940 y1970 y1980 y1990 y2000 ;
      cards ;
    1 250416 284926 226401 215383 214700
    2 181677 174405 187542 147860 138646
    3 301303 236183 276838 324695 298186
    4 198105 205746 196440 186446 222081
    5 68499 98740 112779 125616 126387
    ;
    RUN ;

    PROC SORT DATA=Step4 ;
      BY AGECAT CINDPOV ;
    run ;

    /* Steps a and b: sum numerators and denominators over census tracts   */
    /* within each age x CT poverty stratum, dropping missing CT poverty   */
    DATA Step5a ;
      SET Step4 ;
      WHERE CINDPOV^=. ;
      BY AGECAT CINDPOV ;
      retain NMR DNM ;
      if first.CINDPOV then do ;
        NMR=0 ;
        DNM=0 ;
      end ;
      NMR=NMR+NUMER ;
      DNM=DNM+DENOM ;
      if last.CINDPOV then DO ;
        output ;
      END ;
      DROP AREAKEY NUMER DENOM ;
    RUN ;

    /* Step c: attach the year 2000 standard million weights */
    DATA Step5b ;
      MERGE Step5a (in=ina) stdrd (in=inb) ;
      BY AGECAT ;
      if ina and inb ;
      w_i=y2000/1000000 ;
      IR_i=NMR/DNM ;
      varpy_i=NMR/DNM**2 ;
    RUN ;

    proc sort data=Step5b ;
      BY CINDPOV ;
    run ;

    /* Step d: standardized rate and confidence limits per CT poverty category */
    data Step5c ;
      SET Step5b ;
      by CINDPOV ;
      ************************************
        IRW=weighted incidence rate
        VARPY=part of person-time variance
        VARPYW=weighted person-time variance
        SUMWI=sum of weights
        CRNUM=crude numerator
        CRDEN=crude denominator
        WMAX=maximum weight (for gamma CI)
      ************************************;
      retain IRW VARPY VARPYW SUMWI WMAX CRNUM CRDEN ;
      if first.CINDPOV then do ;
        IRW=0 ; VARPYW=0 ; VARPY=0 ; SUMWI=0 ; CRNUM=0 ; CRDEN=0 ; WMAX=0 ;
      end ;
      IRW=IRW + (W_I*IR_I) ;
      VARPY=VARPY + ((W_I**2)*VARPY_I) ;
      SUMWI=SUMWI + W_I ;
      CRNUM=CRNUM+NMR ;
      CRDEN=CRDEN+DNM ;
      WMAX=MAX(WMAX,W_I/DNM) ;
      if last.CINDPOV then do ;
        VARPYW=VARPY/(SUMWI**2) ;
        ********************************
          LOWER 95% GAMMA INTERVAL
          LGAM gives the 95% gamma interval using the formula given by Fay and Feuer.
          LGAM2 gives the 95% gamma interval using the formula given by Anderson and Rosenberg.
          ***NOTE: FOR LGAM2 AND UGAM2, HAVE NOW PROGRAMMED OPTIONS FOR IRW=0 VARPYW=0
          I.E. USE INVERSE CHI SQUARE DISTRIBUTION CINV(0.975,2) AND DIVIDE BY
          DENOMINATOR TO GET UPPER LIMIT ON RATE
          References:
          Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method
          based on the gamma distribution. Statistics in Medicine 1997,16:791-801.
          Anderson RN, Rosenberg HM. Age standardization of death rates: implementation of
          the year 2000 standard. National Vital Statistics Reports: Vol 47, No. 3.
          Hyattsville, MD: National Center for Health Statistics, 1998.
        ********************************;
        /* Guard against division by zero when there are no cases */
        IF IRW>0 THEN LGAM=(VARPYW/(2*IRW)) * CINV(0.025,((2*(IRW**2))/VARPYW)) ;
        ELSE LGAM=0 ;
        IF IRW=0 AND VARPYW=0 THEN DO ;
          LGAM2=0 ;
        END ;
        ELSE LGAM2=(VARPYW/IRW) * GAMINV(0.025,((IRW**2)/VARPYW)) ;
        ********************************
          UPPER 95% GAMMA
          UGAM gives the 95% gamma interval using the formula given by Fay and Feuer.
          UGAM2 gives the 95% gamma interval using the formula given by Anderson and Rosenberg.
        ********************************;
        /* The whole term 2*(IRW+WMAX) is the divisor, so it is parenthesized */
        UGAM=((VARPYW + (WMAX**2))/(2*(IRW+WMAX))) *
             CINV(0.975,((2*((IRW + WMAX)**2))/(VARPYW + (WMAX**2)))) ;
        IF IRW=0 AND VARPYW=0 THEN DO ;
          UGAM2=(0.5 * CINV(0.975,2))/CRDEN ;
        END ;
        ELSE UGAM2=(VARPYW/IRW) * GAMINV(0.975,(((IRW**2)/VARPYW) + 1)) ;
        ********************************
          REGULAR CONFIDENCE LIMITS
        ********************************;
        LO95 = IRW - (1.96*SQRT(VARPYW)) ;
        HI95 = IRW + (1.96*SQRT(VARPYW)) ;
        OUTPUT ;
      end ;
    run ;

    proc print data=Step5c ;
      var CINDPOV IRW LGAM2 UGAM2 ;
    run ;
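The gamma-based limits can be tried on a small, self-contained example before running them on real data. The sketch below is not part of the project code: the dataset names `toy` and `toy_ci` and the case and population counts are hypothetical, chosen to be deliberately sparse. It assumes a single group, weights expressed as proportions of the year 2000 standard million, and the Anderson and Rosenberg form of the limits (GAMINV with shape x²/v for the lower limit and x²/v + 1 for the upper limit, both scaled by v/x), computed alongside the normal-approximation limits.

```sas
/* Hypothetical sparse counts for one group; values are illustrative only */
data toy ;
  input agecat nmr dnm y2000 ;
  w_i     = y2000 / 1000000 ;   /* weight as a proportion of the standard million */
  ir_i    = nmr / dnm ;         /* age-specific incidence rate                    */
  varpy_i = nmr / dnm**2 ;      /* Poisson variance of the age-specific rate      */
  datalines ;
1 0 50000 214700
2 0 30000 138646
3 1 60000 298186
4 0 45000 222081
5 1 25000 126387
;
run ;

data toy_ci ;
  set toy end=last ;
  retain irw 0 varpyw 0 ;
  irw    = irw    + w_i * ir_i ;           /* standardized rate (weights sum to 1) */
  varpyw = varpyw + (w_i**2) * varpy_i ;   /* variance of the standardized rate    */
  if last then do ;
    shape = irw**2 / varpyw ;              /* x**2 / v */
    scale = varpyw / irw ;                 /* v / x    */
    lgam2 = scale * gaminv(0.025, shape) ;
    ugam2 = scale * gaminv(0.975, shape + 1) ;
    lo95  = irw - 1.96 * sqrt(varpyw) ;    /* normal-approximation limits */
    hi95  = irw + 1.96 * sqrt(varpyw) ;
    output ;
  end ;
  keep irw varpyw lgam2 ugam2 lo95 hi95 ;
run ;

proc print data=toy_ci ;
run ;
```

With counts this sparse, the normal-approximation lower limit (lo95) falls below zero, whereas the gamma lower limit (lgam2) cannot, which is the situation that motivated the gamma method.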