ANALYTIC METHODS
Aggregating Numerator Data
Data from public health databases are typically formatted so that each record represents one person (or case report). Once these data have been geocoded, they need to be aggregated before they can be linked to denominator and ABSM data. Before aggregating, however, one should exclude all records that are not geocoded, that do not meet the case definition, or that are missing data on the important covariates (e.g., age in simple age-standardized analyses; age, sex, and race/ethnicity in more complex stratified analyses).
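The exclusion step can be sketched in Python as follows (the programs accompanying this page are in SAS; this is only an illustration). It assumes a hypothetical record layout with fields `geocode`, `meets_case_def`, and `age`, uses `None` for a missing value, codes an age of "<1" as 0, and checks only age, as in a simple age-standardized analysis.

```python
from typing import List, Optional, TypedDict


class CaseRecord(TypedDict):
    # Hypothetical case-report layout; field names are illustrative only.
    geocode: Optional[str]   # census tract geocode; None if geocoding failed
    meets_case_def: bool     # True if the record meets the case definition
    age: Optional[int]       # age in completed years ("<1" coded as 0); None if missing


def eligible_records(records: List[CaseRecord]) -> List[CaseRecord]:
    """Drop records that are not geocoded, fail the case definition,
    or are missing age; only the remaining records are aggregated."""
    return [
        r
        for r in records
        if r["geocode"] is not None and r["meets_case_def"] and r["age"] is not None
    ]


records: List[CaseRecord] = [
    {"geocode": "25009250500", "meets_case_def": True, "age": 0},
    {"geocode": None, "meets_case_def": True, "age": 40},            # not geocoded
    {"geocode": "25009250800", "meets_case_def": False, "age": 22},  # not a case
    {"geocode": "25009250800", "meets_case_def": True, "age": None}, # age missing
]
print(len(eligible_records(records)))  # 1
```

For a stratified analysis, the same filter would also test sex and race/ethnicity for missing values.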
One
can think of the basic unit of aggregation as a cell, defined
by age and other covariates, within an area/geocode. Once aggregated,
this cell within an area can be linked to a relevant population
denominator. The cell contains a count of all cases within that
area that meet the specified age and other covariate criteria.
Since our goal is eventually to create rates, we call this count
of cases the “numerator.”
Example:
We intend to age-standardize using five broad age categories: 0-14, 15-24, 25-44, 45-64, and 65+. Therefore, we need to aggregate the records in each census tract into cells defined by these age categories. As an example, consider the following 23 records from census tracts 25009250500 and 25009250800.
Before aggregating:

| Record # | Geocode | Age at death |
|---|---|---|
| 1 | 25009250500 | <1 |
| 2 | 25009250500 | <1 |
| 3 | 25009250500 | <1 |
| 4 | 25009250500 | 17 |
| 5 | 25009250500 | 19 |
| 6 | 25009250500 | 27 |
| 7 | 25009250500 | 38 |
| 8 | 25009250500 | 40 |
| 9 | 25009250500 | 40 |
| 10 | 25009250500 | 44 |
| 11 | 25009250800 | <1 |
| 12 | 25009250800 | <1 |
| 13 | 25009250800 | 5 |
| 14 | 25009250800 | 22 |
| 15 | 25009250800 | 24 |
| 16 | 25009250800 | 26 |
| 17 | 25009250800 | 31 |
| 18 | 25009250800 | 36 |
| 19 | 25009250800 | 36 |
| 20 | 25009250800 | 40 |
| 21 | 25009250800 | 43 |
| 22 | 25009250800 | 43 |
| 23 | 25009250800 | 43 |
After aggregating:

| Geocode | Age category | Number of deaths (numerator) |
|---|---|---|
| 25009250500 | 0-14 | 3 |
| 25009250500 | 15-24 | 2 |
| 25009250500 | 25-44 | 5 |
| 25009250800 | 0-14 | 3 |
| 25009250800 | 15-24 | 2 |
| 25009250800 | 25-44 | 8 |
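A minimal Python sketch of this aggregation, assuming age is stored in completed years with "<1" coded as 0 and that each record has already passed the exclusions described above. The names `AGE_CATEGORIES`, `age_category`, and `aggregate_numerators` are hypothetical. Cells with no cases simply do not appear in the resulting `Counter` (looking one up returns 0); they are filled in explicitly at the merge step.

```python
from collections import Counter
from typing import List, Optional, Tuple

# The five broad age categories from the text: (label, lowest age, highest age),
# inclusive, in completed years; None means "no upper limit".
AGE_CATEGORIES: List[Tuple[str, int, Optional[int]]] = [
    ("0-14", 0, 14),
    ("15-24", 15, 24),
    ("25-44", 25, 44),
    ("45-64", 45, 64),
    ("65+", 65, None),
]


def age_category(age: int) -> str:
    """Return the broad age category label for an age in completed years."""
    for label, low, high in AGE_CATEGORIES:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age out of range: {age}")


def aggregate_numerators(records: List[Tuple[str, int]]) -> Counter:
    """Count cases in each (geocode, age category) cell."""
    return Counter((geocode, age_category(age)) for geocode, age in records)


# The 23 example records above, as (geocode, age) pairs with "<1" coded as 0.
records = [("25009250500", a) for a in [0, 0, 0, 17, 19, 27, 38, 40, 40, 44]] + [
    ("25009250800", a) for a in [0, 0, 5, 22, 24, 26, 31, 36, 36, 40, 43, 43, 43]
]
for cell, count in sorted(aggregate_numerators(records).items()):
    print(cell, count)
```

Run on the 23 example records, this reproduces the six cells of the "After aggregating" table.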
View
Case Example & SAS Programming for Step 1
Aggregating Denominator Data
Denominator data at the census tract level typically come from the decennial census. In 1990, the US Census reported population counts by age in 31 categories (<1, 1-2, 3-4, 5, 6, 7-9, 10-11, 12-13, 14, 15, 16, 17, 18, 19, 20, 21, 22-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-61, 62-64, 65-69, 70-74, 75-79, 80-84, 85+). In the 1990 US Census Summary Tape File 3 (STF3), age-specific population counts were reported in table P013: variable P0130001 gave the count of residents <1 year old, P0130002 gave the count of residents 1-2 years old, and so on.
For the purposes of age standardization, these age categories need to be re-aggregated to match the age categories used for the case data (numerators, above) and the age categories of the standard million reference population. Additionally, when case data from multiple years are used to calculate an average annual incidence rate, one needs a person-time denominator (population count multiplied by the number of years of case data). For example, the Massachusetts all-cause mortality data include three years' worth of cases (1989-1991), so we multiply the population count in each age category by 3.
Example:
For census tract 25009250800 in 1990, we wish to age standardize
using the same five broad age categories as in the numerator
example above (0-14, 15-24, 25-44, 45-64, 65+):
Before:

| Census variable | Ages (years) | Population count |
|---|---|---|
| P0130001 | <1 | 115 |
| P0130002 | 1-2 | 243 |
| P0130003 | 3-4 | 197 |
| P0130004 | 5 | 92 |
| P0130005 | 6 | 59 |
| P0130006 | 7-9 | 237 |
| P0130007 | 10-11 | 160 |
| P0130008 | 12-13 | 141 |
| P0130009 | 14 | 77 |
| P0130010 | 15 | 62 |
| P0130011 | 16 | 54 |
| P0130012 | 17 | 94 |
| P0130013 | 18 | 65 |
| P0130014 | 19 | 89 |
| P0130015 | 20 | 101 |
| P0130016 | 21 | 128 |
| P0130017 | 22-24 | 387 |
| P0130018 | 25-29 | 571 |
| P0130019 | 30-34 | 746 |
| P0130020 | 35-39 | 422 |
| P0130021 | 40-44 | 354 |
| P0130022 | 45-49 | 317 |
| P0130023 | 50-54 | 176 |
| P0130024 | 55-59 | 174 |
| P0130025 | 60-61 | 65 |
| P0130026 | 62-64 | 214 |
| P0130027 | 65-69 | 158 |
| P0130028 | 70-74 | 316 |
| P0130029 | 75-79 | 178 |
| P0130030 | 80-84 | 112 |
| P0130031 | 85+ | 69 |
To collapse these variables into the five broad age categories, we sum the census variables as follows:
After:

| Age category | Census variables summed | Population count | Person-time denominator (× 3 years of case data) |
|---|---|---|---|
| 0-14 | SUM(OF P0130001--P0130009) | 1321 | 3963 |
| 15-24 | SUM(OF P0130010--P0130017) | 980 | 2940 |
| 25-44 | SUM(OF P0130018--P0130021) | 2093 | 6279 |
| 45-64 | SUM(OF P0130022--P0130026) | 946 | 2838 |
| 65+ | SUM(OF P0130027--P0130031) | 833 | 2499 |
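The same re-aggregation can be sketched in Python, assuming the P013 counts for one tract are held in a dictionary keyed by variable name. The constant and function names are hypothetical; the variable ranges are the ones in the table above, and the multiplier of 3 reflects the three years (1989-1991) of Massachusetts mortality data.

```python
from typing import Dict, Tuple

# Inclusive ranges of P013 variable numbers summed into each broad age category,
# mirroring the SUM(OF ...) ranges in the table above.
P013_RANGES: Dict[str, Tuple[int, int]] = {
    "0-14": (1, 9),
    "15-24": (10, 17),
    "25-44": (18, 21),
    "45-64": (22, 26),
    "65+": (27, 31),
}


def collapse_p013(counts: Dict[str, int], years_of_case_data: int) -> Dict[str, Tuple[int, int]]:
    """Return {age category: (population count, person-time denominator)}.

    `counts` maps variable names such as "P0130001" to population counts.
    Person-time = population count x number of years of case data."""
    result = {}
    for label, (first, last) in P013_RANGES.items():
        population = sum(counts[f"P013{i:04d}"] for i in range(first, last + 1))
        result[label] = (population, population * years_of_case_data)
    return result


# Census tract 25009250800, 1990 STF3 table P013, in order P0130001..P0130031.
tract_counts = [115, 243, 197, 92, 59, 237, 160, 141, 77, 62, 54, 94, 65, 89, 101, 128,
                387, 571, 746, 422, 354, 317, 176, 174, 65, 214, 158, 316, 178, 112, 69]
counts = {f"P013{i:04d}": n for i, n in enumerate(tract_counts, start=1)}
for label, (population, person_time) in collapse_p013(counts, years_of_case_data=3).items():
    print(label, population, person_time)
```

The printed values match the population counts and person-time denominators in the "After" table for tract 25009250800.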
Merging Numerators with Denominators and ABSM
Once
the numerators and denominators have the same structure (AREAKEY
x AGECAT), they can be merged together, along with the ABSM data
(by AREAKEY). For age cells within areas where no cases were reported,
we set the numerator to zero.
Example:
Before merging with ABSM:

Numerator dataset:

| Geocode/Areakey | Age category | Number of deaths (numerator) |
|---|---|---|
| 25009250500 | 0-14 | 3 |
| 25009250500 | 15-24 | 2 |
| 25009250500 | 25-44 | 5 |
| 25009250500 | 45-64 | 7 |
| 25009250500 | 65+ | 26 |
| 25009250800 | 0-14 | 4 |
| 25009250800 | 15-24 | 3 |
| 25009250800 | 25-44 | 8 |
| 25009250800 | 45-64 | 13 |
| 25009250800 | 65+ | 132 |

Denominator dataset:

| Geocode/Areakey | Age category | Person-time denominator (× 3 years of case data) |
|---|---|---|
| 25009250500 | 0-14 | 4152 |
| 25009250500 | 15-24 | 1953 |
| 25009250500 | 25-44 | 3489 |
| 25009250500 | 45-64 | 1233 |
| 25009250500 | 65+ | 1212 |
| 25009250800 | 0-14 | 3963 |
| 25009250800 | 15-24 | 2940 |
| 25009250800 | 25-44 | 6279 |
| 25009250800 | 45-64 | 2838 |
| 25009250800 | 65+ | 2499 |
After merging with ABSM:

| Geocode | Age category | Poverty | Numerator | Denominator |
|---|---|---|---|---|
| 25009250500 | 1 | 4 | 3 | 4152 |
| 25009250500 | 2 | 4 | 2 | 1953 |
| 25009250500 | 3 | 4 | 5 | 3489 |
| 25009250500 | 4 | 4 | 7 | 1233 |
| 25009250500 | 5 | 4 | 26 | 1212 |
| 25009250800 | 1 | 3 | 4 | 3963 |
| 25009250800 | 2 | 3 | 3 | 2940 |
| 25009250800 | 3 | 3 | 8 | 6279 |
| 25009250800 | 4 | 3 | 13 | 2838 |
| 25009250800 | 5 | 3 | 132 | 2499 |

Age category codes: 1 = 0-14, 2 = 15-24, 3 = 25-44, 4 = 45-64, 5 = 65+.
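The merge can be sketched in Python with plain dictionaries, under the assumption that the complete set of (AREAKEY, AGECAT) cells is given by the denominator data; every such cell becomes a row, and a missing numerator is set to zero as described above. The names `merge_cells` and `Cell` are hypothetical. Numerator cells with no matching denominator are silently dropped in this sketch; a production version would probably want to flag them instead.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Cell = Tuple[str, str]  # (AREAKEY, AGECAT)


def merge_cells(
    numerators: Mapping[Cell, int],
    denominators: Mapping[Cell, int],
    absm: Mapping[str, Optional[float]],
) -> List[dict]:
    """Merge numerators and person-time denominators by (AREAKEY, AGECAT),
    and attach the area-level ABSM by AREAKEY.

    A cell with no reported cases gets a numerator of 0, and an area
    absent from `absm` gets an ABSM of None."""
    rows = []
    for (areakey, agecat), person_time in sorted(denominators.items()):
        rows.append({
            "areakey": areakey,
            "agecat": agecat,
            "absm": absm.get(areakey),
            "numerator": numerators.get((areakey, agecat), 0),
            "denominator": person_time,
        })
    return rows


# The two example tracts above.
AGECATS = ["0-14", "15-24", "25-44", "45-64", "65+"]
numerators: Dict[Cell, int] = {("25009250500", a): n for a, n in zip(AGECATS, [3, 2, 5, 7, 26])}
numerators.update({("25009250800", a): n for a, n in zip(AGECATS, [4, 3, 8, 13, 132])})
denominators: Dict[Cell, int] = {
    ("25009250500", a): d for a, d in zip(AGECATS, [4152, 1953, 3489, 1233, 1212])
}
denominators.update({("25009250800", a): d for a, d in zip(AGECATS, [3963, 2940, 6279, 2838, 2499])})
poverty = {"25009250500": 4, "25009250800": 3}
for row in merge_cells(numerators, denominators, poverty):
    print(row)
```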
Aggregating OVER Areas into ABSM Strata
Next,
in order to generate rates for categories of a specific ABSM,
it is necessary to aggregate OVER areas into strata defined by
AGECAT and ABSM. Numerators and denominators from census tracts
with missing ABSM data for a particular ABSM are typically excluded
from that analysis.
Example:
In Suffolk County, Massachusetts, there are a total of 189 census tracts. We wish to examine all-cause mortality rates by poverty, with poverty categorized into 4 strata (0-4.9%, 5-9.9%, 10-19.9%, and 20-100%).
| ABSM: CT poverty | Number of census tracts |
|---|---|
| 0.0-4.9% | 10 |
| 5.0-9.9% | 37 |
| 10.0-19.9% | 56 |
| 20.0-100.0% | 83 |
| Missing poverty data | 3 |
Thus, to obtain the mortality rates in the least impoverished stratum (0.0-4.9% below poverty), we need to aggregate the cases and the population at risk OVER the ten census tracts in that stratum, preserving the age structure WITHIN each poverty stratum so that we can age-standardize in the following step. For the next poverty stratum (5.0-9.9%), we aggregate the cases and the population denominator over 37 census tracts, and so on. Cases and population at risk in the three census tracts with missing poverty data are excluded from the analysis.
This yields the following table:
| ABSM: CT poverty | Age category | Numerator | Denominator |
|---|---|---|---|
| 0.0-4.9% | 0-14 | 1 | 10,608 |
| 0.0-4.9% | 15-24 | 5 | 9,984 |
| 0.0-4.9% | 25-44 | 54 | 29,190 |
| 0.0-4.9% | 45-64 | 106 | 16,710 |
| 0.0-4.9% | 65+ | 657 | 15,825 |
| 5.0-9.9% | 0-14 | 40 | 69,939 |
| 5.0-9.9% | 15-24 | 39 | 64,065 |
| 5.0-9.9% | 25-44 | 252 | 179,595 |
| 5.0-9.9% | 45-64 | 792 | 90,042 |
| 5.0-9.9% | 65+ | 4,535 | 80,916 |
| 10.0-19.9% | 0-14 | 101 | 88,989 |
| 10.0-19.9% | 15-24 | 93 | 93,147 |
| 10.0-19.9% | 25-44 | 531 | 224,793 |
| 10.0-19.9% | 45-64 | 962 | 100,479 |
| 10.0-19.9% | 65+ | 3,944 | 71,955 |
| 20.0-100.0% | 0-14 | 182 | 155,193 |
| 20.0-100.0% | 15-24 | 170 | 217,593 |
| 20.0-100.0% | 25-44 | 831 | 288,882 |
| 20.0-100.0% | 45-64 | 1,291 | 108,588 |
| 20.0-100.0% | 65+ | 3,645 | 72,720 |
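Assigning tracts to poverty strata and aggregating OVER tracts can be sketched in Python as below. This assumes each merged row carries the tract's percent below poverty in a hypothetical `pct_poverty` field (None when missing), and it treats the strata as half-open intervals so that a value such as 4.95 cannot fall into a gap between 0.0-4.9% and 5.0-9.9%. The function names and the example tracts A, B, and C are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Poverty strata from the text as half-open intervals [low, high) of percent
# below poverty; the top stratum also includes exactly 100%.
POVERTY_STRATA = [
    ("0.0-4.9%", 0.0, 5.0),
    ("5.0-9.9%", 5.0, 10.0),
    ("10.0-19.9%", 10.0, 20.0),
    ("20.0-100.0%", 20.0, 100.0),
]


def poverty_stratum(pct: Optional[float]) -> Optional[str]:
    """Map a tract's percent below poverty to a stratum label (None if missing)."""
    if pct is None:
        return None
    for label, low, high in POVERTY_STRATA:
        if low <= pct < high or pct == high == 100.0:
            return label
    raise ValueError(f"percent below poverty out of range: {pct}")


def aggregate_over_areas(rows: Iterable[dict]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Sum numerators and denominators OVER tracts within each
    (poverty stratum, age category) cell, skipping tracts with missing poverty.
    Keeping the age category in the key preserves the age structure within
    each stratum for the age standardization that follows."""
    totals: Dict[Tuple[str, str], list] = defaultdict(lambda: [0, 0])
    for row in rows:
        stratum = poverty_stratum(row["pct_poverty"])
        if stratum is None:
            continue  # missing ABSM: excluded from this analysis
        cell = totals[(stratum, row["agecat"])]
        cell[0] += row["numerator"]
        cell[1] += row["denominator"]
    return {key: (num, den) for key, (num, den) in totals.items()}


# Hypothetical tracts: A and B fall in the same stratum; C has no poverty data.
rows = [
    {"areakey": "A", "pct_poverty": 6.1, "agecat": "0-14", "numerator": 2, "denominator": 1000},
    {"areakey": "B", "pct_poverty": 8.7, "agecat": "0-14", "numerator": 3, "denominator": 1500},
    {"areakey": "C", "pct_poverty": None, "agecat": "0-14", "numerator": 7, "denominator": 900},
]
print(aggregate_over_areas(rows))  # {('5.0-9.9%', '0-14'): (5, 2500)}
```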
Generating Rates and Other Summary Measures/Measures of Effect
1. Age-standardized incidence rates
The standard practice of public health departments in reporting population rates of mortality and disease incidence is to calculate age-standardized rates, which facilitate comparisons between regions or subgroups of interest. The age-standardized rate is interpretable as the rate that would be observed in a population if that population had the same age distribution as a given reference population. Standardization by the direct method involves taking a weighted average of the age-specific incidence rates observed in the area or subgroup of interest, where the weights come from a standard age distribution, such as the year 2000 standard million.[1]
"Standard million" reference populations are available based on the US population age distribution for 1940, 1970, 1980, 1990, and 2000. Here we present the standard million in 11 age categories.
Standard million reference population, by year:

| Age (years) | Year 1940 | Year 1970 | Year 1980 | Year 1990 | Year 2000 |
|---|---|---|---|---|---|
| <1 | 15,343 | 17,150 | 15,598 | 12,936 | 13,818 |
| 1-4 | 64,718 | 67,265 | 56,565 | 60,863 | 55,317 |
| 5-14 | 170,355 | 200,511 | 154,238 | 141,584 | 145,565 |
| 15-24 | 181,677 | 174,405 | 187,542 | 147,860 | 138,646 |
| 25-34 | 162,066 | 122,567 | 163,683 | 173,600 | 135,573 |
| 35-44 | 139,237 | 113,616 | 113,155 | 151,095 | 162,613 |
| 45-54 | 117,811 | 114,265 | 100,641 | 101,416 | 134,834 |
| 55-64 | 80,294 | 91,481 | 95,799 | 85,030 | 87,247 |
| 65-74 | 48,426 | 61,192 | 68,775 | 72,802 | 66,037 |
| 75-84 | 17,303 | 30,112 | 34,116 | 40,429 | 44,842 |
| 85+ | 2,770 | 7,436 | 9,888 | 12,385 | 15,508 |
For
our project, we used five broad age categories to age standardize,
in order to obtain more stable rates in each age stratum, particularly
for outcomes with sparse data. The relationship between our five
categories and the standard eleven categories is illustrated in
the table below.
| Age in 11 categories | Year 2000 standard million | Age in 5 categories | Year 2000 standard million |
|---|---|---|---|
| <1 | 13,818 | <15 | 214,700 |
| 1-4 | 55,317 | | |
| 5-14 | 145,565 | | |
| 15-24 | 138,646 | 15-24 | 138,646 |
| 25-34 | 135,573 | 25-44 | 298,186 |
| 35-44 | 162,613 | | |
| 45-54 | 134,834 | 45-64 | 222,081 |
| 55-64 | 87,247 | | |
| 65-74 | 66,037 | 65+ | 126,387 |
| 75-84 | 44,842 | | |
| 85+ | 15,508 | | |
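Collapsing the standard million into the five broad categories, and turning it into weights, can be checked with a short Python sketch. The dictionary and function names are hypothetical; the counts are the Year 2000 values from the tables above, and "0-14" stands for the "<15" category. Rounded to three decimals, the printed weights are 0.215, 0.139, 0.298, 0.222, and 0.126, the same weights (wj) used in the example that follows.

```python
from typing import Dict, List

# Year 2000 standard million in the 11 age categories from the table above.
STD_2000_11: Dict[str, int] = {
    "<1": 13_818, "1-4": 55_317, "5-14": 145_565, "15-24": 138_646,
    "25-34": 135_573, "35-44": 162_613, "45-54": 134_834, "55-64": 87_247,
    "65-74": 66_037, "75-84": 44_842, "85+": 15_508,
}

# Which 11-category ages make up each of the five broad categories
# ("0-14" is the "<15" category in the table).
BROAD_CATEGORIES: Dict[str, List[str]] = {
    "0-14": ["<1", "1-4", "5-14"],
    "15-24": ["15-24"],
    "25-44": ["25-34", "35-44"],
    "45-64": ["45-54", "55-64"],
    "65+": ["65-74", "75-84", "85+"],
}


def collapse_standard(std: Dict[str, int], groups: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum standard-population counts into broader age groups."""
    return {broad: sum(std[age] for age in parts) for broad, parts in groups.items()}


def standard_weights(std: Dict[str, int]) -> Dict[str, float]:
    """Each age group's share of the standard population (weights sum to 1)."""
    total = sum(std.values())
    return {age: count / total for age, count in std.items()}


std5 = collapse_standard(STD_2000_11, BROAD_CATEGORIES)
for age, w in standard_weights(std5).items():
    print(age, std5[age], round(w, 3))
```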
Example:
To calculate the age-standardized all-cause mortality rates in each of the four poverty strata in Suffolk County, we start with the age-specific mortality data. In each poverty stratum, the age-standardized mortality rate is calculated as a weighted sum of the age-specific mortality rates, where the weight for each age stratum is that stratum's share of the Year 2000 standard million (so the weights sum to 1).
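In the notation of the table below, IRj = (numerator / denominator) × 100,000 for age stratum j, wj is that stratum's share of the standard million, and IRst = Σj wj × IRj. A minimal Python sketch of this direct standardization follows; the function name and argument layout are hypothetical, and the input is the 0.0-4.9% poverty stratum from the table below.

```python
from typing import Dict, Mapping, Tuple


def age_standardized_rate(
    cells: Mapping[str, Tuple[int, int]],
    standard: Mapping[str, int],
    per: int = 100_000,
) -> Tuple[Dict[str, float], float]:
    """Direct age standardization for one stratum.

    cells:    {age category: (numerator, person-time denominator)}
    standard: {age category: standard population count}
    Returns the age-specific rates (IRj) and the standardized rate (IRst), per `per`."""
    total = sum(standard.values())
    rates: Dict[str, float] = {}
    standardized = 0.0
    for agecat, std_count in standard.items():
        numerator, denominator = cells[agecat]  # KeyError if an age cell is missing
        rates[agecat] = numerator / denominator * per
        standardized += (std_count / total) * rates[agecat]  # wj * IRj
    return rates, standardized


STD_2000_5 = {"0-14": 214_700, "15-24": 138_646, "25-44": 298_186,
              "45-64": 222_081, "65+": 126_387}
# Suffolk County, least impoverished stratum (0.0-4.9% below poverty).
lowest_poverty = {"0-14": (1, 10_608), "15-24": (5, 9_984), "25-44": (54, 29_190),
                  "45-64": (106, 16_710), "65+": (657, 15_825)}
rates, ir_st = age_standardized_rate(lowest_poverty, STD_2000_5)
print({a: round(r, 1) for a, r in rates.items()}, round(ir_st, 1))
```

This gives the age-specific rates 9.4, 50.1, 185.0, 634.4, and 4,151.7 and an age-standardized rate of 729.7 per 100,000. Note that the weights in the table are displayed rounded to three decimals; the standardized rates shown there are reproduced with the unrounded weights, whereas plugging in the rounded weights gives a slightly lower value (about 728.1 for this stratum).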
|
| |
ABSM:
CT poverty |
Age
category |
Numerator
|
Denominator
|
Year
2000 standard million
|
wj
(weight)
|
IRj
(incidence rate per 100,000)
|
IRst
(age standardized rate per 100,000) |
|
0.0-4.9%
|
0-14 |
1
|
10,608
|
214,700 |
0.215 |
9.4
|
729.7
|
| 0.0-4.9% |
15-24 |
5
|
9,984
|
138,646
|
0.139 |
50.1
|
| 0.0-4.9% |
25-44 |
54
|
29,190
|
298,186
|
0.298 |
185.0
|
| 0.0-4.9% |
45-64 |
106 |
16,710
|
222,081 |
0.222 |
634.4
|
| 0.0-4.9% |
65+ |
657 |
15,825
|
126,387 |
0.126 |
4,151.7 |
5.0-9.9%
|
0-14 |
40
|
69,939 |
214,700 |
0.215 |
57.2
|
966.2
|
| 5.0-9.9% |
15-24 |
39
|
64,065
|
138,646
|
0.139 |
60.9 |
| 5.0-9.9% |
25-44 |
252
|
179,595
|
298,186
|
0.298 |
140.3
|
| 5.0-9.9% |
45-64 |
792 |
90,042
|
222,081
|
0.222 |
879.6
|
| 5.0-9.9% |
65+ |
4,535
|
80,916
|
126,387 |
0.126 |
5,604.6
|
10.0-19.9%
|
0-14 |
101 |
88,989
|
214,700
|
0.215 |
113.5
|
1,014.0
|
| 10.0-19.9% |
15-24 |
93
|
93,147
|
138,646
|
0.139 |
99.8
|
| 10.0-19.9% |
25-44 |
531
|
224,793
|
298,186
|
0.298 |
236.2
|
| 10.0-19.9% |
45-64 |
962
|
100,479
|
222,081
|
0.222 |
957.4
|
| 10.0-19.9% |
65+ |
3,944
|
71,955
|
126,387
|
0.126 |
5,481.2
|
| 20.0-100.0% |
0-14 |
182
|
155,193 |
214,700
|
0.215 |
117.3
|
1,019.3 |
| 20.0-100.0% |
15-24 |
170
|
217,593
|
138,646
|
0.139 |
78.1 |
| 20.0-100.0% |
25-44 |
831 |
288,882
|
298,186
|
0.298 |
287.7 |
| 20.0-100.0% |
45-64 |
1,291
|
108,588
|
222,081 |
|