01.1

The Global Burden of Disease 2000 project: aims, methods and data sources

The GBD 2000 aims to produce the best possible evidence-based description of health, the causes of lost health, and likely future trends in health. To the extent possible, the GBD 2000 aims to use and synthesize, within a consistent and comprehensive framework, all relevant epidemiological evidence on population demography and health for the various regions of the world. Where evidence is uncertain or incomplete, the GBD 2000 attempts to make the best possible inferences from the available knowledge base and to assess the uncertainty in the resulting estimates. This paper summarizes the analysis categories, methods, and data sources for the GBD 2000.

01.2

Logistic Regression in Rare Events Data

This paper studies rare events data: binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. The paper recommends corrections that outperform existing methods and that change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data, and more efficient sampling designs exist for making valid inferences. The paper also provides methods that link these two results, enabling both types of corrections to work simultaneously, as well as software that implements the methods developed.
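
When a sample is selected on the outcome (for example, all events plus a random subset of nonevents), one standard fix in the rare events literature is prior correction of the logit intercept, which requires the population fraction of events, tau. The sketch below assumes that correction only; it is not necessarily the full set of corrections the paper recommends (which also address finite-sample bias), and the coefficient values are hypothetical.

```python
import math


def prior_correct_intercept(b0_hat: float, tau: float, ybar: float) -> float:
    """Adjust a logit intercept estimated on an outcome-selected sample.

    tau:  assumed known fraction of events (ones) in the population
    ybar: fraction of events in the estimation sample
    """
    if not (0 < tau < 1 and 0 < ybar < 1):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Subtract ln[((1 - tau) / tau) * (ybar / (1 - ybar))]; slopes are unchanged.
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def logit_prob(b0: float, b1: float, x: float) -> float:
    """Predicted probability from a one-covariate logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical fit on a balanced sample (ybar = 0.5) drawn from a population
# in which only 1% of units experience the event.
b0_hat, b1_hat = 0.2, 0.8
b0_corr = prior_correct_intercept(b0_hat, tau=0.01, ybar=0.5)
print(round(logit_prob(b0_hat, b1_hat, 1.0), 3))   # uncorrected probability
print(round(logit_prob(b0_corr, b1_hat, 1.0), 4))  # prior-corrected probability
```

When the sample and population event fractions coincide, the correction term is zero, which is why the adjustment only matters for outcome-selected designs.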

01.3

Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation

Some researchers avoid the problems that missing data can cause by using sophisticated statistical models optimized for their particular applications. Unfortunately, doing so may place heavy burdens on the investigator, since optimal models for missing data differ with each application, are not programmed in currently available standard statistical software, and do not exist for many applications. The authors' complementary approach is to find a better choice within the class of widely applicable and easy-to-use methods for missing data. The paper begins by reviewing three types of assumptions about missing data and then demonstrates analytically the disadvantages of listwise deletion. Next, it introduces multiple imputation and the authors' alternative algorithm, and then presents two examples of applied research that illustrate how assumptions about, and methods for, missing data can affect conclusions about government and politics.
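
Whatever algorithm produces the imputations, the m completed-data analyses are commonly pooled with Rubin's combining rules. The sketch below assumes those rules and assumes that each imputed dataset has already yielded a point estimate and a squared standard error; the five values are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates of one quantity, one per imputed dataset
    variances: the matching squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # average point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical coefficient estimates from five imputed datasets.
est, var = combine_imputations([0.52, 0.47, 0.55, 0.50, 0.49],
                               [0.010, 0.012, 0.011, 0.009, 0.010])
print(round(est, 3), round(var ** 0.5, 4))
```

The between-imputation term is what listwise deletion or single imputation ignores: it inflates the standard error to reflect uncertainty about the missing values themselves.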

01.4

Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies

Classic (or “cumulative”) case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and even then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or “risk set”) case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort, such as the number of controls in each full risk set, is available. The paper addresses this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study, even when the researcher is completely ignorant of, or only partially informed about, auxiliary population information.
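
To see why the population incidence fraction matters, consider a binary exposure. Because cases and controls are sampled separately, a cumulative case-control sample identifies P(exposure | outcome) but not P(outcome | exposure). If the population fraction of cases, tau, were known exactly, Bayes' rule would recover the risks. The sketch below assumes that simplest case; the paper's methods target the harder situation where tau is unknown or only partially known, and the counts are hypothetical.

```python
def risks_from_case_control(cases_exposed, cases_total,
                            controls_exposed, controls_total, tau):
    """Recover exposure-specific risks from a cumulative case-control sample.

    tau: assumed known population fraction of cases, P(Y = 1)
    Returns (risk if exposed, risk if unexposed).
    """
    # Sampling on Y leaves P(X | Y) intact, so estimate it within each group.
    p_x1_case = cases_exposed / cases_total
    p_x1_ctrl = controls_exposed / controls_total

    def risk(p_case, p_ctrl):
        # Bayes' rule: P(Y=1 | x) = tau P(x|Y=1) / [tau P(x|Y=1) + (1 - tau) P(x|Y=0)]
        num = tau * p_case
        return num / (num + (1 - tau) * p_ctrl)

    return (risk(p_x1_case, p_x1_ctrl),
            risk(1 - p_x1_case, 1 - p_x1_ctrl))


# Hypothetical study: 60 of 100 cases and 30 of 100 controls were exposed.
r1, r0 = risks_from_case_control(60, 100, 30, 100, tau=0.02)
odds_ratio = (60 * 70) / (40 * 30)   # computable without tau
print(f"risk ratio {r1 / r0:.3f}, odds ratio {odds_ratio:.3f}, "
      f"risk difference {r1 - r0:.4f}")
```

With a small tau the risk ratio lands close to the odds ratio, which is the rare events assumption at work; the risk levels and the risk difference, by contrast, cannot be computed at all without some value for tau.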

01.5

Improving Forecasts of State Failure

This paper offers the first independent scholarly evaluation of the claims, forecasts, and causal inferences of the State Failure Task Force and its efforts to forecast when states fail. State failure refers to the collapse of the central government's authority to impose order, as in civil wars, revolutionary wars, genocides, politicides, and adverse or disruptive regime transitions. The paper identifies several methodological errors in the task force's work that cause its reported forecast probabilities of conflict to be too large, its causal inferences to be biased in unpredictable directions, and its claims of forecasting performance to be exaggerated. By reanalyzing the task force's data with better statistical procedures, the paper also offers the first accurate forecasts of state failure, along with procedures and results that may be of practical use in informing foreign policy decision-making.

01.6

A Fast, Easy, and Efficient Estimator for Multiparty Electoral Data

Katz and King (1999) develop a model for predicting or explaining aggregate electoral results in multiparty democracies that is analogous to what least squares regression provides researchers studying the American two-party system. Applying their model to three-party elections in England, Katz and King reveal a variety of new features of incumbency advantage and sources of party support. Although the mathematics of their statistical model covers any number of political parties, the model is computationally demanding, and hence slow and numerically imprecise, with more than three parties. This paper develops an approximate method that works in practice with many parties without making too many theoretical compromises. By treating the problem as one of missing data, the authors are able to use a modification of the fast EMis algorithm of King, Honaker, Joseph, and Scheve (2000) and to provide easy-to-use software, while retaining the attractive features of the Katz and King model, such as the t distribution and explicit models for uncontested seats.

01.9

How Factual is Your Counterfactual?

Inferences about counterfactuals are essential for prediction, answering “what if” questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses come to rest on speculation and convenient but indefensible model assumptions rather than on empirical evidence. Yet standard model outputs do not reveal the degree of model dependence, so this problem can be hard to detect. This paper develops easy-to-apply methods for evaluating counterfactuals that do not require sensitivity testing over specified classes of models. The authors use these methods to evaluate the extensive scholarly literatures on the effects of changes in a country's degree of democracy (on any dependent variable), and find evidence that many scholars are inadvertently drawing conclusions about democracy based more on their hypotheses than on their empirical evidence.
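
One simple way to gauge how far a counterfactual lies from the observed data, assumed here for illustration rather than taken as the paper's exact procedure, is Gower's distance: for each numeric covariate, divide the absolute difference by that covariate's observed range, then average across covariates. The share of observed units lying within a small distance of the counterfactual indicates how much nearby data supports it. The data and the 0.1 threshold below are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Mean range-normalized absolute difference over numeric covariates."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(ranges)


def share_nearby(counterfactual, data, threshold=0.1):
    """Fraction of observed rows within `threshold` Gower distance."""
    ranges = []
    for j in range(len(counterfactual)):
        col = [row[j] for row in data]
        span = max(col) - min(col)
        ranges.append(span if span > 0 else 1.0)  # guard constant columns
    close = sum(gower_distance(counterfactual, row, ranges) <= threshold
                for row in data)
    return close / len(data)


# Hypothetical observed data: (democracy score, log GDP per capita).
observed = [(0, 7.0), (2, 7.5), (5, 8.0), (8, 9.0), (10, 9.5), (9, 9.2)]
print(share_nearby((9, 9.3), observed))   # 2 of 6 observed rows are close
print(share_nearby((0, 9.5), observed))   # no observed row is close
```

The second counterfactual combines values that each occur in the data but never together, so every observed unit is far from it; a model's prediction there depends on its functional form rather than on nearby evidence.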

01.12

Did Illegally Counted Overseas Absentee Ballots Decide the 2000 U.S. Presidential Election?

Although this was not widely known until much later, Al Gore received 202 more votes than George W. Bush on election day in Florida. George W. Bush is president because he overcame his election day deficit with overseas absentee ballots that arrived and were counted after election day. After the election, the New York Times conducted a six-month investigation and found that 680 of the overseas absentee ballots were illegally counted; no partisan, pundit, or academic has publicly disagreed with its assessment. This paper describes the statistical procedures developed and implemented for the Times to ascertain whether disqualifying these 680 votes would have changed the outcome of the election. The methods involve adding formal Bayesian model averaging procedures to King's (1997) ecological inference model. The authors show how they derived the results for the Times and also present a variety of new empirical results that delineate the precise conditions under which Al Gore would have been elected president.
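
Bayesian model averaging replaces a single model's estimate with a weighted average over candidate models, each weighted by its posterior probability. A common approximation, assumed in this sketch, sets weights proportional to exp(-BIC/2) under equal prior model probabilities; the paper's actual weighting within the ecological inference model may differ. The estimates and BIC values are hypothetical placeholders, not results from the paper.

```python
import math


def bma_from_bic(estimates, bics):
    """Average estimates across models with approximate posterior weights.

    Weights are proportional to exp(-BIC / 2), assuming equal prior model
    probabilities; BICs are shifted by their minimum for numerical stability.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    weights = [r / total for r in raw]
    averaged = sum(w * e for w, e in zip(weights, estimates))
    return averaged, weights


# Hypothetical estimates of one quantity from three candidate models.
avg, w = bma_from_bic([120.0, 95.0, 150.0], [1010.0, 1012.0, 1020.0])
print(round(avg, 1), [round(x, 3) for x in w])
```

Averaging in this way carries model uncertainty into the final answer instead of conditioning on one specification, which matters when the conclusion hinges on a margin of a few hundred votes.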

01.14

Statistical Models for Enhancing Cross-Population Comparability

Measuring the health state of individuals is important for evaluating health interventions, for monitoring individual health progress, and as a critical step in measuring the health of populations. Self-reported responses in household survey data are widely used to assess the non-fatal health status of populations. This paper elaborates on several statistical models used in the analysis of such survey data. First, it focuses on off-the-shelf models that are widely available in standard statistical software. In particular, the authors demonstrate the problems of inference that arise from these standard methods when the underlying data are not comparable across populations. In later sections, the authors introduce methods that modify these standard routines to enhance the cross-population comparability of survey analyses.
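
A typical off-the-shelf model for ordinal self-reports is the ordered probit, in which a latent health level is mapped to response categories through cut-points. The sketch below assumes that model and illustrates the comparability problem: two hypothetical populations with identical latent health, but different cut-points (that is, different use of the response scale), report different distributions of health. All values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(mu, cutpoints):
    """Ordered probit: P(category k) when latent health ~ N(mu, 1)."""
    edges = [-math.inf] + list(cutpoints) + [math.inf]
    return [normal_cdf(hi - mu) - normal_cdf(lo - mu)
            for lo, hi in zip(edges[:-1], edges[1:])]


same_mu = 0.0                                       # identical latent health
pop_a = category_probs(same_mu, [-1.0, 0.0, 1.0])   # hypothetical cut-points
pop_b = category_probs(same_mu, [-0.5, 0.5, 1.5])   # same health, different scale use
print([round(p, 3) for p in pop_a])
print([round(p, 3) for p in pop_b])
```

Population B places more people in the lowest category even though nothing about their health differs, so a standard model fit to pooled responses would wrongly attribute the gap to health itself.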

01.18

An Individual-Level Approach to Health Inequality: Child Survival in 50 Countries

Reducing health inequalities is an important part of the agenda of health policymakers globally. Studies of health inequalities have revealed large variations in average health status across social, economic, and other groups. The authors use an extended beta-binomial model to estimate the distribution of the risk of death among children under the age of two in the 50 developing countries where data from the Demographic and Health Surveys are available. They conclude that inequality estimates should be reported routinely alongside average levels of health, because they reveal important information about the distribution of health in populations. Measuring inequality with individual-level data, rather than quantifying differences in average levels of health across social groups, enables meaningful comparisons of inequality across countries and analyses of the determinants of inequality, and this approach should be extended to the measurement of inequalities in healthy life expectancy.
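
In a basic beta-binomial model, a risk of death p is drawn from a Beta(alpha, beta) distribution and the number of deaths among n children sharing that risk is binomial given p; the spread of the beta distribution then measures inequality in risk. The sketch below assumes this basic form, with the children of one mother sharing a risk, and does not reproduce the paper's extended model. The parameter values are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k deaths among n children) when their shared risk p ~ Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def risk_mean_and_var(alpha, beta):
    """Mean and variance of the Beta(alpha, beta) risk distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))


# Two hypothetical countries with the same mean risk (0.1) but different spread.
for alpha, beta in [(20.0, 180.0), (0.5, 4.5)]:
    mu_risk, var_risk = risk_mean_and_var(alpha, beta)
    p_none = beta_binomial_pmf(0, 4, alpha, beta)
    print(f"mean risk {mu_risk:.2f}, sd of risk {math.sqrt(var_risk):.3f}, "
          f"P(no deaths among 4 children) {p_none:.3f}")
```

Both hypothetical countries have the same average child mortality risk, so a comparison of means alone would call them equal; only the individual-level risk distribution reveals that the second is far more unequal.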

01.19

Measurement Methods for Inequalities in the Risk of Adult Mortality

Compared with children, far fewer datasets are readily available for estimating inequality in adults' risk of dying. Particularly in low- and middle-income countries, information on average levels of mortality is often unreliable, let alone data on the distribution of mortality risk within countries. Intensive efforts have been made to identify datasets that are well suited to analyzing inequality in adult mortality risk, and models have been modified accordingly. This paper presents an overview of the available data, proposes a model to estimate the distribution of mortality risk in adults, and describes an alternative way of approximating that distribution in countries where such data are not readily available.