Key Personnel
·  Gary King, PhD.

  PROJECT PUBLICATIONS

 01.1 The Global Burden of Disease 2000 project: aims, methods and data sources
 

   The GBD 2000 aims to produce the best possible evidence-based description of health, the causes of lost health, and likely future trends in health. To the extent possible, the GBD 2000 aims to use and synthesize, within a consistent and comprehensive framework, all relevant epidemiological evidence on population demography and health for the various regions of the world. Where evidence is uncertain or incomplete, the GBD 2000 attempts to make the best possible inferences from the available knowledge base and to assess the uncertainty in the resulting estimates. This paper summarizes the analysis categories, methods and data sources for the GBD 2000.

 01.2 Logistic Regression in Rare Events Data
 

   This paper studies rare events data: binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. This paper recommends corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. More efficient sampling designs exist for making valid inferences. The paper also provides methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
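
   As an illustration of the kind of correction involved, the sketch below applies the standard prior correction to a logit intercept estimated on a sample in which events were deliberately over-sampled. The abstract does not spell out its formulas, so this is an assumed example rather than the paper's software; the function names and the numbers are hypothetical, and the population event fraction is assumed to be known.

```python
import math

def prior_corrected_intercept(b0_hat, sample_event_frac, population_event_frac):
    """Adjust a logit intercept estimated on an event-enriched sample.

    b0_hat                -- intercept from ordinary logistic regression
    sample_event_frac     -- share of ones in the sample (y-bar)
    population_event_frac -- share of ones in the population (tau), assumed known
    """
    ybar, tau = sample_event_frac, population_event_frac
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical case: a 50/50 sample of an event that occurs once in 1,000 units.
b0 = prior_corrected_intercept(b0_hat=0.0,
                               sample_event_frac=0.5,
                               population_event_frac=0.001)
baseline_probability = logistic(b0)
```

   Only the intercept changes under this adjustment; the slope coefficients from the enriched sample are left as estimated.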

 01.3 Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
 

   Some researchers avoid the problems missing data can cause by using sophisticated statistical models optimized for their particular applications. Unfortunately, doing so may put heavy burdens on the investigator, since optimal models for missing data differ with each application, are not programmed in currently available standard statistical software, and do not exist for many applications. Our complementary approach is to find a better choice in the class of widely applicable and easy-to-use methods for missing data. The paper begins with a review of three types of assumptions about missing data and then demonstrates analytically the disadvantages of listwise deletion. Next, the paper introduces multiple imputation and our alternative algorithm, and then presents two examples of applied research to illustrate how assumptions about and methods for missing data can affect our conclusions about government and politics.
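
   Whatever algorithm produces the imputed datasets, the analysis step of multiple imputation is usually the same: run the model on each completed dataset and combine the results. The sketch below shows the standard combining rules for a single coefficient. It is a minimal illustration with hypothetical inputs and does not reproduce the paper's imputation algorithm.

```python
from statistics import mean, variance

def combine_imputations(estimates, std_errors):
    """Combine one coefficient estimated on m imputed datasets.

    Returns the pooled point estimate and its standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return q_bar, total_variance ** 0.5

# Hypothetical results from m = 3 imputed datasets.
pooled_estimate, pooled_se = combine_imputations(
    estimates=[1.0, 2.0, 3.0],
    std_errors=[0.5, 0.5, 0.5],
)
```

   The between-imputation term is what listwise deletion and single imputation leave out: it carries the extra uncertainty due to the missing values.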

 01.4 Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies
 

   Classic (or “cumulative”) case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or “risk set”) case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort, such as the number of controls in each full risk set, is available. The paper addresses this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study, whether the researcher is completely ignorant of or only partially knowledgeable about auxiliary population information.
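
   The role of the population incidence fraction can be seen in a small worked example. The sketch below covers only the simplest case, a cumulative design with one binary exposure and an incidence fraction assumed to be known exactly, and recovers risks by Bayes' rule. It is an illustration with hypothetical numbers and names, not the paper's method for unknown or partially known auxiliary information.

```python
def risks_from_case_control(p_exposed_cases, p_exposed_controls, tau):
    """Return (risk if exposed, risk if unexposed) using Bayes' rule.

    p_exposed_cases    -- share of cases that were exposed, P(X=1 | Y=1)
    p_exposed_controls -- share of controls that were exposed, P(X=1 | Y=0)
    tau                -- population incidence fraction P(Y=1), assumed known
    """
    p1, p0 = p_exposed_cases, p_exposed_controls
    risk_exposed = p1 * tau / (p1 * tau + p0 * (1 - tau))
    risk_unexposed = (1 - p1) * tau / ((1 - p1) * tau + (1 - p0) * (1 - tau))
    return risk_exposed, risk_unexposed

# Hypothetical study: 60% of cases and 30% of controls exposed, 1% incidence.
p1, p0, tau = 0.6, 0.3, 0.01

# The odds ratio needs no auxiliary information.
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

# Risks, the risk ratio, and the risk difference all require tau.
risk_exposed, risk_unexposed = risks_from_case_control(p1, p0, tau)
risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed
```

   With a small incidence fraction the risk ratio sits close to the odds ratio, which is the rare events assumption at work; the risk difference has no such shortcut.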

 01.5 Improving Forecasts of State Failure
 

   This paper offers the first independent scholarly evaluation of the claims, forecasts, and causal inferences of the State Failure Task Force and its efforts to forecast when states fail. State failure refers to the collapse of the authority of the central government to impose order, as in civil wars, revolutionary wars, genocides, politicides, and adverse or disruptive regime transitions. The paper identifies several methodological errors in the task force's work that cause its reported forecast probabilities of conflict to be too large, its causal inferences to be biased in unpredictable directions, and its claims of forecasting performance to be exaggerated. By reanalyzing the task force's data with better statistical procedures, the paper is also able to offer the first accurate forecasts of state failure, along with procedures and results that may be of practical use in informing foreign policy decision-making.

 01.6 A Fast, Easy, and Efficient Estimator for Multiparty Electoral Data
 

   Katz and King (1999) develop a model for predicting or explaining aggregate electoral results in multiparty democracies, analogous to what least squares regression provides researchers studying the two-party American system. In applying their model to three-party elections in England, Katz and King reveal a variety of new features of incumbency advantage and sources of party support. Although the mathematics of their statistical model covers any number of political parties, it is computationally demanding, and hence slow and numerically imprecise, with more than three parties. This paper produces an approximate method that works in practice with many parties without making too many theoretical compromises. By treating the problem as one of missing data, the authors are able to use a modification of the fast EMis algorithm of King, Honaker, Joseph, and Scheve (2000) and to provide easy-to-use software, while retaining the attractive features of the Katz and King model, such as the t distribution and explicit models for uncontested seats.
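
   Models of multiparty vote shares have to respect two constraints: each share lies between 0 and 1, and the shares in a district sum to 1. A common way to handle this, assumed here for illustration since the abstract does not describe the transformation, is to model log ratios of each party's share to a reference party's share and map the results back to shares afterward. The function names and the example district are hypothetical.

```python
import math

def to_log_ratios(shares):
    """Map J vote shares (summing to 1) to J-1 unbounded log ratios.

    The last party in the list serves as the reference party.
    """
    *others, reference = shares
    return [math.log(s / reference) for s in others]

def to_shares(log_ratios):
    """Inverse map: J-1 log ratios back to J shares that sum to 1."""
    exponentiated = [math.exp(v) for v in log_ratios] + [1.0]
    total = sum(exponentiated)
    return [e / total for e in exponentiated]

# Hypothetical three-party district.
district = [0.45, 0.35, 0.20]
log_ratios = to_log_ratios(district)
recovered = to_shares(log_ratios)
```

   The log ratios are unbounded, so a regression-style model can be applied to them directly. A party that does not contest a district has a share of zero and no finite log ratio, which is one reason uncontested seats need separate treatment.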

 01.9 How Factual is Your Counterfactual?
 

   Inferences about counterfactuals are essential for prediction, answering “what if” questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses become based on speculation and convenient but indefensible model assumptions rather than empirical evidence. Yet, standard model outputs do not reveal the degree of model-dependence, and so this problem can be hard to detect. This paper develops easy-to-apply methods to evaluate counterfactuals that do not require sensitivity testing over specified classes of models. The authors use these methods to evaluate the extensive scholarly literatures on the effects of changes in the degree of democracy in a country (on any dependent variable), and find evidence that many scholars are inadvertently drawing conclusions about democracy based more on their hypotheses than on their empirical evidence.
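
   One simple way to gauge how far a counterfactual lies from the data is to measure its distance to each observed point and report the share of observations that fall nearby. The sketch below uses the Gower distance, the average absolute difference across variables with each difference scaled by that variable's observed range. This is an assumed diagnostic chosen for illustration, not necessarily the procedure developed in the paper, and the data and threshold are hypothetical.

```python
def gower_distance(x, y, ranges):
    """Average range-scaled absolute difference between two points.

    A variable with zero observed range contributes nothing to the distance.
    """
    terms = [abs(a - b) / r if r > 0 else 0.0 for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)

def share_of_data_nearby(counterfactual, data, threshold):
    """Fraction of observed rows within `threshold` of the counterfactual."""
    columns = list(zip(*data))
    ranges = [max(c) - min(c) for c in columns]
    near = sum(gower_distance(counterfactual, row, ranges) <= threshold
               for row in data)
    return near / len(data)

# Hypothetical data: four observations on two explanatory variables.
observed = [(0, 0), (1, 1), (2, 2), (3, 3)]
close_case = share_of_data_nearby((1, 1), observed, threshold=0.4)
far_case = share_of_data_nearby((10, 10), observed, threshold=0.4)
```

   A counterfactual with few or no observations nearby is one for which the answer will depend mostly on the model's functional form rather than on the data.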

 01.12 Did Illegally Counted Overseas Absentee Ballots Decide the 2000 U.S. Presidential Election?
 

   Although not widely known until much later, Al Gore received 202 more votes than George W. Bush on election day in Florida. George W. Bush is president because he overcame his election day deficit with overseas absentee ballots that arrived and were counted after election day. After the election, the New York Times conducted a six-month-long investigation and found that 680 of the overseas absentee ballots were illegally counted, and no partisan, pundit, or academic has publicly disagreed with its assessment. This paper describes the statistical procedures developed and implemented for the Times to ascertain whether disqualifying these 680 votes would have changed the outcome of the election. The methods involve adding formal Bayesian model averaging procedures to King’s (1997) ecological inference model. The authors show how they derived the results for the Times, and also present a variety of new empirical results that delineate the precise conditions under which Al Gore would have been elected president.

 01.14 Statistical Models for Enhancing Cross-Population Comparability
 

   Measuring the health state of individuals is important for the evaluation of health interventions, monitoring individual health progress, and as a critical step in measuring the health of populations. Self-report responses in household survey data are widely used for assessing the non-fatal health status of populations. The object of this document is to elaborate on several statistical models used in the analysis of survey data. First, the paper focuses on off-the-shelf models that are widely available as part of any standard statistical software. In particular, the authors demonstrate the problems of inference that arise from these standard methods when the underlying data are not cross-population comparable. In later sections, the authors introduce methods that modify these standard routines to enhance the cross-population comparability of survey analyses.

 01.18 An Individual-Level Approach to Health Inequality: Child Survival in 50 Countries
 

   Reducing health inequalities is an important part of the agenda of health policymakers globally. Studies of health inequalities have revealed large variations in average health status across social, economic, and other groups. The authors use an extended beta-binomial model to estimate the distribution of the risk of death in children under the age of two in the 50 developing countries where data from the Demographic and Health Surveys are available. The authors conclude that inequality estimates should be routinely reported alongside average levels of health, as they reveal important information about the distribution of health in populations. Measuring inequality with individual-level data, rather than quantifying differences in average levels of health across social groups, enables meaningful comparisons of inequality across countries and analyses of the determinants of inequality, and this approach should be extended to the measurement of inequalities in healthy life expectancy.
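
   The idea behind a beta-binomial model of child survival can be sketched briefly. Suppose each unit (for example, a mother with n children) has its own risk of child death, and those risks vary across units according to a beta distribution; the number of deaths per unit then follows a beta-binomial distribution, and the spread of the beta distribution describes inequality in risk. The sketch below implements the standard beta-binomial probability, not the extended version used in the paper, and the setup and parameter values are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """Probability of k deaths among n children when the unit's risk
    of death is drawn from a Beta(a, b) distribution."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Hypothetical population: mean risk a / (a + b) = 0.2, five children per unit.
n, a, b = 5, 2.0, 8.0
distribution = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
expected_deaths = sum(k * p for k, p in enumerate(distribution))
```

   Two populations can share the same mean risk while differing in how that risk is spread across units; holding a / (a + b) fixed and shrinking a + b increases the dispersion.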

 01.19 Measurement Methods for Inequalities in the Risk of Adult Mortality
 

   Far fewer datasets are readily available for estimating inequality in the risk of dying among adults than among children. Particularly in low- and middle-income countries, information on average levels of mortality is often unreliable, let alone data on the distribution of mortality risk within countries. Intensive efforts have been made to identify datasets that are well suited for the analysis of inequality in mortality risk in adults, and models have been modified to fit this analysis. This paper presents an overview of the data available, proposes a model to estimate the distribution of mortality risk in adults, and describes an alternative process for approximating that distribution in countries where data are not readily available.