The goal of this project is to analyze some existing
statistical methods in public health, import relevant ones from other fields,
and develop new ones to facilitate the analysis of available data.
Opportunities appear ripe in at least three areas.
First, this project develops models to ensure that epidemiological
estimates are consistent within each disease, across measures of incidence,
prevalence, mortality, and disability, as well as across diseases.
A statistical model is being added to the
existing approach, which is based solely on deterministic algorithms.
Compositional data models are also being employed to ensure logical consistency
across the set of variables measuring deaths across causes.
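
As a rough illustration of why compositional models give logical consistency, the sketch below uses the additive log-ratio transform, one standard device for compositional data. It is an assumption that something like this fits the project's setting; the function names and the cause-of-death shares are hypothetical. Shares are mapped to unconstrained log-ratios, adjusted there, and mapped back, so the results remain positive and sum to one.

```python
import numpy as np

def alr(shares):
    """Additive log-ratio transform: log of each share relative to the last one."""
    shares = np.asarray(shares, dtype=float)
    return np.log(shares[..., :-1] / shares[..., -1:])

def alr_inverse(z):
    """Map unconstrained log-ratios back to positive shares that sum to one."""
    z = np.asarray(z, dtype=float)
    # Append a zero for the reference category, then normalize the exponentials.
    padded = np.concatenate([z, np.zeros(z.shape[:-1] + (1,))], axis=-1)
    expz = np.exp(padded)
    return expz / expz.sum(axis=-1, keepdims=True)

# Hypothetical shares of deaths across three causes (illustrative only).
cause_shares = np.array([0.5, 0.3, 0.2])
z = alr(cause_shares)

# Any adjustment made on the log-ratio scale still yields a valid composition.
adjusted = alr_inverse(z + np.array([0.4, -0.1]))
print(adjusted, adjusted.sum())
```

Working on the log-ratio scale means no separate step is needed to force cause-specific estimates to add up to total deaths.
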
Second, this study explores logistic regression, the
most commonly used method in epidemiology and much of public health. Although
the problem is apparently unknown in the applied literature, logistic
regression is biased in predictable directions when the sample size is less
than about 2,000 and there are more zeros than ones, and the bias is correctable.
The bias is large enough to make an important difference in
drawing substantive conclusions. Monte Carlo, analytical, and empirical attacks
on the problem are proposed. The use of more sophisticated models for binary
dependent variables is also being considered, as such models have been shown in
other fields to perform far better than logistic regression.
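
A Monte Carlo check of this kind of bias can be sketched in a few lines. The example below is a deliberately simplified assumption, not the project's design: it uses an intercept-only logit model, for which the maximum likelihood estimate is simply the log-odds of the sample proportion of ones, and the sample size, event probability, and function name are all hypothetical. It compares the simulated bias with the first-order Taylor approximation (p - 0.5) / (n p (1 - p)).

```python
import numpy as np

def simulate_intercept_bias(n=200, p_true=0.05, reps=20000, seed=0):
    """Average error of the intercept-only logit MLE over repeated samples."""
    rng = np.random.default_rng(seed)
    # Number of ones in each simulated sample of size n.
    ones = rng.binomial(n, p_true, size=reps)
    # The MLE is undefined when a sample has no ones or no zeros; drop those.
    ones = ones[(ones > 0) & (ones < n)]
    beta_hat = np.log(ones / (n - ones))          # log-odds of the sample proportion
    beta_true = np.log(p_true / (1.0 - p_true))   # true intercept
    return beta_hat.mean() - beta_true

bias = simulate_intercept_bias()
# First-order approximation to the bias for n=200, p=0.05.
approx = (0.05 - 0.5) / (200 * 0.05 * 0.95)
print(f"simulated bias: {bias:.4f}, first-order approximation: {approx:.4f}")
```

With more zeros than ones, the simulated intercept falls below its true value on average, in the direction the approximation predicts. A model with covariates would require an iterative fit and is not covered by this sketch.
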
The final component of the study extends methods for ecological inference, the
estimation of individual-level relationships when only aggregate data are
available, to the types of data and problems common in public health.
This methodology is
intended to improve estimates of health characteristics in subgroups of
populations; for example, ecological inference may be used to develop
comparisons of health status in urban versus rural populations where these data
are not available directly.
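
To make the urban versus rural example concrete, the sketch below computes the classic deterministic bounds on a subgroup rate from aggregate data alone. This is the simple method of bounds, shown only as a starting point and not as the methodology the project develops; the function name and the area-level figures are hypothetical. If x is an area's urban share and t its overall share in poor health, then t = b_urban * x + b_rural * (1 - x), which restricts b_urban to an interval.

```python
import numpy as np

def subgroup_rate_bounds(x, t):
    """Bounds on the rate within a subgroup, given only area-level totals.

    x: share of each area's population in the subgroup (must be > 0)
    t: share of each area's population with the outcome
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # Lowest rate: everyone outside the subgroup has the outcome first.
    lower = np.maximum(0.0, (t - (1.0 - x)) / x)
    # Highest rate: every case is assigned to the subgroup, capped at 1.
    upper = np.minimum(1.0, t / x)
    return lower, upper

# Hypothetical areas: urban share and overall share in poor health.
urban_share = np.array([0.6, 0.8])
poor_health_share = np.array([0.3, 0.9])
lower, upper = subgroup_rate_bounds(urban_share, poor_health_share)
print(lower, upper)
```

The bounds can be wide, as in the first area, or narrow, as in the second; statistical ecological inference methods aim to locate estimates within such intervals.
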