The Public Health Disparities Geocoding Project

Welcome to the Public Health Disparities Geocoding Project Monograph

These pages present an introduction to geocoding and to the use of area-based socioeconomic measures with public health surveillance data, based on the work of the Public Health Disparities Geocoding Project at the Harvard T.H. Chan School of Public Health, Department of Social and Behavioral Sciences.

• The Executive Summary describes the motivation behind the Public Health Disparities Geocoding Project, and summarizes the methodology, key findings, and recommendations.

• The Introduction provides a more in-depth look at the history of geocoding and area-based measures, the objectives of our project, and our main findings. We include a glimpse of what routine public health surveillance of socioeconomic disparities in health could look like if conducted across a variety of health outcomes over the life course, from birth to death, using a single area-based socioeconomic measure at the census tract level.

• The Publications page is a comprehensive list of the publications of the Public Health Disparities Geocoding Project and includes PDF copies of all of our published work.

• We also provide a primer on the basics of Geocoding, including descriptions of the many options and services available, and the nitty-gritty details of address cleaning, address formatting, and evaluation of geocoding accuracy.

• In Generating ABSMs we describe the concepts, methods, and measures behind creating area-based socioeconomic measures, including a summary table of the 19 theoretically justified area-based socioeconomic measures we created based on 1990 U.S. Census data (see ABSM Creation Table).

• Under Analytic Methods, we provide details on how to merge geocoded surveillance data with Census-derived population denominators and area-based socioeconomic measures. We also present basic epidemiologic methods for generating descriptive statistics, including directly age-standardized incidence rates, incidence rate ratios and rate differences, the relative index of inequality, and the population attributable fraction. Examples are provided for each of these techniques, and each section is further detailed in our comprehensive Case Example.

• We also include sections on Multi-level Modeling and Visual Display of data for surveillance reporting.

• The Case Example is an opportunity for programmers and data managers to try out the techniques we describe on a test dataset, drawn from all-cause mortality cases in Suffolk County, MA, from 1989 to 1991. We provide test datasets, a step-by-step description of the programming tasks, sample SAS code, and examples of the resulting output.

• Finally, to facilitate further research on socioeconomic gradients in health with respect to our recommended area-based socioeconomic measure (CT poverty), we have made available Census Tract Level Poverty Data for ALL census tracts in the United States, for 1980, 1990, and 2000.
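
To give a flavor of the descriptive statistics listed under Analytic Methods, here is a minimal Python sketch (not the project's SAS code) of a directly age-standardized rate for two census tract poverty strata, followed by the incidence rate ratio and rate difference between them. The function name, the three age groups, the case counts, the person-years, and the standard population weights are all hypothetical values chosen for illustration only.

```python
def direct_age_standardized_rate(cases, person_years, std_weights):
    """Directly age-standardized rate per 100,000 person-years.

    cases, person_years, and std_weights each hold one entry per age group.
    std_weights are standard population weights; they are normalized here,
    so they may be given as counts or as proportions.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    # Weighted sum of the age-specific rates (cases / person-years).
    rate = sum(
        (w / total_weight) * (d / n)
        for d, n, w in zip(cases, person_years, std_weights)
    )
    return rate * 100_000


# Hypothetical standard population weights for three age groups.
std_weights = [0.4, 0.4, 0.2]

# Hypothetical cases and person-years in a low-poverty stratum (reference)
# and a high-poverty stratum of census tracts.
low_poverty = direct_age_standardized_rate(
    [10, 40, 100], [50_000, 40_000, 10_000], std_weights
)
high_poverty = direct_age_standardized_rate(
    [20, 60, 90], [40_000, 30_000, 6_000], std_weights
)

# Relative and absolute comparisons against the reference stratum.
rate_ratio = high_poverty / low_poverty
rate_difference = high_poverty - low_poverty

print(f"Low-poverty rate:  {low_poverty:.1f} per 100,000")
print(f"High-poverty rate: {high_poverty:.1f} per 100,000")
print(f"Rate ratio:        {rate_ratio:.2f}")
print(f"Rate difference:   {rate_difference:.1f} per 100,000")
```

Because both strata are weighted to the same standard population, the ratio and difference compare rates that are not distorted by differences in age structure. Confidence intervals, the relative index of inequality, and the population attributable fraction are left out of this sketch.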