Data of the P01 Grant
Statistical Informatics for Cancer Research uses many sources and types of data. Below is a small sample of these data sets:
- Medicare. Hospital admission data for the Medicare population, consisting of over 30 million enrollees. These Medicare data include information on the number of admissions, demographics of patients with and without hospital admissions, and spatial location at the county level. Questions on this data may be directed to Yun Wang.
- SEER. The Surveillance, Epidemiology and End Results (SEER) Program is an important source of cancer data and statistics in the United States. Started in 1973, the SEER Program collected detailed information on cancer events in five states and two smaller regions. The program has grown and now collects information on 11 states and 3 regions.
- Normative Aging Study. The Normative Aging Study was established in 1963 by the United States Department of Veterans Affairs to study aging in men. Data were collected on participants every 3-5 years through medical exams, neurological tests, and questions on behaviors, education, diet, and other elements of health and lifestyle.
- US EPA AirData. Daily ambient pollution concentrations for criteria pollutants (PM10, PM2.5, O3, NO2, CO) for all available Federal Information Processing Standards (FIPS) codes throughout the United States. PM mass concentration data are available as 24-hour integrated samples, usually daily, but sometimes on a 1-in-3 or 1-in-6 day sampling schedule. Questions on this data may be directed to Yun Wang.
- NOAA, weather. Daily temperature, dew-point temperature, relative humidity, wind speed, and precipitation data from more than 8,000 monitoring stations. Relative humidity and allied measures are available from a sub-sample of 500 monitoring stations. Questions on this data may be directed to Yun Wang.
- US Census Bureau. The census provides socio-demographic data at the state, county, 5-digit ZIP Code Tabulation Area, and census block levels. Questions on this data may be directed to Yun Wang.
- National Land Cover. This dataset provides data on land cover derived from satellite photographs taken in the years 1992 and 2001. The data include the following variables: total space, green space, open water area, proportion of green space, and proportion of open water.
- North American Regional Reanalysis. Data recorded every 3 hours on the height of the planetary boundary layer (HPBL), in meters, for grid points in MA from 1994 through the end of 2008.
- CDC: Behavioral Risk Factor Surveillance System. Data on smoking, diabetes, and obesity are available on a metropolitan area basis. Questions on this data may be directed to Yun Wang.
- American Housing Survey. Annual county-level data on prevalence of air conditioning (AC), percent of housing within 300 m of green space, percent multifamily housing, and percent of housing during the period 1994 to 2002.
- ANRF, Smoke Free Regulations. Data on when smoking bans went into effect in counties and states throughout the United States.
- Lung Cancer GWAS. The lung cancer genome-wide association study (GWAS) is part of the lung cancer susceptibility (LCS) study. For the GWAS, blood samples collected from lung cancer patients and healthy controls were genotyped on the Illumina Human610-Quad array. The dataset also contains information on environmental exposures such as smoking and diet. For lung cancer patients, clinical data (including staging, histology, and treatment) and survival outcomes were also collected.
These data sets have been and continue to be useful in identifying disease trends, evaluating the effect of interventions, and analyzing connections between health and environmental exposures.