Department of Biostatistics Colloquium Series 2007 - 2008
ABSTRACT: Understanding the toxicity of particulate matter (PM) and its complex mixture is a top research priority identified by the National Research Council. To estimate the toxicity of PM, a patchwork of large and heterogeneous datasets on health outcomes, PM chemical components, and weather variables is now available on a national scale. The integration of these datasets is key to detecting the health effects associated with imperfectly measured, correlated, and potentially confounded environmental mechanisms.
In this talk we review statistical methods for analyses of large spatio-temporal data for identifying health consequences of PM. Specifically, we will review methods for: 1) integrating effect estimates across different statistical models for confounding adjustment; 2) decomposing association between exposure and health into different spatio-temporal scales of variation in the data to control for confounding; and 3) estimating the time course of an adverse health outcome after exposure to air pollution. We apply the methods to the largest existing national databases for estimating health effects associated with exposure to PM of varying size and chemical composition.
The ability to make scientific findings reproducible is increasingly important in an area where substantive results are the product of complex statistical computations. We will review tools for making software and data available to the scientific community. The statistical methods reviewed in this talk, although motivated by air pollution studies, can be more generally applied for estimating health consequences of exposures of other chemical mixtures, including cigarette smoke, diesel exhaust, and the human diet.
ABSTRACT: This seminar will be a repeat of the RA Fisher Memorial Lecture presented at the Joint Statistical Meetings on August 1, 2007 in Salt Lake City.
There is a growing interest in public health programs targeted at diagnosing chronic diseases earlier. This is especially true in cancer, where there are expanding early detection programs in breast, cervical, colorectal, lung, ovarian and prostate cancers. The goal is to diagnose disease in an earlier and more favorable prognostic stage relative to the disease stage diagnosed under usual care. Finding disease in a more favorable stage may enhance the benefit of treatment, reducing mortality and increasing the cure rate. This seminar will review the main features of the early detection process and the many statistical challenges it presents.
Among the statistical challenges are:
- How to design early detection clinical trials to evaluate potential benefit? Unlike therapeutic trials in which all subjects must have disease, subjects in early detection trials must be free of disease. Furthermore long term follow-up may reduce statistical power.
- Long term follow-up is necessary to observe an adequate number of deaths due to disease in order to carry out an adequate analysis. Yet the early detection technology may change with time and the outcome of a trial may be of limited value. Is it possible to predict the long term mortality without long term follow-up?
- How to select optimal examination schedules? An examination schedule consists of the age to begin special early detection examinations, the interval between exams, the number of exams and possibly the age to end exams. How does one choose examination schedules as a function of risk status?
- It is possible that the early detection exams will diagnose disease that has a small probability of showing clinical symptoms in one's lifetime. Can one calculate the probability of "over diagnosis"?
The lecture will illustrate many of the ideas associated with the early detection of disease using breast cancer (possible benefit of mammogram exams for women under 50, comparison of different recommendations in the U.S., U.K. and the Nordic countries) and prostate cancer (over diagnosis).
ABSTRACT: Crucial issues for the chronic disease prevention research agenda concern the study designs needed to obtain reliable information on preventive intervention effects and the adequacy of traditional sources of preventive intervention hypotheses. Settings in which intermediate outcome and clinical outcome randomized prevention trial data, and cohort study data, are simultaneously available provide an opportunity to examine the study design issue. The newer types of high-dimensional proteomic data becoming available offer an opportunity to invigorate the preventive intervention development enterprise. These approaches will be illustrated using data from the Women's Health Initiative clinical trial and cohort study on postmenopausal hormone therapy and on a low-fat eating pattern. Some recommendations on needed elements of the population science research agenda will conclude the presentation.
ABSTRACT: The study of human disease has been revolutionized by technologies developed through the advent of the human genome project. At the Dana-Farber Cancer Institute, my group and I have been working to develop a series of computational tools, methods, and laboratory approaches to address fundamental questions in the analysis of human cancers. This talk will provide an overview of the approaches that we have developed and demonstrate applications to breast, ovarian, and colon cancer, as well as to the study of the way in which viruses perturb cellular networks. The emphasis will be on some of the areas where there are opportunities for developing new statistical methods.
ABSTRACT: For monitoring patients treated for prostate cancer, Prostate Specific Antigen (PSA) is measured periodically after they receive treatment. Increases in PSA are suggestive of recurrence of the cancer and are used in making decisions about possible new treatments. The data from studies of such patients typically consist of longitudinal PSA measurements, censored event times and baseline covariates. Methods for the combined analysis of both longitudinal and survival data have been developed in recent years, with the main emphasis being on modeling and estimation. Using a joint model, we analyze data from a prostate cancer study in which the patients are treated with radiation therapy. Here we focus on utilizing the model to make individualized predictions of disease progression for censored and alive patients, based on all their available pre-treatment and follow-up data.
In this model the longitudinal PSA data follow a non-linear hierarchical mixed model. The clinical recurrences are modeled using a time-dependent proportional hazards model where the time-dependent covariates include both the current value and the slope of the post-treatment PSA profile. Estimates of the parameters in the model are obtained by the Markov chain Monte Carlo (MCMC) technique. The model is used to give individual predictions of both future PSA values and the predicted probability of recurrence up to four years in the future. An efficient algorithm is developed to give individual predictions for subjects who were not part of the original data from which the model was developed. Thus the model can be used by others remotely through a website portal, to give individual predictions that can be updated as more follow-up data are obtained.
This is joint work with Menggang Yu, Donna Ankerst, Cecile Proust-Lima, Ning Liu, Yongseok Park and Howard Sandler.
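As a reading aid for the preceding abstract, a joint model of this general shape can be written as below. This is only a generic sketch, not the speaker's specification: the trajectory function f, the random effects theta_i, the coefficients beta, alpha_1, alpha_2, and the baseline hazard lambda_0 are all assumed notation that does not appear in the abstract.

```latex
% Longitudinal submodel (assumed notation): observed PSA for patient i at time t
Y_i(t) = \psi_i(t) + \varepsilon_i(t), \qquad
\psi_i(t) = f(t;\, \theta_i), \qquad
\varepsilon_i(t) \sim N(0, \sigma^2)

% Time-dependent proportional hazards submodel for clinical recurrence,
% with the current value and the slope of the PSA profile as covariates
\lambda_i(t) = \lambda_0(t)\,
  \exp\!\left\{ \beta^\top X_i + \alpha_1\, \psi_i(t) + \alpha_2\, \psi_i'(t) \right\}

% Individual prediction: probability of recurrence within s years,
% given that patient i is recurrence-free at time t
P\!\left(T_i \le t + s \mid T_i > t,\ \text{data}\right)
  = 1 - E\!\left[ \exp\!\left\{ -\int_t^{t+s} \lambda_i(u)\, du \right\} \,\Big|\, \text{data} \right]
```

In an MCMC setting the expectation in the last line would typically be approximated by averaging over posterior draws of the parameters and of theta_i.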
ABSTRACT: Almost all of the current nonparametric regression methods such as smoothing splines, generalized additive models and varying coefficients models assume a linear relationship when nonparametric functions are regarded as parameters. In this talk we present a general class of nonlinear nonparametric models that allow nonparametric functions to act nonlinearly. They arise in many fields as either theoretical or empirical models. We propose new estimation methods based on an extension of the Gauss-Newton method to infinite dimensional spaces and the backfitting procedure. We extend the generalized cross validation and the generalized maximum likelihood methods to estimate smoothing parameters. Connections between nonlinear nonparametric models and nonlinear mixed effects models are established. Approximate Bayesian confidence intervals are derived for inference. We will also present a user-friendly R function for fitting these models. The methods will be illustrated using two real data examples.
ABSTRACT: A typical microarray study might involve two groups of subjects, Controls and Treatments. Each subject provides material for his or her individual microarray, reporting some large number of genetic expressions at the same time, and yielding an m by n data matrix "X", with m genes and n subjects. We expect the measurements down any one column to be correlated, since genes act in concert, making the rows of X correlated. However the columns, that is the microarrays, are usually assumed to be independent. If, for example, we form two-sample t-statistics for each gene's data, the standard Student's t null hypothesis with n-2 degrees of freedom requires independence, as do familiar techniques such as cross-validation and permutation testing. This talk concerns testing a matrix X for column-wise independence when the rows may be highly correlated. The effect of row-wise correlation is to greatly reduce the power of standard tests. In my main example, row-wise correlation will be shown to reduce the effective sample size from m=20426 to 17.
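The gene-by-gene two-sample t-statistics mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the speaker's code: the function name `two_sample_t` and the boolean label vector `is_treatment` are hypothetical, and the pooled-variance form is assumed because it is the one that gives a Student's t null with n-2 degrees of freedom when the columns are independent.

```python
import numpy as np


def two_sample_t(X, is_treatment):
    """Pooled-variance two-sample t-statistic for each row (gene) of X.

    X            : m-by-n array, rows are genes, columns are subjects.
    is_treatment : length-n boolean vector (hypothetical name), True for
                   Treatment columns and False for Control columns.
    Returns a length-m array of t-statistics (Treatment minus Control).
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(is_treatment, dtype=bool)
    treat, ctrl = X[:, g], X[:, ~g]
    n1, n2 = treat.shape[1], ctrl.shape[1]

    # Pooled variance estimate with n1 + n2 - 2 = n - 2 degrees of freedom
    pooled_var = (
        (n1 - 1) * treat.var(axis=1, ddof=1)
        + (n2 - 1) * ctrl.var(axis=1, ddof=1)
    ) / (n1 + n2 - 2)

    # Standard error of the difference in group means
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (treat.mean(axis=1) - ctrl.mean(axis=1)) / se


if __name__ == "__main__":
    # Toy data only: 5 genes, 6 subjects (3 Controls, then 3 Treatments)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 6))
    labels = np.array([False, False, False, True, True, True])
    print(two_sample_t(X, labels))
```

Comparing these statistics to a t distribution with n-2 degrees of freedom is only justified if the columns of X are independent, which is the assumption the talk proposes to test.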
ABSTRACT: Metabolomics is an emerging area of bioinformatics that uses measurements on metabolites in tissue samples to assess health status, evaluate drug toxicity, and learn about key biochemical pathways. The area has several advantages over proteomics and genomics, but from a statistical perspective it poses five significant problems: error modeling, signal extraction, cross-platform comparisons, inference with large p and small n, and network modeling. This talk focuses on all but the last of these.