Department of Biostatistics
Neurostatistics Working Group
2014–2015
ABSTRACT: When a true survival endpoint cannot be assessed for some subjects, an alternative endpoint that measures the true endpoint with error may be collected, which often occurs when obtaining the true endpoint is too invasive or costly. We develop nonparametric and semiparametric estimated likelihood functions that incorporate both uncertain endpoints available for all participants and true endpoints available for only a subset of participants. We propose maximum estimated likelihood estimators of the discrete survival function of time to the true endpoint and of a hazard ratio representing the effect of a binary or continuous covariate assuming a proportional hazards model. We show that the proposed estimators are consistent and asymptotically normal and develop the analytical forms of the variance estimators. Through extensive simulations, we also show that the proposed estimators have little bias compared to the naïve estimator, which uses only uncertain endpoints, and are more efficient with moderate missingness compared to the complete-case estimator, which uses only available true endpoints. We illustrate the proposed method by estimating the risk of developing Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative.
*Joint work with Jarcy Zee.
ABSTRACT: Small vessel disease (SVD) is an important risk factor for cognitive impairment and dementia. The mechanisms linking SVD to cognitive impairment are not well understood. We hypothesized that multiple small, spatially distributed vascular lesions affect cognition through disruption of brain connectivity. We therefore examined local and global network alterations in patients with SVD and examined the relationship between network efficiency, markers of SVD burden on MRI and PET, and potential clinical consequences.
ABSTRACT: Our ability to recognize visual objects, such as faces, is realized by neurons in the inferotemporal cortex (IT). These cells show preferences for individual images and image categories (and are thus selective), and are able to maintain these preferences even if one introduces irrelevant contextual changes (they are tolerant to changes in retinal size, position or viewpoint). To perform these computations, posterior IT neurons (pIT) require feedforward anatomical projections from over a dozen cortical regions, predominantly from area V4 and anterior IT, but also from areas V3 and V2. We do not know why multiple projections to pIT are required. In this study, we are defining the contributions of areas V2, V3 and V4 towards selectivity and tolerance in pIT neurons. By reversibly inactivating these visual regions, we can observe selective changes in response selectivity of IT neurons. We can interpret these changes using multivariate statistical techniques, such as multidimensional scaling, affinity propagation and linear classifiers. Our preliminary findings suggest that these input clusters to IT are concerned with different but overlapping computations.
ABSTRACT: Censored covariates arise frequently in biomarker assessment in epidemiological studies and in family history studies of disease. While there is a large literature on regression models when the outcome variable is subject to censoring, there is a more limited literature on the treatment of censored covariates, especially for type II censoring. We develop threshold regression approaches for linear regression models with a covariate subject to random censoring. Compared with existing methods, the proposed methods are simple but effective, as they avoid complicated modeling in dealing with censored covariate values. We study the asymptotic properties of the resulting estimators. In addition to estimating the regression coefficient of the censored covariate, the threshold regression methods can also be used to test whether the effect of the censored covariate is significant. We discuss the choice of the optimal threshold, which yields the most powerful test. The finite sample performance of the proposed methods is assessed through simulation studies. We also apply the method to a motivating example.
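The dichotomization idea behind threshold regression can be illustrated with a small simulation. This is only a sketch under assumed data-generating mechanisms, not the authors' method: for a right-censored covariate we observe W = min(X, C), and because W <= X, observing W at or above a threshold guarantees X is too, so the indicator 1(W >= c) is misclassified only for censored subjects below the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate a covariate X subject to random right censoring by C:
# we observe W = min(X, C) and delta = 1(X <= C).
x = rng.exponential(scale=2.0, size=n)
c = rng.exponential(scale=4.0, size=n)
w = np.minimum(x, c)
delta = (x <= c).astype(float)

# Outcome depends on whether X exceeds a threshold (illustrative model).
threshold = 1.0
y = 1.0 + 2.0 * (x >= threshold) + rng.normal(0, 0.5, size=n)

# Threshold regression idea: replace the censored covariate with the
# indicator 1(W >= threshold).  Since W = min(X, C) <= X, W >= c implies
# X >= c, so the indicator is correct for all such subjects; it can
# misclassify only censored subjects with W < threshold.
z = (w >= threshold).astype(float)
design = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("intercept, threshold effect:", beta)
```

With moderate censoring the misclassification attenuates the estimated effect toward zero, which is one motivation for choosing the threshold carefully, as the abstract discusses.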
ABSTRACT: This Harvard Catalyst Biostatistics symposium will explore statistical issues that arise in the study of neurologic diseases. The symposium will begin with motivating clinical background and identification of pressing analytical needs in amyotrophic lateral sclerosis, Alzheimer's disease, multiple sclerosis, and Parkinson's disease. The statistical talks will focus on methods for handling causal inference, multiple endpoints, high-dimensional biomarker selection, censored covariates, and measurement issues in short-term clinical trials. The symposium is intended for statisticians and neurological disease researchers with analytical interests.
ABSTRACT: The field of functional neuroimaging is growing rapidly, resulting in a vast amount of data for analysis. Recently, several collections of resting-state functional magnetic resonance images from different laboratories have been combined into freely available datasets for analysis, including the 1000 Functional Connectomes Project and the ADHD-200. Statistical dimension reduction techniques such as singular value decomposition (SVD) and independent component analysis (ICA) are routinely used by practitioners in the field of neuroimaging to analyze complex fMRI data. In this talk, the main dimension reduction approaches for fMRI data are discussed, stressing the major issues in their application and the advantages of each method depending on the biological question at hand. Extensions of the methods to high-dimensional data are presented.
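The SVD approach mentioned above can be sketched in a few lines. The toy "fMRI" matrix below (time points by voxels, built from two latent components) is purely illustrative and not from any of the named datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data matrix: time points x voxels, generated from two latent
# spatial components plus a little noise (purely illustrative).
t, v = 120, 400
time_courses = rng.normal(size=(t, 2))
spatial_maps = rng.normal(size=(2, v))
data = time_courses @ spatial_maps + 0.1 * rng.normal(size=(t, v))

# SVD-based dimension reduction: decompose the centered data and keep
# the leading components.
u, s, vt = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by first two components:",
      explained[:2].sum().round(3))

# Low-rank reconstruction of the centered data with k components.
k = 2
approx = (u[:, :k] * s[:k]) @ vt[:k]
```

ICA is typically run on such an SVD-reduced matrix, trading the orthogonality of the SVD components for statistically independent spatial maps.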
ABSTRACT: Although self-reported cognitive concerns (SCC) have previously been dismissed as a sign of the "worried well", there is emerging evidence to suggest that SCC may herald initial cognitive decrements at the stage of preclinical Alzheimer's disease (AD). Recent work from our own group and others suggests that specific SCC may in fact indicate early awareness prior to objective impairment on standardized tests and may be associated with evidence of early pathology on AD biomarkers and longitudinal decline.
ABSTRACT: The way research is selected for funding, designed, conducted, analyzed, and published can have a substantial impact on the reproducibility of scientific results. Empirical evidence suggests that the efficiency of many currently applied research practices is suboptimal, and there is wide variability across different scientific fields in this regard. This leads to a high prevalence of biased results. Dr. Ioannidis will peruse the current landscape and discuss different possibilities that have been proposed on how to improve the adoption and implementation of research practices that could lead to more reliable, accurate, and translatable results in a reproducible manner.
ABSTRACT: We have developed a statistical linear model to predict change in subject scores on the Mini-Mental Status Exam (MMSE) over time. Our model includes the clinical diagnosis, APOE4 alleles, an interaction between the two, the baseline MMSE score, and a few SNPs chosen from the literature. This project was done as part of the Alzheimer's Disease Big Data DREAM Challenge 1, whose goal was to predict the change in MMSE at the 24-month follow-up visit given clinical covariates and genotypes from a Genome-Wide Association Study (GWAS). The training set consisted of 750 individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, and the test set was from the Religious Orders Study and Memory and Aging Project at Rush University. Univariate analyses were used to select clinical covariates, and SNPs with a significant odds ratio were chosen. The model with clinical covariates and APOE genotype performed reasonably well, whereas the SNP data were not informative.
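A linear model of this shape (diagnosis, APOE4, their interaction, and baseline MMSE) can be sketched as follows. The data, codings, and coefficient values are simulated stand-ins, not the challenge data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 750  # matches the size of the training set described above

# Simulated stand-ins for the covariates named in the abstract:
# diagnosis (0 = normal, 1 = MCI, 2 = AD), APOE4 allele count (0-2),
# baseline MMSE, and a diagnosis x APOE4 interaction.
diagnosis = rng.integers(0, 3, size=n)
apoe4 = rng.integers(0, 3, size=n)
baseline_mmse = rng.normal(27, 2, size=n)
change = (-1.0 * diagnosis - 0.5 * apoe4 - 0.8 * diagnosis * apoe4
          + 0.1 * (baseline_mmse - 27) + rng.normal(0, 1, size=n))

# Ordinary least squares fit of MMSE change on the covariates.
design = np.column_stack([np.ones(n), diagnosis, apoe4,
                          diagnosis * apoe4, baseline_mmse])
beta, *_ = np.linalg.lstsq(design, change, rcond=None)
predicted = design @ beta
```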
ABSTRACT: How are the Harvard Research Data Security Levels determined for protocols? How do you make sure that you are complying with Harvard's policies on data protection? Can you put your sensitive data on a flash drive or transmit it electronically? How can IT assist you with ensuring that High Risk Confidential Information is maintained and shared securely? Come get the answers to these questions and many more straight from IT data security officers and IRB administrators. Bring your questions on your specific protocols to discuss with IT and IRB staff. Click here to register.
ABSTRACT: Significant concern has recently been expressed in the scientific literature about reproducibility of research, reusability of results, and false positives being reported as fact. These concerns are underlined by periodic scandals involving outright fraud, such as the recent scandal of so-called "stimulus-triggered" stem cells. What is reproducibility, and is it a standard to which scientists should aspire? Is there a difference between reproducibility and "robustness"? This talk will probe some of the recent discussion in the literature and reaction to it in the scholarly communications community.
Some reading material:
Begley, C.G. and Ellis, L.M. (2012) Drug development: Raise standards for preclinical cancer research, Nature, 483, 531-533. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values, Royal Society Open Science, 1. http://rsos.royalsocietypublishing.org/content/1/3/140216
Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False, PLoS Med, 2, e124. http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124
Rekdal, O.B. (2014) Academic urban legends, Social Studies of Science, 44, 638-654. http://sss.sagepub.com/content/44/4/638.full
ABSTRACT: In my talk on Biostatistics and Ethics, I will discuss the reputation of statistics, the response by the statistical community, some associations' guidelines for the ethical practice of statistics, and the movement toward reproducibility. Biostatisticians routinely work closely with physicians and scientists and have unique insight into data, often being privy to confidential data. We work in increasingly multidisciplinary teams with potentially divergent ethics codes and sensibilities. In the last decade we've seen a rapid increase in the ability to collect massive amounts of data, with complex structure and often a sensitive nature. These unparalleled advances and opportunities present new ethical concerns for statisticians.
ABSTRACT: Katharine Nicholson is a clinical research fellow at the Neurological Clinical Research Institute (NCRI) at Massachusetts General Hospital (MGH). She is a clinician investigator with a focus on clinical outcomes after intervention in people with amyotrophic lateral sclerosis (ALS). Dr. Nicholson's proposed clinical research focuses on survival analysis after gastrostomy tube placement in ALS patients. She is also working with pulmonary and sleep medicine to further understand the appropriate use of non-invasive ventilation and other methods of home monitoring in people with ALS. In addition to these endeavors, she is involved in several multicenter trials, including observational studies looking to identify novel biomarkers of disease and therapeutic trials of stem cell transplantation in ALS.
The objectives of her talk are to review the past and current challenges in ALS clinical trials and to discuss proposed research exploring clinical outcomes after intervention in ALS.
To join the online event:
1. Click here to join the online event.
Or copy and paste the following link into a browser:
2. Click "Join Now".
To join the audio conference only:
To receive a call back, provide your phone number when you join the event, or call the number below and enter the access code.
Call-in toll-free number (US/Canada): 1-877-668-4490
Call-in toll number (US/Canada): 1-408-792-6300
Global call-in numbers: https://harvardsph.webex.com/harvardsph/globalcallin.php?serviceType=EC&ED=379333557&tollFree=1
Toll-free dialing restrictions: http://www.webex.com/pdf/tollfree_restrictions.pdf
Access code: 710 193 881
ABSTRACT: Neuroimaging studies often quantify disease-related structural brain differences between populations using a multivariate pattern analysis (MVPA) such as the support vector machine (SVM). The SVM is trained to discriminate between groups, and the weights indicate which brain regions jointly drive the discriminative rule. However, classifier training in the presence of confounders may lead to identification of false disease patterns and spurious results. This occurs when classifiers rely heavily on regions that are strongly correlated with the confounders instead of regions that encode subtle disease changes. The imaging literature recommends using parametric models to regress out confounder effects at each brain region before SVM training. We show that this approach does not properly address the issue of confounding in MVPA. Instead, we propose a novel method that incorporates inverse probability weighting (IPW) during classifier training.
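The IPW idea can be sketched with a toy simulation. Everything below is illustrative (variable names, the logistic propensity model, and a gradient-ascent fit standing in for SVM training), not the authors' implementation: weights are formed from the estimated propensity of group membership given a confounder, so that the confounder is balanced across groups before the classifier is trained.

```python
import numpy as np

def logistic_fit(x, y, w=None, iters=500, lr=0.1):
    """Weighted logistic regression by gradient ascent (illustrative)."""
    if w is None:
        w = np.ones(len(y))
    beta = np.zeros(x.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-x @ beta))
        beta += lr * x.T @ (w * (y - p)) / len(y)
    return beta

rng = np.random.default_rng(3)
n = 1000
age = rng.normal(size=n)                               # confounder
group = (rng.random(n) < 1 / (1 + np.exp(-age))).astype(float)
# An imaging feature driven by the confounder, not by disease.
feature = np.column_stack([np.ones(n), age + rng.normal(size=n)])

# Step 1: model the propensity of group membership given the confounder.
z = np.column_stack([np.ones(n), age])
gamma = logistic_fit(z, group)
prop = 1 / (1 + np.exp(-z @ gamma))

# Step 2: inverse probability weights balance the confounder across
# groups before the classifier sees the imaging feature.
ipw = group / prop + (1 - group) / (1 - prop)
beta_ipw = logistic_fit(feature, group, w=ipw)
```

In the weighted pseudo-population, the confounder is approximately independent of group membership, so a classifier trained with these weights can no longer lean on confounder-correlated regions.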
ABSTRACT: Recurrent event data arise frequently in various fields such as biomedical sciences, public health, engineering, and social sciences. In many instances, the observation of the recurrent event process can be stopped by the occurrence of a correlated failure event, and thus violates the independent censoring assumption required by most conventional statistical methods. A joint scale-change model for the recurrent event process and the failure time that allows the censoring time to be informative about the recurrent event process is proposed. In particular, a shared frailty variable is used to model the association between the two types of outcomes. In contrast to the popular Cox-type joint modeling approaches, the regression parameters in the proposed joint scale-change model have marginal interpretations. Moreover, the proposed approach is robust in the sense that no parametric assumption is imposed on the distribution of the unobserved frailty and that the strong Poisson-type assumption for the recurrent event process is not needed. To estimate the corresponding variances of the estimators, a computationally efficient resampling-based procedure is applied. Simulation studies and an analysis of hospitalization data from the Danish Psychiatric Central Register illustrate the performance of the proposed method.
ABSTRACT: Delirium is an acute and fluctuating disturbance of attention and awareness that is most common in elderly patients. Delirium heralds the possibility of not only sustained brain dysfunction but also dependence and death. Despite the profound and alarming nature of delirium, treatments are severely limited by an incomplete understanding of its biological basis. We are developing translational animal models to determine the pathophysiology of delirium. Through a combination of electrophysiologic and behavioral studies, we have determined the causal significance of several possible clinical risk factors for delirium.
ABSTRACT: We consider the evaluation of a binary classifier in a semi-supervised setting in which a small or moderate sized `labeled' dataset is accompanied by a large amount of `unlabeled' data. This setting is directly relevant to many practical applications where the outcome is expensive or time-consuming to collect, but information on the predictors is readily available. Such data is increasingly prevalent with the rise of electronically recorded databases such as electronic medical records (EMR). While supervised estimation procedures make use of only labeled data, it is often of interest whether unlabeled data can improve efficiency. To address this question in the context of model evaluation, we propose semi-supervised (SS) estimators of various prediction performance measures including the receiver operating characteristic (ROC) curve. We make use of the unlabeled data through a two-step procedure. In step I, the labeled data is used to obtain a non-parametrically calibrated estimate of the conditional risk function. In step II, SS estimates of the prediction accuracy measures are constructed based on the estimated conditional risk function along with the unlabeled data. We correct for potential overfitting bias in our SS estimators with cross-validation and develop a perturbation resampling procedure to approximate the distribution of the proposed estimators. Further, we provide asymptotic results that establish that the SS estimators are always more efficient than their supervised counterparts. We validate our proposals via an extensive simulation study as well as a real data analysis of two EMR studies.
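The two-step structure can be sketched for the AUC. This is a simplified illustration under assumed models (a parametric logistic fit stands in for the nonparametric calibration of step I, and the cross-validation and resampling steps are omitted):

```python
import numpy as np

def logistic_fit(x, y, iters=2000, lr=0.5):
    """Plain logistic regression by gradient ascent (illustrative)."""
    beta = np.zeros(x.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-x @ beta))
        beta += lr * x.T @ (y - p) / len(y)
    return beta

rng = np.random.default_rng(4)

def simulate(n):
    x = rng.normal(size=(n, 1))
    p = 1 / (1 + np.exp(-2 * x[:, 0]))
    y = (rng.random(n) < p).astype(float)
    return np.column_stack([np.ones(n), x]), y

x_lab, y_lab = simulate(200)      # small labeled set
x_unl, _ = simulate(2000)         # large unlabeled set; labels unused

# Step I: estimate the conditional risk m(x) from the labeled data.
beta = logistic_fit(x_lab, y_lab)
m = 1 / (1 + np.exp(-x_unl @ beta))

# Step II: plug the estimated risks into the AUC functional over
# unlabeled pairs, weighting pair (i, j) by m_i * (1 - m_j).
s = x_unl @ beta                  # classification score
w = np.outer(m, 1 - m)
concordant = (s[:, None] > s[None, :]).astype(float)
auc_ss = np.sum(w * concordant) / (w.sum() - np.trace(w))
print("semi-supervised AUC estimate:", round(auc_ss, 3))
```

The efficiency gain over the supervised estimator comes from evaluating the accuracy functional over the much larger unlabeled sample, with the labeled data used only to calibrate the risk function.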
ABSTRACT: Over the past decade, several biomarkers of Alzheimer's disease (AD) have been identified and well validated. Relating these biomarkers to time-to-event would allow for the prediction of future clinical course. Implementing such a survival analysis may seem straightforward (just fit a Cox model with time-varying predictors); however, there are several issues that complicate the analysis. One issue is that time-dependent biomarkers of AD may only be measured at study entry, which is at odds with the fact that the history of the time-dependent process needs to be available at all observed event times in order to fit a Cox model with time-dependent predictors. A way around this is to define the time origin to be study entry, so that time-dependent biomarkers collected at study entry can be viewed as fixed baseline predictors representative of a subject's disease state at that time; this simplifies the analysis. However, if study entry does not coincide with a biologically relevant event, the definition of time-to-event may have little meaning outside of the study. A choice of time origin such as birth or a milestone corresponding to disease onset may be more appropriate, but the analysis is complicated by the fact that left truncation must be accounted for, and there is still the problem that the time-dependent biomarker is only measured at study entry.
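The left-truncation adjustment amounts to restricting each risk set to subjects already under observation: at an event time t, a subject is at risk only if entry < t <= event time. A minimal sketch with simulated data (illustrative names, a single baseline covariate, no right censoring, grid-search maximization of the partial likelihood):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)                          # baseline biomarker
t_event = rng.exponential(1 / np.exp(0.7 * x))  # true log hazard ratio 0.7
entry = rng.uniform(0, 0.5, size=n)             # delayed entry time

# Left truncation: only subjects whose event occurs after study entry
# are ever observed.
keep = t_event > entry
x, t_event, entry = x[keep], t_event[keep], entry[keep]

def log_partial_likelihood(beta):
    """Cox log partial likelihood with delayed entry: the risk set at
    event time t contains subjects with entry < t <= event time."""
    ll = 0.0
    for i in range(len(x)):
        t = t_event[i]
        at_risk = (entry < t) & (t_event >= t)
        ll += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return ll

grid = np.linspace(-1, 2, 61)
beta_hat = grid[np.argmax([log_partial_likelihood(b) for b in grid])]
print("estimated log hazard ratio:", beta_hat)
```

Ignoring the entry times (treating everyone as at risk from time zero) would bias the estimate, which is the point the abstract makes about choosing a biologically meaningful time origin.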
ABSTRACT: We consider longitudinal mediation with latent growth curves. We define the direct and indirect effects using counterfactuals and consider the assumptions needed for identifiability of those effects. We develop models with a binary treatment/exposure, followed by a model where treatment/exposure changes with time, allowing for treatment/exposure-mediator interaction. We thus formalize mediation analysis with latent growth curve models using counterfactuals, make clear the assumptions, and extend these methods to allow for exposure-mediator interactions. We present and illustrate the techniques with a study of multiple sclerosis (MS) and depression.
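In the simplest special case (linear models, binary exposure, no exposure-mediator interaction) the counterfactual direct and indirect effects reduce to familiar regression quantities. The sketch below uses simulated cross-sectional data and illustrative coefficients, not the latent growth curve models or the MS study:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
a = rng.integers(0, 2, size=n)                  # binary treatment/exposure
med = 0.5 + 1.2 * a + rng.normal(0, 1, n)       # mediator
y = 1.0 + 0.8 * a + 0.6 * med + rng.normal(0, 1, n)

# Fit the mediator and outcome regression models.
mm = np.column_stack([np.ones(n), a])
alpha, *_ = np.linalg.lstsq(mm, med, rcond=None)
ym = np.column_stack([np.ones(n), a, med])
theta, *_ = np.linalg.lstsq(ym, y, rcond=None)

# Under linearity with no treatment-mediator interaction, the
# counterfactual definitions reduce to:
nde = theta[1]               # natural direct effect
nie = alpha[1] * theta[2]    # natural indirect effect
print("NDE:", round(nde, 2), "NIE:", round(nie, 2))
```

With an exposure-mediator interaction, as in the talk, the effects no longer factor this way and the counterfactual formulas must be used directly.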
Last Update: May 18, 2015