Department of Biostatistics
Neurostatistics Working Group

2014 - 2015

Coordinators: Dr. Rebecca Betensky and Adam Sullivan

Schedule: Wednesdays, 12:30-1:30 p.m.
SPH2, Room 426 (unless otherwise notified)

Seminar Description
This working group provides a forum for presentation and discussion of completed, ongoing, or planned statistical analyses of neurological data. Such data include, for example, in vivo human brain images (anatomic, functional and spectroscopic magnetic resonance imaging), gene expression studies of human and non-human animal brain tissue (brightfield and immunofluorescence microscopy, DNA microarrays, laser micro-dissection), in vivo micro-dialysis, clinical trials data for a variety of neurologic diseases, and genetic data from family studies. Non-statistical presentations of neurological, psychiatric and technological background material will also be included. Through this seminar, statisticians will gain exposure to the statistical issues that arise in the broad fields of neurology, brain imaging, and psychiatry and to the diverse ongoing research in this area throughout Harvard and the world. A main goal of the seminar is to stimulate statistical interest in neuroscience and neurology and to develop strategies for collaboration within these fields.

October 8

Sharon Xiangwen Xie, Ph.D.*
Associate Professor of Biostatistics at the Hospital of the University of Pennsylvania (HUP), University of Pennsylvania Perelman School of Medicine

"Survival Analysis with Uncertain Endpoints Using an Internal Validation Subsample"
ABSTRACT: When a true survival endpoint cannot be assessed for some subjects, an alternative endpoint that measures the true endpoint with error may be collected, which often occurs when obtaining the true endpoint is too invasive or costly. We develop nonparametric and semiparametric estimated likelihood functions that incorporate both uncertain endpoints available for all participants and true endpoints available for only a subset of participants. We propose maximum estimated likelihood estimators of the discrete survival function of time to the true endpoint and of a hazard ratio representing the effect of a binary or continuous covariate assuming a proportional hazards model. We show that the proposed estimators are consistent and asymptotically normal and develop the analytical forms of the variance estimators. Through extensive simulations, we also show that the proposed estimators have little bias compared to the naive estimator, which uses only uncertain endpoints, and are more efficient with moderate missingness compared to the complete-case estimator, which uses only available true endpoints. We illustrate the proposed method by estimating the risk of developing Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative.

*Joint work with Jarcy Zee.

October 29

Yael Reijmer, Ph.D.
Postdoctoral Researcher, J. Philip Kistler Stroke Research Center, Massachusetts General Hospital

"Brain Network Disruption and Cognitive Impairment in Small Vessel Disease"
ABSTRACT: Small vessel disease (SVD) is an important risk factor for cognitive impairment and dementia. The mechanisms linking SVD to cognitive impairment are not well understood. We hypothesized that multiple small, spatially distributed vascular lesions affect cognition through disruption of brain connectivity. We therefore examined local and global network alterations in patients with SVD and examined the relationship between network efficiency, markers of SVD burden on MRI and PET, and potential clinical consequences.

November 5

Carlos R. Ponce, M.D., Ph.D.
Research Fellow in Neurobiology, Harvard Medical School

"How Do Inferotemporal Cortex Cells Use Different Cortical Inputs for Image Categorization?"
ABSTRACT: Our ability to recognize visual objects, such as faces, is realized by neurons in the inferotemporal cortex (IT). These cells show preferences for individual images and image categories (and are thus selective), and are able to maintain these preferences even if one introduces irrelevant contextual changes (they are tolerant to changes in retinal size, position or viewpoint). To perform these computations, posterior IT neurons (pIT) require feedforward anatomical projections from over a dozen cortical regions, predominantly from area V4 and anterior IT, but also from areas V3 and V2. We do not know why multiple projections to pIT are required. In this study, we are defining the contributions of areas V2, V3 and V4 towards selectivity and tolerance in pIT neurons. By reversibly inactivating these visual regions, we can observe selective changes in response selectivity of IT neurons. We can interpret these changes using multivariate statistical techniques, such as multidimensional scaling, affinity propagation and linear classifiers. Our preliminary findings suggest that these input clusters to IT are concerned with different but overlapping computations.

November 12

Jing Qian, Ph.D.
Assistant Professor, Department of Biostatistics, University of Massachusetts - Amherst

"Thresholding Regression with Covariate Subject to Random Censoring"
ABSTRACT: Censored covariates arise frequently in biomarker assessment in epidemiological studies and in family history studies of disease. While there is a large literature on regression models when the outcome variable is subject to censoring, there is a more limited literature on the treatment of censored covariates, especially for type II censoring. We develop threshold regression approaches for linear regression models with a covariate subject to random censoring. Compared with existing methods, the proposed methods are simple but effective as they avoid complicated modeling in dealing with censored covariate values. We study the asymptotic properties of the resultant estimators. In addition to estimating the regression coefficient of the censored covariate, the threshold regression methods can also be used to test whether the effect of the censored covariate is significant. We discuss the choice of optimal threshold which yields the most powerful test. The finite sample performance of the proposed methods is assessed through simulation studies. We also apply the method to a motivating example.

November 19 (joint with Harvard Catalyst | The Harvard Clinical & Translational Science Center Biostatistics Program)

Statistical Issues in the Analysis of Neurological Studies Symposium (RSVP to
Speakers will include James Berry, MD, Rebecca Betensky, PhD, Deborah Blacker, MD, ScD, Tanuja Chitnis, MD, Brian Healy, PhD, Eric Macklin, PhD, Jing Qian, PhD, Ritesh Ramchandani, David Schoenfeld, PhD, Michael Schwarzschild, MD, PhD.

Exploration of Statistical Issues that Arise in the Study of Neurologic Diseases
ABSTRACT: This Harvard Catalyst Biostatistics symposium will explore statistical issues that arise in the study of neurologic diseases. The symposium will begin with motivating clinical background and identification of pressing analytical needs in amyotrophic lateral sclerosis, Alzheimer's disease, multiple sclerosis, and Parkinson's disease. The statistical talks will focus on methods for incorporating and handling causal inference, multiple endpoints, high dimensional biomarker selection, censored covariates, and measurement issues in short-term clinical trials. The symposium is intended for statisticians and neurological disease researchers who have analytical interests.

November 25

Ani Eloyan, Ph.D.
Assistant Professor, Department of Biostatistics, Johns Hopkins University

"Matrix Decomposition Methods for Functional MRI Data"
ABSTRACT: The field of functional neuroimaging is growing very rapidly, resulting in a vast amount of data for analysis. Recently, several collections of resting state functional magnetic resonance images from different laboratories have been combined in freely available datasets for analysis, including the 1000 Functional Connectomes Project dataset and ADHD-200, among others. Statistical dimension reduction techniques such as singular value decomposition (SVD) and independent component analysis (ICA) are routinely used by practitioners in the field of neuroimaging to analyze complex fMRI data. In this talk, the main dimension reduction approaches for fMRI data are discussed, stressing the major issues in the applications and the advantages of the methods depending on the biological question at hand. Extensions of the methods to high dimensional data are presented.

December 10

Rebecca E. Amariglio, Ph.D.
Associate Psychologist, Brigham and Women's Hospital
Instructor in Neurology, Harvard Medical School

"Subjective Cognitive Concerns in Preclinical Alzheimer's Disease"
ABSTRACT: Although self-reported cognitive concerns (SCC) have previously been dismissed as a sign of the "worried well", there is emerging evidence to suggest that SCC may herald initial cognitive decrements at the stage of preclinical Alzheimer's disease (AD). Recent work from our own group and others suggests that specific SCC may in fact indicate early awareness prior to objective impairment on standardized tests and may be associated with evidence of early pathology on AD biomarkers and longitudinal decline.

January 6

John Ioannidis, D.Sc., M.D.
C. F. Rehnborg Professor in Disease Prevention in the School of Medicine and Professor of Health Research and Policy (Epidemiology) and, by courtesy, of Statistics

"Research Practices and Reproducible Research"
ABSTRACT: The way research is selected for funding, designed, conducted, analyzed, and published can have a substantial impact on the reproducibility of scientific results. Empirical evidence suggests that the efficiency of many currently applied research practices is suboptimal, and there is wide variability across different scientific fields in this regard. This leads to a high prevalence of biased results. Dr. Ioannidis will peruse the current landscape and discuss different possibilities that have been proposed on how to improve the adoption and implementation of research practices that could lead to more reliable, accurate, and translatable results in a reproducible manner.

January 21

Sudeshna Das, Ph.D.
Instructor in Neurology, Harvard Medical School
Assistant in Neuroscience, Massachusetts General Hospital
Affiliate Faculty, Harvard Stem Cell Institute
Associate Director, Massachusetts General Hospital Biomedical Informatics Core

"Linear Models to Predict ΔMMSE"
ABSTRACT: We have developed a statistical linear model to predict change in subject scores on the Mini-Mental State Examination (MMSE) over time. Our model includes the clinical diagnosis, APOE4 alleles, an interaction between the two, the baseline MMSE score and a few SNPs chosen from the literature. This project was done as part of the Alzheimer's Disease Big Data DREAM Challenge 1, whose goal was to predict the change in MMSE at the 24-month follow-up visit given clinical covariates and genotypes from a Genome Wide Association Study (GWAS). The training set consisted of 750 individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, and the test set was from the Religious Orders Study and Memory and Aging Project at Rush University. Univariate analyses were used to select clinical covariates, and SNPs with a significant odds ratio were chosen. The model with clinical covariates and APOE genotype performed reasonably well, whereas SNP data was not informative.

February 4, 10:30-11:30 am Countway Library, Minot Room (Special Event)

Harvard T.H. Chan School of Public Health Office of Human Research Administration - Quality Improvement Program
Guest Speakers: Kristen Bolt, Research Data and Conflict of Interest Officer, Harvard University
Andrew Ross, Information Security Manager, Harvard Chan School
Miguel A. Sanchez, Information Security Specialist, Harvard University
Kimberly Serpico, IRB Review Specialist, Harvard Longwood Medical Area Schools

"Ensuring Data Confidentiality: IRB Considerations and IT Data Security Measures"
ABSTRACT: How are the Harvard Research Data Security Levels determined for protocols? How do you make sure that you are complying with Harvard's policies on data protection? Can you put your sensitive data on a flash drive or transmit it electronically? How can IT assist you with ensuring High Risk Confidential Information is maintained and shared securely? Come get the answers to these questions and many more straight from IT data security officers and IRB administrators. Bring your questions on your specific protocols to discuss with IT and IRB staff.

February 4

Tim Clark, Ph.D.
Director of Informatics, MassGeneral Institute for Neurodegenerative Disease
Assistant Professor of Neurology, Harvard Medical School
Director, Massachusetts General Hospital Biomedical Informatics Core
Co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center

"Reproducibility or robustness? (or something else?)"
ABSTRACT: Significant concern has recently been expressed in the scientific literature about reproducibility of research, reusability of results, and false positives being reported as fact. These concerns are underlined by periodic scandals involving outright fraud, such as the recent scandal of so-called "stimulus-transitioned" stem cells. What is reproducibility and is it a standard to which scientists should aspire? Is there a difference between reproducibility and "robustness"? This talk will probe some of the recent discussion in the literature and reaction to it in the scholarly communications community.

Some reading material:

Begley, C.G. and Ellis, L.M. (2012) Drug development: Raise standards for preclinical cancer research, Nature, 483, 531-533.

Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values, Royal Society Open Science, 1.

Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False, PLoS Med, 2, e124.

Rekdal, O.B. (2014) Academic urban legends, Social Studies of Science, 44, 638-654.

February 11

Shelley Hurwitz, Ph.D.
Director of Biostatistics, Center for Clinical Investigation, Brigham and Women's Hospital, Harvard Medical School

"Biostatistics and Ethics"
ABSTRACT: In my talk on Biostatistics and Ethics, I will discuss the reputation of statistics, the response by the statistical community, some associations' guidelines for the ethical practice of statistics, and the movement toward reproducibility. Biostatisticians routinely work closely with physicians and scientists and have unique insight into data, often being privy to confidential data. We work in increasingly multidisciplinary teams with potentially divergent ethics codes and sensibilities. In the last decade we've seen a rapid increase in the ability to collect massive amounts of data, with complex structure and often a sensitive nature. These unparalleled advances and opportunities present new ethical concerns for statisticians.

February 25

Journal Club

Moderated by Eyal Y. Kimchi, M.D., Ph.D.; Clinical Fellow in Neurology, Department of Neurology, Massachusetts General Hospital

"Nonparametric statistical testing of EEG- and MEG-data" by Eric Maris and Robert Oostenveld

March 13, Kresge 204 (Special Date)

Katharine Nicholson, M.D.
Clinical Research Fellow in Neurodegenerative Disease, Massachusetts General Hospital

"ALS Clinical Research Trials: Past Reflections Informing New Directions"
ABSTRACT: Katharine Nicholson is a clinical research fellow at the Neurological Clinical Research Institute (NCRI) at Massachusetts General Hospital (MGH). She is a clinician investigator with a focus on clinical outcomes after intervention in people with amyotrophic lateral sclerosis (ALS). Dr. Nicholson's proposed clinical research focuses on survival analysis after gastrostomy tube placement in ALS patients. She is also working with pulmonary and sleep medicine to further understand the appropriate use of non-invasive ventilation and other methods of home monitoring in people with ALS. In addition to these endeavors, she is involved in several multicenter trials, including observational studies looking to identify novel biomarkers of disease and therapeutic trials of stem cell transplantation in ALS.

The objectives of her talk are to review the past and current challenges in ALS clinical trials and to discuss proposed research exploring clinical outcomes after intervention in ALS.

April 15 (joint with Harvard Catalyst | The Harvard Clinical & Translational Science Center Biostatistics Program)

Wei Wang, Ph.D.
Associate Mathematician, Division of Sleep and Circadian Disorders, Departments of Medicine and Neurology, Brigham and Women's Hospital / Harvard Medical School

"Challenges of Big Data Analysis" - discussion of linked paper

To join the audio conference only:
To receive a call back, provide your phone number when you join the event, or call the number below and enter the access code.
Call-in toll-free number (US/Canada): 1-877-668-4490
Call-in toll number (US/Canada): 1-408-792-6300
Access code: 710 193 881

April 22

Kristin Linn, Ph.D.
Postdoctoral Fellow, Department of Biostatistics and Epidemiology, University of Pennsylvania

"Multivariate Pattern Analysis and Confounding in Neuroimaging"
ABSTRACT: Neuroimaging studies often quantify disease-related structural brain differences between populations using a multivariate pattern analysis (MVPA) such as the support vector machine (SVM). The SVM is trained to discriminate between groups, and the weights indicate which brain regions jointly drive the discriminative rule. However, classifier training in the presence of confounders may lead to identification of false disease patterns and spurious results. This occurs when classifiers rely heavily on regions that are strongly correlated with the confounders instead of regions that encode subtle disease changes. The imaging literature recommends using parametric models to regress out confounder effects at each brain region before SVM training. We show that this approach does not properly address the issue of confounding in MVPA. Instead, we propose a novel method that incorporates inverse probability weighting (IPW) during classifier training.

April 27 (10:30 - 11:30 am, Kresge 203 - Cancelled)

Sy Han (Steven) Chiou, Ph.D.
Assistant Professor, Department of Mathematics and Statistics, University of Minnesota Duluth

"Joint Scale-Change Model for Recurrent Events and Failure Time"
ABSTRACT: Recurrent event data arise frequently in various fields such as biomedical sciences, public health, engineering, and social sciences. In many instances, the observation of the recurrent event process can be stopped by the occurrence of a correlated failure event, and thus violates the independent censoring assumption required by most conventional statistical methods. A joint scale-change model for the recurrent event process and the failure time that allows the censoring time to be informative about the recurrent event process is proposed. In particular, a shared frailty variable is used to model the association between the two types of outcomes. In contrast to the popular Cox-type joint modeling approaches, the regression parameters in the proposed joint scale-change model have marginal interpretations. Moreover, the proposed approach is robust in the sense that no parametric assumption is imposed on the distribution of the unobserved frailty and that the strong Poisson-type assumption for the recurrent event process is not needed. To estimate the corresponding variances of the estimators, a computationally efficient resampling-based procedure is applied. Simulation studies and an analysis of hospitalization data from the Danish Psychiatric Central Register illustrate the performance of the proposed method.

April 29

Eyal Y. Kimchi, M.D., Ph.D.
Clinical Fellow in Neurology, Department of Neurology, Massachusetts General Hospital

"Developing Models to Determine the Pathophysiology of Delirium"
ABSTRACT: Delirium is an acute and fluctuating disturbance of attention and awareness that is most common in elderly patients. Delirium heralds the possibilities of not only sustained brain dysfunction but also dependence and death. Despite the profound and alarming nature of delirium, treatments are severely limited by an incomplete understanding of its biological basis. We are developing translational animal models to determine the pathophysiology of delirium. Through a combination of electrophysiologic and behavioral studies, we have determined the causal significance of several possible clinical risk factors for delirium.

May 6 - Cancelled

Jessica Gronsbell
Doctoral Student, Department of Biostatistics, Harvard University

"Efficient Estimation of the Receiver Operating Characteristic Curve in Semi-Supervised Settings"
ABSTRACT: We consider the evaluation of a binary classifier in a semi-supervised setting in which a small or moderate-sized 'labeled' dataset is accompanied by a large amount of 'unlabeled' data. This setting is directly relevant to many practical applications where the outcome is expensive or time-consuming to collect, but information on the predictors is readily available. Such data is increasingly prevalent with the rise of electronically recorded databases such as electronic medical records (EMR). While supervised estimation procedures make use of only labeled data, it is often of interest whether unlabeled data can improve efficiency. To address this question in the context of model evaluation, we propose semi-supervised (SS) estimators of various prediction performance measures including the receiver operating characteristic (ROC) curve. We make use of the unlabeled data through a two-step procedure. In step I, the labeled data is used to obtain a non-parametrically calibrated estimate of the conditional risk function. In step II, SS estimates of the prediction accuracy measures are constructed based on the estimated conditional risk function along with the unlabeled data. We correct for potential overfitting bias in our SS estimators with cross-validation and develop a perturbation resampling procedure to approximate the distribution of the proposed estimators. Further, we provide asymptotic results that establish that the SS estimators are always more efficient than their supervised counterparts. We validate our proposals via an extensive simulation study as well as a real data analysis of two EMR studies.

May 13 (Kresge 204)

Catherine Lee, Ph.D.
Doctoral Student, Department of Biostatistics, Harvard University

"Methods for Analyzing Left-Truncated Survival Data with Time-Dependent Predictors Collected at Study Entry, Using Alzheimer's Disease Data and the Hypothetical Biomarker Cascade Model"
ABSTRACT: Over the past decade, several biomarkers of Alzheimer's disease (AD) have been identified and well-validated. Relating these biomarkers to time-to-event would allow for the prediction of future clinical course. Implementing such a survival analysis may seem straightforward (just fit a Cox model with time-varying predictors); however, there are several issues that complicate the analysis. One issue is that time-dependent biomarkers of AD may only be measured at study entry, which is at odds with the fact that the history of the time-dependent process needs to be available at all observed event times in order to fit a Cox model with time-dependent predictors. A way to get around this is to define the time origin to be study entry so that time-dependent biomarkers collected at study entry can be viewed as fixed baseline predictors representative of a subject's disease state at that time; this simplifies the analysis. However, if study entry does not coincide with a biologically relevant event, the definition of time-to-event may have little meaning outside of the study. A choice of time origin such as birth or a milestone corresponding to disease onset may be more appropriate, but the analysis is complicated by the fact that left-truncation must be accounted for and there is still the problem that the time-dependent biomarker is only measured at study entry.

May 20

Adam Sullivan
Doctoral Student, Department of Biostatistics, Harvard University

"Longitudinal Mediation with Latent Growth Curves"
ABSTRACT: We consider longitudinal mediation with latent growth curves. We define the direct and indirect effects using counterfactuals and consider the assumptions needed for identifiability of those effects. We develop models with a binary treatment/exposure followed by a model where treatment/exposure changes with time allowing for treatment/exposure-mediator interaction. We thus formalize mediation analysis with latent growth curve models using counterfactuals, make clear the assumptions, and extend these methods to allow for exposure-mediator interactions. We present and illustrate the techniques with a study on multiple sclerosis (MS) and depression.

Maintained by the Biostatistics Webmaster
Last Update: May 18, 2015