2014 Research Project Summaries


A Study of Social Networks in Karnataka, India

Student Researchers: Lisbeth Acosta, Jamaris Burns, Emmie Roman Melendez
Faculty Mentor: Dr. Jukka-Pekka Onnela
Graduate Mentor: Patrick Staples

Many statistical methods assume independence among observations or some tractable form of dependence. However, relationships among individuals can be complex and require explicit representation. In recent years, such relational data have become available in network datasets such as the Internet, the World Wide Web, metabolic networks, and interstate power grids. These networks exhibit characteristics that earlier models did not predict, including wide variation in connectivity among individuals and densely connected groups of individuals, or communities. These properties also have real-world effects on the networks they characterize, such as how easily an infectious disease spreads or how attributes cluster within social groups.

We examined survey data from 75 villages in rural Karnataka, a state in southern India. The survey includes several socioeconomic measures for each individual, as well as social network data representing kinship, friendship, and economic trade ties. In our project, we sought to measure the degree of social segregation within and between villages: individuals who share a gender, level of education, or stratum of the caste system may connect with each other more often than with those who do not. We found that caste membership was strongly predictive of social ties, and that several other variables also captured social segregation well.
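One common way to quantify the kind of segregation described above is Newman's attribute assortativity coefficient, which compares the fraction of ties joining people in the same category (for example, the same caste) with the fraction expected if ties ignored that category. The summary does not say which measure the project used, so this is only an illustrative sketch; the edge list and attribute mapping are hypothetical inputs. For real data, `networkx.attribute_assortativity_coefficient` computes the same quantity on a graph object.

```python
from collections import Counter
from typing import Hashable, Iterable, Mapping, Tuple


def attribute_assortativity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    attribute: Mapping[Hashable, Hashable],
) -> float:
    """Newman's assortativity coefficient for a categorical node attribute.

    Each undirected edge (u, v) is counted in both directions, so the
    mixing counts are symmetric.
    """
    mixing: Counter = Counter()
    for u, v in edges:
        a, b = attribute[u], attribute[v]
        mixing[(a, b)] += 1
        mixing[(b, a)] += 1
    total = sum(mixing.values())
    if total == 0:
        raise ValueError("need at least one edge")

    # Fraction of edge ends that join two nodes in the same category.
    observed = sum(n for (a, b), n in mixing.items() if a == b) / total

    # Fraction expected if ties formed at random, given each category's
    # share of all edge ends.
    share: Counter = Counter()
    for (a, _), n in mixing.items():
        share[a] += n
    expected = sum((n / total) ** 2 for n in share.values())

    if expected == 1.0:
        raise ValueError("all edges lie within one category; r is undefined")
    return (observed - expected) / (1 - expected)
```

A value near 1 means ties form almost entirely within categories, a value near 0 means mixing looks random with respect to the attribute, and negative values mean ties tend to cross categories.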

Comparing Subtypes of Breast Cancer Using a Message-Passing Network

Student Researcher: Kamrine Poels
Faculty Mentor: Dr. John Quackenbush

There are four recognized molecular subtypes of breast cancer, each differing from the other three in the expression and regulation of its genes. One way to analyze differences between these subtypes is to compare their regulatory networks. PANDA (Passing Attributes between Networks for Data Assimilation) is a new message-passing network algorithm that integrates multiple sources of information to predict regulatory relationships. We used PANDA to assimilate protein-protein interaction, gene expression, and sequence motif data into subtype-specific networks. We took 198 breast cancer gene expression samples from a public database and robustly classified them into the four molecular subtypes. We then grouped the samples by subtype, ran PANDA on each group, and compared the final regulatory networks of the Luminal A and Luminal B subtypes. First, we examined the transcription factors with the highest differential targeting between those two networks and found that most are more active in the Luminal B subtype. Of the transcription factors more active in Luminal B, a couple either are proto-oncogenes or interact with proto-oncogenes. Next, we ranked genes by their differential regulation between the two networks, ran a Gene Set Enrichment Analysis (GSEA), and found that genes involved in extracellular structure and protein transport are enriched in Luminal B. Previous research has shown that, in contrast to Luminal A, Luminal B exhibits angiogenic properties and is highly aggressive; our results agree with those studies. Comparisons among the other breast cancer subtypes and analysis of the gene co-regulation networks are still in progress.
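PANDA's output for each subtype can be viewed as a weighted transcription-factor-by-gene matrix. The sketch below illustrates the downstream comparison under stated assumptions: a TF's "targeting" is taken to be the sum of its outgoing edge weights and a gene's the sum of its incoming edge weights (the summary does not give the exact definitions). It ranks TFs and genes by the Luminal B minus Luminal A difference and computes an unweighted running-sum enrichment score of the kind GSEA builds on. The published GSEA statistic additionally weights each step by the gene's ranking score and assesses significance by permutation, neither of which is shown here. All matrices and names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]  # rows = transcription factors, columns = genes


def tf_targeting(weights: Matrix) -> List[float]:
    """Sum of each TF's outgoing edge weights (row sums)."""
    return [sum(row) for row in weights]


def gene_targeting(weights: Matrix) -> List[float]:
    """Sum of each gene's incoming edge weights (column sums)."""
    return [sum(column) for column in zip(*weights)]


def rank_differences(
    names: Sequence[str], lum_b: Sequence[float], lum_a: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each name with (Luminal B - Luminal A), largest difference first."""
    diffs = [(name, b - a) for name, b, a in zip(names, lum_b, lum_a)]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score (maximum deviation from zero).

    Walking down the ranked list, the sum rises by 1/hits at each gene in the
    set and falls by 1/misses at each gene outside it.
    """
    hits = sum(1 for gene in ranked_genes if gene in gene_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1 / hits if gene in gene_set else -1 / misses
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's genes cluster near the top of the ranking (here, toward stronger regulation in Luminal B); a negative score means they cluster near the bottom.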

Prenatal Exposure to Maternal Stress and Childhood Wheeze in an Urban Boston Cohort

Student Researchers: Lilyana Gross, Taylor Mahoney, Christine Ulysse
Faculty Mentor: Dr. Brent Coull
Graduate Mentor: Mark Meyer

Maternal mental and physical wellbeing during pregnancy has been shown to predict infant health outcomes such as loss of developmental skills, behavioral problems, and reduced IQ. Many maternal factors, such as smoking, education, and stress levels, have been linked to negative health outcomes in infants. We examined the relationship between prenatal exposure to maternal stress and repeated wheeze in infants, a known predictor of asthma. In an urban Boston cohort of 297 women, levels of the stress hormone cortisol were measured at five time points throughout the day and averaged at each time point over multiple days; additional pregnancy-related variables were also recorded. A follow-up study identified mothers whose infants wheezed two or more times in the first two years of life. Using logistic regression, we found that mothers with higher stress (cortisol) levels during the initial rise after awakening and before sleep had an increased risk of having a child who wheezed in the first two years of life. Additionally, among mothers with high stress levels, those with a BMI greater than 30 had a higher probability of having a child who wheezed than those with a BMI below 30.
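The analysis described above can be sketched in two steps: averaging each mother's cortisol readings over days at each of the five time points, then fitting a logistic regression of the binary repeated-wheeze outcome on selected averages. This is an illustrative NumPy sketch, not the project's code; the array layout (mothers × days × time points) and the use of Newton-Raphson to find the maximum-likelihood coefficients are assumptions. In practice a package such as statsmodels would also report standard errors and confidence intervals.

```python
import numpy as np


def average_by_time_point(samples: np.ndarray) -> np.ndarray:
    """Average cortisol over days, ignoring missing (NaN) readings.

    samples: shape (n_mothers, n_days, n_time_points)
    returns: shape (n_mothers, n_time_points)
    """
    return np.nanmean(samples, axis=1)


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain a column of ones for the intercept; y holds 0/1
    outcomes (1 = repeated wheeze). If the predictors perfectly separate the
    outcomes, no finite maximum exists and the iterations will not converge.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))              # fitted probabilities
        gradient = X.T @ (y - p)                         # score vector
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta
```

A positive fitted coefficient on, say, the waking-rise average corresponds to higher odds of repeated wheeze at higher cortisol; exponentiating it gives the odds ratio per unit of cortisol.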


Protective Effects of Propranolol in Adults Following Severe Burn Injury: A Safety and Efficacy Trial

Post-Bac Researcher: Avery Yuan
Faculty Mentor: Dr. David Schoenfeld

The overall objective of this Phase 2a/b, multicenter, randomized clinical trial is to determine the safety and efficacy of the non-selective beta blocker propranolol in adult patients following major burn injury. Because beta blockade is a rational therapeutic intervention for regulating burn-induced responses, we hypothesize that propranolol will provide significant benefit to patients at doses that are safe and do not increase the risk of adverse infectious or non-infectious outcomes. To examine safety, we will fit separate mixed-model negative binomial regressions to mortality rates and to infectious and non-infectious complications. In addition, we will use a piecewise linear random effects model to analyze the study's primary endpoint, the cardiac rate pressure product.
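The two planned models can be written out as follows. This is a hedged specification for illustration, not the trial's analysis plan: the knot time κ, the treatment indicator z (1 = propranolol), the patient-level random intercept and slope b₀ᵢ and b₁ᵢ, the site-level random effect uₖ, and the exposure offset Tᵢₖ are all assumptions. The first line uses the standard definition of the rate pressure product as heart rate times systolic blood pressure.

```latex
\begin{aligned}
\mathrm{RPP}_{ij} &= \mathrm{HR}_{ij} \times \mathrm{SBP}_{ij}, \\
\mathrm{RPP}_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 (t_{ij} - \kappa)_+
  + z_i \left[ \beta_3 + \beta_4 t_{ij} + \beta_5 (t_{ij} - \kappa)_+ \right]
  + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}, \\
(x)_+ &= \max(x, 0), \quad (b_{0i}, b_{1i})^\top \sim N(\mathbf{0}, \Sigma), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
C_{ik} &\sim \mathrm{NB}(\mu_{ik}, \theta), \quad \operatorname{Var}(C_{ik}) = \mu_{ik} + \mu_{ik}^2 / \theta, \\
\log \mu_{ik} &= \log T_{ik} + \alpha_0 + \alpha_1 z_{ik} + u_k, \quad u_k \sim N(0, \tau^2).
\end{aligned}
```

Here i indexes patients, j measurement times, and k clinical sites; Cᵢₖ is a count of deaths or complications. In the piecewise model, β₄ and β₄ + β₅ are the treatment differences in slope before and after the knot.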

The Data Coordinating Center (DCC) is located at the Massachusetts General Hospital (MGH) Biostatistics Center. The DCC's responsibilities include assisting investigators in all aspects of study conduct, such as data management, statistical expertise, adverse event reporting, and assistance with publishing research findings. We are currently enrolling patients and collecting data. The data management system used at the MGH DCC was developed by Dr. Schoenfeld; it gives site coordinators direct feedback as data are entered. Reports are generated periodically during data collection to keep the data clean, consistent, and accurate. We anticipate performing the statistical analysis once data collection and cleaning are complete.
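Entry-time feedback of the kind described above is often built from simple completeness and range checks. The sketch below assumes hypothetical field names and plausibility limits, none of which come from the trial's actual data system; it flags problems in each submitted record and tallies them across records, as a periodic report might.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class RangeRule:
    """Plausibility limits for one numeric field (hypothetical)."""
    field: str
    low: float
    high: float


# Hypothetical fields and limits, for illustration only.
RULES: Tuple[RangeRule, ...] = (
    RangeRule("heart_rate", 20, 250),
    RangeRule("systolic_bp", 40, 300),
    RangeRule("tbsa_burn_pct", 0, 100),
)


def check_record(
    record: Mapping[str, Optional[float]], rules: Iterable[RangeRule] = RULES
) -> List[Tuple[str, str]]:
    """Return (field, message) pairs for missing or out-of-range values."""
    problems = []
    for rule in rules:
        value = record.get(rule.field)
        if value is None:
            problems.append((rule.field, "missing"))
        elif not rule.low <= value <= rule.high:
            problems.append((rule.field, f"{value} outside [{rule.low}, {rule.high}]"))
    return problems


def summarize(records: Iterable[Mapping[str, Optional[float]]]) -> Dict[str, int]:
    """Count flagged problems per field across records, for a periodic report."""
    counts: Counter = Counter()
    for record in records:
        for field_name, _ in check_record(record):
            counts[field_name] += 1
    return dict(counts)
```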


Quantifying RNA Levels from RT-PCR Curves: A New Method for Single-Cell Analysis

Student Researchers: Savion Smith, Esther Fevrier, Havell Markus
Faculty Mentor: Dr. Rafael Irizarry
Graduate Student Mentor: Stephanie Chan

With the advent of single-cell analysis, we can better understand the patterns of gene transcripts underlying variation among individual cells, which can differ dramatically in size, protein levels, and expressed RNA transcripts. The levels of particular nucleic acid sequences across a range of tissues, for example, can provide information about genetic abnormalities, viral diseases, and the proliferation of cancerous mutations. Quantitative real-time PCR (RT-PCR) has become the leading tool for detecting and quantifying gene expression. It relies on the Ct (cycle threshold) method, which lets scientists infer the abundance of a particular mRNA sequence in a sample of cells. However, we hypothesized that this method has limited ability to estimate small amounts of mRNA precisely at the single-cell level. Using a fluorescence dataset provided by the Fluidigm Corporation, we designed our project to evaluate the sensitivity and specificity of RT-PCR's Ct-threshold outputs through linear regression analyses. Our results show that, for single cells, the Ct-threshold method is indeed inefficient for deducing the starting amounts of mRNA in our sample of cells. This leaves room for biostatisticians to develop new statistical models based on the full RT-PCR curves that provide better-fitting estimates.
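To make the Ct-threshold idea concrete, the sketch below simulates idealized, noise-free amplification curves, reads off the fractional cycle at which each curve crosses a fluorescence threshold, and regresses Ct on log2 of the starting amount. The logistic growth model, the threshold, and the starting amounts are hypothetical choices, not the Fluidigm data or the project's regression. Under perfect doubling, each twofold increase in starting material lowers Ct by one cycle, so the fitted slope should be close to -1.

```python
import math
from typing import List, Sequence, Tuple


def amplification_curve(
    n0: float, cycles: int = 40, capacity: float = 1e12, efficiency: float = 1.0
) -> List[float]:
    """Idealized, noise-free PCR product per cycle (hypothetical logistic model).

    Each cycle multiplies the amount by 1 + efficiency * (1 - amount / capacity),
    so growth is exponential early on and levels off near `capacity`.
    """
    curve, amount = [], float(n0)
    for _ in range(cycles + 1):
        curve.append(amount)
        amount *= 1 + efficiency * (1 - amount / capacity)
    return curve


def cycle_threshold(curve: Sequence[float], threshold: float) -> float:
    """Fractional cycle at which the curve first reaches `threshold`.

    Interpolates linearly on the log2 scale between the two bracketing cycles,
    which is exact for pure exponential growth.
    """
    for c in range(1, len(curve)):
        if curve[c - 1] < threshold <= curve[c]:
            lo, hi = math.log2(curve[c - 1]), math.log2(curve[c])
            return (c - 1) + (math.log2(threshold) - lo) / (hi - lo)
    raise ValueError("curve does not cross the threshold after cycle 0")


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

On a log10 scale the same relationship gives the familiar standard-curve slope of about -3.32 at 100% efficiency. Because the simulation has no sampling noise, it cannot reproduce the low-copy-number variability the project examined in single cells; it only shows what the Ct method assumes.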