Dr. Onnela received a 2013 NIH Director’s New Innovator Award, which enabled him and his team to crystallize the concept of digital phenotyping, construct the Beiwe research platform, and develop statistical methods and tools that would help turn the collected data into biomedical and clinical insights. Those of us involved in this multifaceted project have received a large number of requests about our approach and the research platform, and we address some of the most frequently asked questions here. In regard to the most frequently asked question: Beiwe is a transliteration of a Nordic goddess of sunlight and mental health. We pronounce it bee-we.
What is digital phenotyping? We define digital phenotyping as the “moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices,” in particular smartphones. Our definition highlights some important aspects of digital phenotyping, such as using existing personal devices rather than introducing additional instrumentation. To truly leverage moment-by-moment data collected in situ, in the wild, one must rely on the use of passive data, i.e., smartphone sensor and usage data (see below).
Why is digital phenotyping better than existing survey-based approaches to social and behavioral phenotyping? Of the many different phenotype classes, behavior has presented special challenges for phenomics, the systematic study of phenotypes on a genome-wide scale, because of its temporal nature and context dependence. The traditional approach to behavioral phenotyping has relied on pen-and-paper surveys, but these self-reported accounts tend to be highly unreliable and subject to various recall biases. Ecological momentary assessment (EMA) is an approach that attempts to survey subjects’ behaviors and experiences in real time in their natural environments. EMA used to be carried out using specialized hardware, such as personal digital assistants, which limited its scalability, but it can now be implemented on smartphones. Although EMA can certainly provide interesting insights into behavior, as a survey methodology it suffers from the same problems that all surveys do, namely reliance on subjective accounts of behavior rather than objective measurement of behavior. Another limitation is that frequent surveys require active user engagement, which may be difficult with certain clinical populations, and long-term adherence is also typically low. Frequent surveying can cause respondent fatigue, and it may inadvertently constitute an intervention. Finally, our preliminary findings suggest that subjects are less likely to take surveys in two diametrically opposite situations: when they are doing very well and when they are doing very poorly. This means that survey data tend to be unavailable at the times when they have the potential to be most insightful.
What types of phenotypes can be captured using smartphone-based digital phenotyping? In short, we quantify behavioral patterns, sleep specifics, social interactions, physical mobility, gross motor activity, cognitive functioning, and speech production, among others. Digital phenotyping is also compatible with the NIMH RDoC (Research Domain Criteria) research framework for studying mental disorders. As defined by the NIMH, the framework consists of a matrix, where the rows represent specific dimensions of function (domains and constructs) and the columns represent areas for study (units of analysis). The five domains of the RDoC matrix are negative valence systems (responses to aversive situations), positive valence systems (positive situations), cognitive systems (cognitive processes), systems for social processes (mediating responses to interpersonal settings), and arousal/regulatory systems (generating activation of neural systems). The daily use of smartphones generates a byproduct of rich social and behavioral data, and when complemented with surveys and audio diary entries, these data can address several of the RDoC domains and several units of analysis (self-report, behavior, and physiology).
What is the difference between mobile health (mHealth or m-health) and digital phenotyping? Mobile health is a broad category and can be defined in different ways, but it usually refers to the “delivery of healthcare services via mobile communication devices.” Digital phenotyping, by definition, refers to the collection and analysis of moment-by-moment individual-level human phenotype data in situ, in the wild, using data from personal digital devices, in particular smartphones. The main goal of digital phenotyping is to advance evidence-based research in the biomedical sciences. It can be seen as part of deep phenotyping, extending other approaches to phenotyping and naturally complementing genotyping and genome sequencing efforts.
How much does digital phenotyping cost? How does it compare with the cost of clinical trials or genome sequencing? The digital phenotyping approach is cost-effective and scalable. The total cost is a combination of fixed costs (such as ongoing platform maintenance) and variable costs (such as server cluster uptime). A small pilot study with some tens of subjects and a couple of months of data collection using Beiwe through our not-for-profit Beiwe Service Center might cost around $80 per subject-month (total cost). A large study with a thousand or so subjects and a year of data collection brings the cost down to about $3.50 per subject-month (total cost). A recent study in JAMA Internal Medicine estimated that clinical trials cost a median of $41,117 per patient and $3,562 per patient visit, and approximately two-thirds of the studied trials had a duration of 6 months or less. A typical primary end point in an antidepressant trial in patients with major depressive disorder is the change in clinician-administered MADRS total score (range 0–60) from baseline (week 0) to the end of follow-up (week 6). In a typical stroke clinical trial, the primary outcome measure is the change in clinician-administered mRS total score (range 0–6) from baseline to the end of follow-up (typically day 90). Adding smartphone-based digital phenotyping data collection to trials like these as an exploratory end point to quantify lived experiences might, depending on sample size, add a cost of $50 per patient to the trial. In the future, we anticipate that digital phenotyping will be even more cost-effective. Phenotyping is often contrasted with genotyping. The first sequencing of the whole human genome cost roughly $2.7 billion in 2003, whereas in 2020, research-grade whole genome sequencing costs around $600.
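To make the per-subject-month figures concrete, the short Python sketch below converts them into total study costs. The study sizes are assumptions chosen only to match the rough descriptions above (30 subjects for 2 months as "some tens of subjects" and "a couple of months"; 1,000 subjects for 12 months as the large study), and the function name `total_cost` is hypothetical. Actual costs will depend on the study.

```python
def total_cost(subjects: int, months: float, cost_per_subject_month: float) -> float:
    """Total study cost implied by a per-subject-month rate."""
    return subjects * months * cost_per_subject_month


# Assumed pilot study: 30 subjects followed for 2 months at about $80 per subject-month.
pilot = total_cost(subjects=30, months=2, cost_per_subject_month=80.0)

# Assumed large study: 1,000 subjects followed for 12 months at about $3.50 per subject-month.
large = total_cost(subjects=1000, months=12, cost_per_subject_month=3.50)

print(f"Pilot study total: ${pilot:,.0f}")
print(f"Large study total: ${large:,.0f}")
```

Under these assumptions the large study collects 200 times as many subject-months as the pilot while its total cost is less than 9 times higher, which reflects the fixed costs being spread over more subjects.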
What is the goal of the lab’s research in this area? Our immediate goal is to standardize data collection, and on an ongoing basis we develop statistical methods for smartphone-based digital phenotyping in research and clinical use. We have developed the Beiwe platform for investigators in the biomedical sciences who are interested in using research-grade passive phone sensor data (e.g., GPS and accelerometer) and phone-usage data (e.g., communication logs and screen activity) in smartphone-based digital phenotyping. Our overall philosophy is to do as much as possible using passively collected data, because this is the only way to run long-term studies without significant problems with subject adherence. A large majority of studies that might benefit from the use of smartphones as a data collection or intervention tool do not fit this definition. This is loosely analogous to the contrast between genotyping and full genome sequencing. In a typical study, the Beiwe front end generates about 1 billion data points, a volume of data that most studies do not need. We think that the main intellectual challenge in digital phenotyping is now beginning to move from data collection to data analysis, and development of statistical methods for making sense of the collected data is currently our top research priority.
Why does Beiwe collect raw data? In short, research requires research-grade raw data. Software development kits for Android (ResearchStack, etc.) and Apple iOS (ResearchKit, HealthKit, CareKit, etc.) collect processed data summaries rather than raw sensor and phone-usage data. This introduces an opaque layer between the data generating process and data analysis, making it difficult to compare data across devices or pool data across studies because the data summaries are likely different. The use of predefined data summaries results in a loss of information, narrowing down potential use cases of data to those conceived at the time of data collection (e.g., number of steps taken), and as such diminishes the value of data biobanking. Collection and storage of raw data make it possible to compute any summaries of interest at a future date, thus enhancing study replicability and facilitating reanalyses of data. There are downsides to collecting raw data, notably the large volume of data and the difficulty of keeping phone sensors awake, but both of these challenges are manageable. Raw data enable investigators to ask and answer questions they care about and ensure the transparency of data collection and data analysis.
What is the distinction between a smartphone application (app) and a research platform? A smartphone app is simply a software application that runs on a smartphone. It is important to note that the Beiwe app is just one of the three components of the Beiwe platform. The other components are the Beiwe back end and the Beiwe data analysis pipeline. The Beiwe back end makes use of Amazon Web Services (AWS) cloud computing infrastructure and is used to manage studies (e.g., study creation, addition of users, regeneration of passwords) and collect data. For the latter, it uses AWS Elastic Beanstalk, which automatically handles the details of capacity provisioning and load balancing, making it in essence infinitely scalable. The data analysis pipeline performs data preprocessing, checks data quality, transforms data, carries out imputation, and computes summary statistics of interest. The inputs to the pipeline are raw data collected by Beiwe, and the output is a p x T matrix, one per subject, where the p rows correspond to different daily summary statistics (e.g., total distance traveled obtained from GPS data and total call duration obtained from communication logs) and the T columns correspond to days. In supervised learning, the goal is to find associations between passively collected data and any other type of data (e.g., surveys or clinical data), and in this setting the obtained matrices can be fed into different longitudinal statistical models, such as generalized estimating equations (GEE) or generalized linear mixed models (GLMM), depending on the goals of the analysis. In unsupervised learning, the goal might be to find anomalies in behavioral data or to perform clustering using a range of possible methods.
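As an illustration of the pipeline output described above, the following Python sketch arranges a handful of daily summary statistics into a p x T matrix for a single subject. It is a minimal sketch, not the Beiwe pipeline itself: the record layout, the statistic names, the values, and the function `build_matrix` are all hypothetical, and missing days are simply left as `None` where a real pipeline would apply imputation.

```python
from datetime import date

# Hypothetical daily summaries for one subject: (day, statistic name, value).
# Names and values are illustrative only, not actual Beiwe output.
records = [
    (date(2020, 1, 1), "distance_traveled_km", 4.2),
    (date(2020, 1, 1), "total_call_duration_min", 12.0),
    (date(2020, 1, 2), "distance_traveled_km", 1.5),
    (date(2020, 1, 3), "distance_traveled_km", 7.9),
    (date(2020, 1, 3), "total_call_duration_min", 3.5),
]


def build_matrix(records):
    """Return (row names, days, matrix), where matrix has p rows and T columns."""
    days = sorted({day for day, _, _ in records})
    names = sorted({name for _, name, _ in records})
    lookup = {(name, day): value for day, name, value in records}
    # Entries with no observation stay None; imputation is out of scope here.
    matrix = [[lookup.get((name, day)) for day in days] for name in names]
    return names, days, matrix


names, days, matrix = build_matrix(records)
for name, row in zip(names, matrix):
    print(name, row)
```

Here p = 2 (two summary statistics) and T = 3 (three days). One such matrix per subject is the kind of input that longitudinal models such as GEE or GLMM would then consume.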
Is the Beiwe app available on Android and iOS? Our lab has developed both Android and iOS versions of the Beiwe app that connect seamlessly to the same research platform, enabling researchers to recruit individuals with phones using either operating system.
Is the Beiwe app a native app or a web app? What is the difference? Because Beiwe relies on sensor data, both Android and iOS versions of the Beiwe app are native applications rather than web applications. A web app requires only a browser and an internet connection (either a cell signal or Wi-Fi) and is easy and inexpensive to develop and maintain, but it cannot typically access phone sensor data. A native app works independently of the web and can access phone sensor data, but it is in general much more difficult and expensive to develop and maintain. Further, because the Beiwe apps collect raw phone sensor and usage data, they cannot rely on software development kits but instead rely on a codebase that has been developed from scratch for this specific purpose.
Who has developed Beiwe? The Beiwe research platform has been developed by the Onnela Lab at the Harvard T.H. Chan School of Public Health with funding from the National Institutes of Health (NIH). Specifically, the large majority of the development work has been enabled by a 2013 NIH Director’s New Innovator Award to Dr. Onnela. The lab has worked with two software development groups to create the front-end smartphone applications for Android and iOS devices, and also on the development of the back-end data collection system. The lab generated the data analysis methods and data analysis pipeline internally.
Why can various different studies use the same platform? Each study within Beiwe is independent of any other study; each study has its own subjects, its own study coordinators and investigators, and its own active and passive data collection features and sampling schedules. The subjects within each study are identified by their Beiwe user IDs (e.g., yixg8437) and temporary passwords. Once subjects have downloaded the app and entered their user IDs and passwords, the system automatically connects them with the right study, which among other things means that the subjects receive the surveys configured for that study and passive data are collected according to the specifications of the study. This includes what data streams are collected, how they are sampled, how frequently they are uploaded, and whether Wi-Fi or a cellular network is used for upload. The flow of information in the studies making use of Beiwe is from the user to the system, which is what is to be expected from a phenotyping platform. It is possible to make use of the Beiwe back end to design a sister component of the Beiwe app for delivering interventions. This is, however, likely to be study-specific, and falls outside of our main research area. In contrast, phone sensors are what they are, and therefore the most one can do is to collect as much of the available data as possible and try to make the most sense of the collected data.
How about reproducibility and replicability of Beiwe studies? Only 6% of biomedical studies have been found to be completely reproducible (Prinz et al., 2011). From this point of view, we do not need more studies; rather, we need more studies that are reproducible. To achieve reproducibility, it is key to focus on both data collection and data analysis. With the Beiwe platform, we attempt to address both of these stages. We started by building a platform that collects research-grade data. The old adage about data analysis captures the sentiment perfectly: garbage in, garbage out. Therefore, our first step was to improve the quality of measurements. Many researchers have advocated the role of better measurement in studies that involve any type of quantification of human behavior, a point that has been made repeatedly and vigorously by Andrew Gelman, among others. Beiwe captures all study settings in human-readable, JSON-formatted configuration files, and the platform enables an investigator to export and import these files with a single click. Therefore, an investigator wishing to replicate a previous Beiwe study only needs this one file to collect identical data in an identical manner. Data analysis can be replicated by studying the scripts that are used to analyze the output matrices of the Beiwe platform.
Can I use Beiwe in my own studies? This is a frequently asked question. In short, we very much hope so. Beiwe is currently being used in numerous studies, including at various Harvard Medical School teaching hospitals. Investigators worldwide have two different ways of using Beiwe in their studies.
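The idea of replicating a study from a single exported settings file can be sketched in Python as follows. This is only an illustration of a JSON export and import round trip: the field names and values in `study_config` are hypothetical and do not reflect Beiwe's actual configuration schema, and `export_config` and `import_config` are illustrative helpers rather than platform functions.

```python
import json

# Hypothetical study settings. Field names and values are illustrative only
# and are not Beiwe's actual configuration schema.
study_config = {
    "data_streams": {"gps": True, "accelerometer": True, "call_log": False},
    "gps_on_seconds": 60,
    "gps_off_seconds": 600,
    "upload_interval_seconds": 3600,
    "upload_over_cellular": False,
}


def export_config(config: dict) -> str:
    """Serialize study settings to human-readable JSON text."""
    return json.dumps(config, indent=2, sort_keys=True)


def import_config(text: str) -> dict:
    """Load study settings back from JSON text."""
    return json.loads(text)


exported = export_config(study_config)
replica = import_config(exported)

# A replicating study that imports the file starts from identical settings.
print(replica == study_config)
```

Because the exported text fully determines the imported settings, sharing that one file is enough for a second study to be configured in the same way as the first.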
- Beiwe Service Center. For more information, see Beiwe Service Center.
- Beiwe Open Source. The Beiwe research platform source code is available to investigators worldwide for free on GitHub under the permissive 3-clause BSD open source license. Under this model, individuals or institutions interested in using Beiwe set up their own AWS account and then deploy Beiwe in one of two ways (single server deployment vs. server cluster deployment). The Beiwe apps, named Beiwe2 for open source users, are available for free on Apple’s App Store and Google’s Play Store. In this model, the investigators using the open source version are responsible for all expenses.