From Anecdote to Evidence: Using Big Data to Conquer MS, An Interview with Ken Buetow, PhD
As a participant in the patient-driven iConquerMS initiative, established by The Accelerated Cure Project for MS, I am excited about the potential impact of research arising from topics suggested by people with MS who register at iConquerMS.org.
The type of research one reads about online, such as that investigating new disease-modifying treatments for MS, tends to focus on easily measured disease outcomes such as number of lesions, brain atrophy, or disability progression. However, people with MS may be less interested in lesions and more interested in aspects of the disease that affect their day-to-day lives.
MS patients frequently compare anecdotal notes with each other in forums and support groups. But how do you turn several anecdotes into scientific data? You begin by asking targeted questions of people who have MS; lots of questions among lots of people. What begins to emerge is often called Big Data.
Big Data has become a popular buzzword in the areas of healthcare and medical research. I've invited Dr. Ken Buetow, the iConquerMS project team's technical expert and a well-known researcher, to answer a few questions to help us understand ways that Big Data may be used to better understand multiple sclerosis.
What is meant by Big Data when it comes to medical research?
Big Data has become a hot topic. It is discussed in many venues by diverse communities. The term Big Data is used by various people to mean different things. IBM suggests that Big Data is defined by four characteristics - volume (very large quantities), variety (different forms), velocity (has dimensions of time), and veracity (has variable reliability). John Myles White, a Data Scientist at Facebook, describes Big Data as any "quantity of data that's so large that traditional approaches to data analysis are doomed to failure."
I believe when used in the context of medical research both definitions apply. Big Data in biomedicine refers to large quantities of diverse types of data - clinical care encounters, demographic, geographic, individual clinical experiences, lifestyle, personal preferences, and molecular characterizations from tens of thousands of individuals. What is driving interest in Big Data in medicine is not the challenge of "wrangling" it, but rather the opportunities that open up to us once it has been harnessed. Large volumes of data have embedded novel insights that are not apparent through the examination of smaller quantities of data. With Big Data, one has the possibility of discerning patterns and associations that would be undetectable through traditional approaches.
I'm reminded of the Michelangelo quote that "every block of stone has a statue inside it, and it is the task of the sculptor to discover it." One might say all Big Data contains insights and it's the job of the data scientist to find them!
Large, next-generation companies such as Google and Amazon use Big Data routinely to find such patterns. In clinical research, it is hoped these patterns will assist in determining who will respond to what interventions and what factors precipitate the onset of disease, as well as in tailoring care to individual characteristics, as opposed to an arbitrary population "average." Big Data, we believe, has the capacity to convert 'anecdote to evidence'.
How many people with MS does iConquerMS need to register to be able to conduct meaningful research and turn 'anecdote to evidence'?
Big Data, by definition, is big. Numbers count - the more data one has, the more interesting questions can be asked - and answered. More data means it is possible to find patterns associated with smaller segments of the population. Pragmatically, the more data one has, the greater the capacity one has to differentiate anecdotal from valid findings. However, unlike traditional studies where sample size is gauged to the ability to objectively test a single, specific hypothesis, Big Data is opportunistic. What do I mean by that? Because computer-based analysis of data is relatively inexpensive compared to traditional research studies, one can ask a myriad of questions, and dynamically keep evolving the questions. iConquerMS is targeting a population of 20,000 individuals living with MS, and that number should provide substantial capacity to achieve valid study findings.
Considering the type of massive database which iConquerMS participants will create, as opposed to creating a separate database for each individual study as is done in typical randomized controlled clinical research, how will scientists be able to parse the data and look for patterns that begin to answer some of the community's research questions?
Simply put, by embracing the Big Data paradigm, iConquerMS™ enables research that complements traditional approaches. iConquerMS™ collects large volumes of diverse data from all who are interested in sharing information. Questions are then asked against the collected data. These questions can come from both the research community and iConquerMS™ participants. Instead of conducting a new research study for each question - contacting new individuals and collecting new data - the data previously shared can be queried and evaluated.
Big Data can also use the computer and the availability of large quantities of data to recapitulate what is done in a traditional research protocol. If deemed important, it is possible to select subsets of individuals with specific characteristics and to examine patterns within these subgroups, which may well lead to greater personalization of interventions.
When the questions of interest are tested in Big Data, it is possible to assess the validity of the answers obtained. Big Data uses an expanded set of analytics to assess the soundness of findings. These analytic tools use the data itself to see whether results are reproducible and to indicate the confidence of the finding.
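One common way of "using the data itself" to gauge confidence is bootstrap resampling: the same analysis is repeated on many random re-draws of the collected responses to see how much the answer moves. The following is only a minimal sketch of that general idea, not a description of the analytics iConquerMS actually uses. The function name `bootstrap_ci`, the fatigue scores, and every parameter value are invented purely for illustration.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n values with replacement from the original responses.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up self-reported fatigue scores (0-10 scale); not real participant data.
scores = [3, 7, 5, 6, 8, 4, 5, 9, 2, 6, 7, 5, 4, 6, 8, 3, 5, 7, 6, 5]

low, high = bootstrap_ci(scores)
print(f"mean = {statistics.fmean(scores):.2f}, "
      f"95% interval = ({low:.2f}, {high:.2f})")
```

A narrow interval suggests the finding would likely reappear in a similar sample, while a wide one suggests it may still be closer to anecdote than evidence. With more participants the interval generally tightens, which is one reason numbers count.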
While Randomized Clinical Trials (RCTs) remain the "platinum standard" for generating evidence in biomedicine, it is simply not possible to conduct an RCT for every possible question and every possible personalized combination. Big Data fills a needed evidence gap.
More importantly, Big Data analytics can be used to "mine" large data collections for non-obvious relationships - to discover unexpected, or previously unidentified, relationships. Computers can be harnessed to be tireless interrogators of data. RCTs can be used selectively to verify findings of critical importance.
Data can be a limitless resource - shared with all who have valid questions to ask. Through iConquerMS™, access to appropriate data is no longer a barrier to asking critical questions in Multiple Sclerosis. Unlike more traditional research datasets, which are limited by the narrow questions originally asked and the ability to combine them, in iConquerMS™ the rich diversity of data supports many combinations of questions. This eliminates the need to collect the same data over and over again. Moreover, the actively engaged participants assure that new questions of interest can be continually added - making iConquerMS™ a living Big Data resource.
I like that - iConquerMS will be our living Big Data resource to fight against multiple sclerosis. Thank you, Dr. Buetow, for sharing with us the importance of using Big Data to answer some of the mysteries of MS.
Please join me at iConquerMS.org as we begin to ConquerMS in 2015!! Registration is easy; participation is confidential; and your data and identity will always be protected. Follow iConquerMS on Twitter and Facebook, and use the hashtag #iConquerMS to join the conversation.