Initiating a Discussion: Challenges and Opportunities for Use of Real World Data

By Sandra Zelman Lewis, PhD, Chief Guidelines Officer, Doctor Evidence

High-quality double-blinded randomized controlled trials (RCTs) are widely recognized as the highest standard of evidence in evidence-based medicine (EBM) and in healthcare policy decisions. Stakeholders across the spectrum of EBM use them every day to develop guidelines, patient care protocols and other decision support tools, regulatory approvals, formularies, and performance measures. Yet there is a well-known caveat: the results of these RCTs cannot be generalized to the typical patient. Among other reasons, RCTs usually focus on targeted high-risk groups with the index condition and exclude patients with multiple comorbidities. But many patients actually have more than one chronic disease or condition. So data from these RCTs cannot be generalized to patients who might have contraindications to the recommended treatment because of their other conditions or the therapies prescribed for those conditions. Some commonly co-occurring conditions (e.g., atrial fibrillation and stroke, or diabetes and kidney disease) should be studied as a unit so that guidelines can take these clusters and the related concerns into consideration.

A worthy goal would be to investigate the effects of these recommended treatments on actual patient outcomes. Observational studies are a good source of evidence on the benefits, and especially the harms, associated with interventions in a clinical setting, and they are being used more commonly today in systematic reviews and guidelines. But if we could use large, de-identified patient-level data sets, we could examine real-world data to determine the effects of treatments on outcomes in real-world settings and patients, even the most complex patients. Such data could also provide insights into the benefits and harms of optional interventions for specific patient groups with their defined characteristics.

So with this GROWTH Commentary, I would like to start a discussion (post your comments below) about the possibilities and prospects, as well as the challenges and complexities, of working with real-world data (RWD). RWD is becoming ubiquitous; even the new Apple Watch is a source of healthcare data. I encourage you to comment and share your experiences on these and other relevant topics:

Obtaining access to RWD
Reliability and validity
Data mining methods and practices
Using RWD to test whether patterns observed in the published clinical literature are reflected in real healthcare settings
Use of RWD to test compliance with guidelines
Long-term study of efficacy and safety in specific subgroups
Surveillance for off-label uses of treatments
Difficulty of merging multiple data sets
Please add to this list in your comments.

Please also comment on the various sources of RWD, including:

EMR data
Patient registries
Patient surveys and CAHPS data
Large epidemiological data sets (eg, NHANES, SEER)
Claims data
Actuarial tables
Autopsy data (eg, in AMP-AD Data Portal)
Genomic data
Please add to this list in your comments.

The opportunities and challenges can be captured in this table. As you, our readers, add more thoughts and ideas to the postings below, we will update this table with your suggestions.

Data Source	Opportunities/Benefits	Challenges/Problems
Patient level (de-identified) EMR data	Comparing patterns and trends against those observed in published studies; Potentially sufficient volume for subgroup comparisons	HIPAA concerns, access barriers; Difficult to assess accuracy
Patient level (de-identified) registry data	Pattern comparisons; Potentially sufficient volume for subgroup comparisons; Multi-site sources could provide sufficient data for rare diseases	Barriers to access; Difficult to assess accuracy; Difficult to scrub the data
Personal source data, eg Apple Watch or FitBit	Real-time submissions	Difficult to detect errors
CAHPS and patient survey data	Patient perspectives	Sampling bias; Non-response rates; Framing bias
Patient level epidemiology data, eg NHANES, SEER, state disease-specific registries	Comprehensive data fields collected; Large data sets; Good for subgroup analyses; Already available in digital format through Doctor Evidence	Confined to US; Rare diseases are not included
Claims data, eg CMS, other insurers	Accessible; Long-term episode of care follow up	Accuracy/quality; Difficult to follow same patient across multiple settings
Actuarial tables	Can help with predictions of life expectancy, especially useful in screening decisions	Not granular enough to apply to specific patient groups
Autopsy data	Availability provided but must apply to each data bank separately	Limited to patients with Alzheimer’s; Complexity of application
Genomic and Proteomic data	Future of cancer classifications and maybe in other disease states	Barriers to access; Not enough known about most mutations, sequences, etc
Other sources

This will become an interactive discussion in which you can both post your own comments and sign up for alerts when others post comments. Let’s keep the discussion going so we can all learn from each other and possibly foster collaborations. That is the GROWTH way!