Initiating a Discussion: Challenges and Opportunities for Use of Real World Data


By Sandra Zelman Lewis, PhD, Chief Guidelines Officer, Doctor Evidence

High-quality double-blinded randomized controlled trials (RCTs) are widely recognized as the highest standard of peer-reviewed evidence for evidence-based medicine (EBM) and for healthcare policy decisions. Stakeholders across the spectrum of EBM use them every day in guidelines, patient care protocols and other decision support tools, regulatory approvals, formularies, and performance measures. Yet there is a well-known caveat: the results of these RCTs cannot be generalized to the typical patient. Among other reasons, RCTs usually focus on targeted high-risk groups with the index condition and exclude patients with multiple comorbidities. But many patients actually have more than one chronic disease or condition. So data from these RCTs cannot be generalized to patients who might have contraindications to the recommended treatment, either because of their other conditions or because of the therapies prescribed for those conditions. Some commonly co-occurring conditions (e.g., atrial fibrillation and stroke, or diabetes and kidney disease) should be studied as a unit so that guidelines can take these clusters and the related concerns into consideration.

A worthy goal would be to investigate the effects of these recommended treatments on actual patient outcomes. Observational studies are a good source of evidence regarding the benefits, and especially the harms, associated with interventions in a clinical setting, and they are being used more commonly today for systematic reviews and guidelines. But if we could use large, de-identified patient-level data sets, we could examine real-world data to determine the effects of treatments on outcomes in real-world settings and patients, even the most complex patients. Such data could provide insights into the benefits and harms of optional interventions for specific patient groups with their defined characteristics.

So with this GROWTH Commentary, I would like to start a discussion (post your comments below) about the possibilities and prospects, as well as the challenges and complexities, of working with real-world data (RWD). RWD is becoming ubiquitous; even the new Apple Watch is a source of healthcare data. I encourage you to comment and share your experiences on these and other relevant topics:

  • Obtaining access to RWD
  • Reliability and validity
  • Data mining methods and practices
  • Using RWD to test whether patterns observed in the published clinical literature are reflected in real healthcare settings
  • Use of RWD to test compliance with guidelines
  • Long-term study of efficacy and safety in specific subgroups
  • Surveillance for off-label uses of treatments
  • Difficulty of merging multiple data sets
  • Please add to this list in your comments.

Please comment, also, on the various sources of RWD, including:

  • EMR data
  • Patient registries
  • Patient surveys and CAHPS data
  • Large epidemiological data sets (e.g., NHANES, SEER)
  • Claims data
  • Actuarial tables
  • Autopsy data (e.g., in AMP-AD Data Portal)
  • Genomic data
  • Please add to this list in your comments.

The opportunities and challenges can be captured in this table. As you, our readers, add more thoughts and ideas to the postings below, we will update this table with your suggestions.

Data source: Patient-level (de-identified) EMR data
Opportunities/Benefits:
  • Comparing patterns and trends against those observed in published studies
  • Potentially sufficient volume for subgroup comparisons
Challenges/Problems:
  • HIPAA concerns, access barriers
  • Difficult to assess accuracy

Data source: Patient-level (de-identified) registry data
Opportunities/Benefits:
  • Pattern comparisons
  • Potentially sufficient volume for subgroup comparisons
  • Multi-site sources could provide sufficient data for rare diseases
Challenges/Problems:
  • Barriers to access
  • Difficult to assess accuracy
  • Difficult to scrub the data

Data source: Personal source data, e.g., Apple Watch or Fitbit
Opportunities/Benefits:
  • Real-time submissions
Challenges/Problems:
  • Difficult to detect errors

Data source: CAHPS and patient survey data
Opportunities/Benefits:
  • Patient perspectives
Challenges/Problems:
  • Sampling bias
  • Non-response rates
  • Framing bias

Data source: Patient-level epidemiology data, state disease-specific registries
Opportunities/Benefits:
  • Comprehensive data fields collected
  • Large data sets
  • Good for subgroup analyses
  • Already available in digital format through Doctor Evidence
Challenges/Problems:
  • Confined to US
  • Rare diseases are not included

Data source: Claims data, e.g., CMS, other insurers
Opportunities/Benefits:
  • Accessible
  • Long-term episode-of-care follow-up
Challenges/Problems:
  • Accuracy/quality
  • Difficult to follow the same patient across multiple settings

Data source: Actuarial tables
Opportunities/Benefits:
  • Can help with predictions of life expectancy, especially useful in screening decisions
Challenges/Problems:
  • Not granular enough to apply to specific patient groups

Data source: Autopsy data
Opportunities/Benefits:
  • Availability provided but must apply to each data bank separately
Challenges/Problems:
  • Limited to patients with Alzheimer’s
  • Complexity of application

Data source: Genomic and proteomic data
Opportunities/Benefits:
  • Future of cancer classifications and maybe in other disease states
Challenges/Problems:
  • Barriers to access
  • Not enough known about most mutations, sequences, etc.

Data source: Other sources

This will become an interactive discussion in which you can both post your own comments and sign up for alerts when others post comments. Let’s keep the discussion going so we can all learn from each other and possibly foster collaborations. That is the GROWTH way!