Data quality assurance framework for machine learning in healthcare

Unmet Need

Large clinical datasets and electronic health records are fraught with unstandardized parameters and invalid datapoints whose errors can cascade into downstream analyses. Such errors have a 92% incidence rate. They could be mitigated by scalable software that validates electronic health records and clinical data before they are used to inform real-world clinical practice. Thus, there is a need to verify the quality of real-world data, especially public health data that can be used in clinical trials or to influence medical research.

Technology

Duke inventors have developed a data quality assurance framework for machine learning in healthcare. The framework is intended to be used by healthcare companies to validate large clinical datasets and to prevent small errors or inconsistencies from skewing the conclusions drawn from these data. Specifically, it checks that every datapoint in a large healthcare dataset is plausible in its value, conforms to the type of measurement taken, and is complete for each patient. Data gathered across patients by different methodologies are identified and normalized. Aberrant datapoints are then flagged and reviewed by clinicians, who provide the final layer of data accuracy checking. The framework has been demonstrated on four datasets (with work on an additional dataset in progress) covering more than 400,000 patients and two large geographic regions. It identified 22 elements on average per dataset to be transformed or removed, and in one dataset it found 744,761 invalid datapoints which, if undetected, would have skewed the results or presented clinicians with a false conclusion.
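
The three checks named above (completeness, conformance, and plausibility) can be illustrated with a minimal sketch. This is not the inventors' implementation; the field names, value ranges, and the `FieldRule` and `check_record` helpers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule describing one field of a patient record."""
    name: str
    dtype: type                      # float for numeric fields, str for text
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    required: bool = True


# Illustrative rules only; the ranges are assumptions, not clinical guidance.
RULES = [
    FieldRule("heart_rate_bpm", float, 20, 300),
    FieldRule("systolic_bp_mmhg", float, 40, 300),
    FieldRule("sex", str),
]


def check_record(record: Dict[str, Any], rules: List[FieldRule]) -> List[Tuple[str, str]]:
    """Return (field, failed_check) pairs for one patient record."""
    issues = []
    for rule in rules:
        value = record.get(rule.name)

        # Completeness: a required field must be present and non-empty.
        if value is None or value == "":
            if rule.required:
                issues.append((rule.name, "completeness"))
            continue

        # Conformance: the value must match the expected measurement type.
        if rule.dtype is float:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number:
                issues.append((rule.name, "conformance"))
                continue
            # Plausibility: numeric values must fall inside the allowed range.
            too_low = rule.min_value is not None and value < rule.min_value
            too_high = rule.max_value is not None and value > rule.max_value
            if too_low or too_high:
                issues.append((rule.name, "plausibility"))
        elif not isinstance(value, rule.dtype):
            issues.append((rule.name, "conformance"))
    return issues


if __name__ == "__main__":
    record = {"heart_rate_bpm": 72, "systolic_bp_mmhg": 1200, "sex": None}
    for field_name, failed_check in check_record(record, RULES):
        print(f"{field_name}: failed {failed_check} check")
```

In a workflow like the one described, records that return a non-empty list would be the ones routed to clinicians for review rather than silently dropped.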

Advantages

  • Requires minimal human input aside from defining standard parameters and final validation.
  • Identifies datapoints that are incorrectly entered.
  • Groups patients based on condition and medications to enhance the clinical insight gained.
  • Model metadata and parameters for data formatting are easily changeable between datasets.
  • Identifies invalid datapoints or entries to prevent skewing analytical results.
  • Framework is adaptable and scalable to a broad range of clinical datasets.
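
One way to picture per-dataset formatting parameters is a small mapping that says how each source column is renamed and converted to a shared unit. The sketch below is an assumption about how such metadata might look, not a description of the framework itself; the column names, the `DATASET_A` and `DATASET_B` mappings, and the `normalize` helper are hypothetical.

```python
from typing import Any, Dict, Tuple

# Exact conversion factors by definition of the units.
LB_TO_KG = 0.45359237
IN_TO_CM = 2.54

# Hypothetical per-dataset metadata: source column -> (canonical name, factor).
DATASET_A: Dict[str, Tuple[str, float]] = {
    "wt_lb": ("weight_kg", LB_TO_KG),
    "ht_in": ("height_cm", IN_TO_CM),
}
DATASET_B: Dict[str, Tuple[str, float]] = {
    "weight": ("weight_kg", 1.0),   # already recorded in kilograms
    "height": ("height_cm", 1.0),   # already recorded in centimetres
}


def normalize(record: Dict[str, Any], mapping: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Rename and rescale the mapped fields of one record; skip missing values."""
    out = {}
    for source, (target, factor) in mapping.items():
        value = record.get(source)
        if value is not None:
            out[target] = value * factor
    return out


if __name__ == "__main__":
    print(normalize({"wt_lb": 150, "ht_in": 65}, DATASET_A))
    print(normalize({"weight": 68.0, "height": 165.1}, DATASET_B))
```

Under this design, supporting a new dataset means writing a new mapping while the normalization and validation code stays unchanged.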
