Machine learning algorithm and method to aggregate and analyze hematology slide scanner image data

Unmet Need

Hematology slide scanners are now widely used in hospitals around the world for blood morphology analysis. In these scanners, a thin blood smear on a microscope slide is imaged with a high-resolution digital microscope. The acquired image is processed to count blood cells automatically and to identify white blood cells (WBCs) and display them in a set of categories. A trained technician then visually examines the WBC images to identify subtle morphological abnormalities. This information, together with the blood sample statistics reported by the scanner, is then interpreted by a clinician, who makes a diagnosis.

Machine learning software can improve this diagnostic process by automating the search for and identification of abnormalities. Deep convolutional neural networks (CNNs) can examine gigabytes of image data and have previously been used to find and classify WBCs rapidly and accurately. Training a deep CNN on WBC images labeled with a particular diagnostic outcome allows it to map cell images directly to the diagnosis of interest (e.g., sepsis or COVID-19). However, the image data needed to train such a CNN is distributed across the hematology slide scanner facilities of many hospitals. Because accurate CNN classification requires large amounts of training data, aggregating this image data is critical to building an accurate automated diagnostic model.


Duke researchers have developed a system and method for aggregating and processing image data from a widely distributed set of hematology slide scanners for diagnostic purposes. Image data generated by a hematology slide scanner is processed by a machine learning algorithm to provide diagnostic information to the attending clinician. The image data is also used to update a model that is shared across a network of scanners in a secure manner that avoids sharing sensitive data. The same software pipeline can also update how the hematology slide scanner acquires and outputs data to improve system performance. Notably, the invention has been applied to the automated diagnosis of COVID-19 infection.
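
One common way to update a shared model across sites without moving raw data is federated averaging: each site trains on its own data and sends back only model weights, which are then combined. The listing does not state which scheme the invention uses, so the following is only a minimal sketch under that assumption. A small logistic regression stands in for the CNN, and the names `local_update`, `federated_average`, and `federated_round` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One training example: (feature vector, binary label).
# In practice the features would come from cell images; here they are plain numbers.
Sample = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights: Sequence[float], samples: Sequence[Sample],
                 lr: float = 0.5, epochs: int = 20) -> List[float]:
    """Run a few epochs of gradient descent on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * gi / len(samples) for wi, gi in zip(w, grad)]
    return w


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Combine site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_weights: Sequence[float],
                    sites: Sequence[Sequence[Sample]]) -> List[float]:
    """One round: every site trains locally, then only weights are averaged."""
    updates = [local_update(global_weights, samples) for samples in sites]
    return federated_average(updates, [len(samples) for samples in sites])
```

Calling `federated_round` repeatedly, starting from a zero weight vector, refines the shared model while each site's samples stay local. A real deployment would also need secure transport and, typically, further privacy protections that this sketch leaves out.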


  • Produces accurate screening results directly from blood image data.
  • Allows for an entirely data-driven analysis of blood using only patient-level information.
  • Allows aggregation of data from multiple hospitals and clinics in a secure and confidential manner.

Microscope image of human blood cells.
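
One possible reading of the patient-level analysis listed above is that per-cell model outputs are pooled into a single screening score per patient. The listing does not say how this is done, so the sketch below simply assumes mean pooling with a fixed threshold. The function names and the default threshold of 0.5 are hypothetical.

```python
from statistics import mean
from typing import Sequence


def patient_score(cell_probabilities: Sequence[float]) -> float:
    """Pool per-cell probabilities into one patient-level score (mean pooling)."""
    if not cell_probabilities:
        raise ValueError("at least one cell prediction is required")
    return mean(cell_probabilities)


def screen_patient(cell_probabilities: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Flag a patient when the pooled score reaches the threshold."""
    return patient_score(cell_probabilities) >= threshold
```

Mean pooling is chosen here only for simplicity. Other pooling rules, such as a maximum or a learned weighting, would fit the same interface.
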

Duke File (IDF) Number



  • Horstmeyer, Roarke
  • Cooke, Colin
  • Jiang, Xiaoyin "Xiaoyin Sara"
  • Kim, Kanghyun
  • McCall, Chad
  • Pathak, Vinayak
  • Pittman, Patricia
  • Xu, Shiqi

Pratt School of Engineering