Reportability classifier helpful in distinguishing relevant cancer reports

A reportability classifier, part of an industrial-strength processing pipeline built to extract content from radiology reports, can dramatically reduce the human effort needed to pick out cancer-relevant reports from a large imaging pool for further investigation, according to a study published online May 22 in the Journal of the American Medical Informatics Association.

Co-authors Dung Nguyen, PhD, and Jon D. Patrick, MSc, PhD, both of the University of Sydney in Australia, sought to develop an automated system for classification of radiology reports, including results from CT, MRI and PET.

Using active learning (AL), the researchers constructed optimal supervised machine learning models to extract content from radiology reports for use in the Victorian Cancer Registry. The cancer registry conducted a parallel study to check how far the radiology reports coincided with pathology reports and hospital records.

Because machine learning systems have shown high accuracy in automatic classification of radiology reports, Nguyen and Patrick investigated traditional supervised learning methods, such as conditional random fields and support vector machines, as well as AL approaches intended to streamline the production of training data and further improve classification performance. Two pilot sites in Victoria, Australia, took part in the project, in collaboration with the NSW Central Registry, which has a pilot site in Sydney. Although the cancer registry did not want to overlook any cancer cases in the reports, it accepted a sensitivity above 98 percent and a specificity above 96 percent.
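The article does not say which AL strategy the researchers used. As a rough illustration of the general idea only, the sketch below shows uncertainty sampling, one common pool-based approach: the reports the current model is least sure about are sent to a human for labeling first. The function name and the toy probabilities are hypothetical and are not taken from the study.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of the pool items whose predicted probability of
    being a cancer report is closest to 0.5 (the most uncertain ones)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Toy model scores for four unlabeled reports; illustration only.
pool_scores = [0.90, 0.52, 0.10, 0.45]

# Ask a human annotator to label the two least certain reports next.
to_label = select_most_uncertain(pool_scores, batch_size=2)
print(to_label)
```

In a full loop, the newly labeled reports would be added to the training set, the model retrained, and the selection repeated until performance stops improving, which is how AL can cut the amount of labeled data needed.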

The reports included in the study came from a year’s data collection by the imaging service. An initial sample of 16,472 reports was taken from one of the pilot sites; the cancer registry labeled 4,784 as cancer and 11,688 as non-cancer. These labeled reports were then delivered incrementally to the researchers’ system.

With the supervised learning methods and AL approaches in place, the reportability classifier, which combines two machine learning methods with a rule-based postprocessing system, achieved 98.25 percent sensitivity and 96.14 percent specificity on the cancer registry’s held-out test set. AL reduced the training data needed for supervised machine learning by as much as 92 percent.
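For readers unfamiliar with the two metrics, sensitivity is the share of true cancer reports the classifier flags, and specificity is the share of non-cancer reports it correctly sets aside. The sketch below computes both from labels and checks them against the registry's stated targets of more than 98 percent and more than 96 percent. The function names and the toy labels are hypothetical; only the thresholds and the reported 98.25 and 96.14 percent figures come from the article.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = cancer.

    Assumes both classes appear in y_true; otherwise a division by
    zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cancer, set aside
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cancer, flagged
    return tp / (tp + fn), tn / (tn + fp)


def meets_registry_targets(sensitivity, specificity,
                           min_sensitivity=0.98, min_specificity=0.96):
    """Check results against the registry's acceptance thresholds."""
    return sensitivity > min_sensitivity and specificity > min_specificity


# Toy labels for illustration only; not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)

# The figures reported in the article clear both thresholds.
print(meets_registry_targets(0.9825, 0.9614))
```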

“The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support the cancer registries,” wrote the authors.