Machine learning edges out rule-based method in identifying spine findings

Matt O'Connor | March 29, 2018 | Health Imaging | Artificial Intelligence

Low back pain (LBP) is difficult to diagnose and treat, despite plentiful interventions available for the condition. In the U.S., the problem accounts for an estimated $100 billion in annual costs.

A group of U.S. researchers created a natural language processing (NLP) system that outperformed traditional rule-based methods in identifying lumbar spine findings in radiology reports, according to a study published online March 28 in Academic Radiology.

“Although information extraction from these [radiology] reports can be done manually, this technique is impractical for large sample sizes,” wrote corresponding author Jeffrey G. Jarvik, MD, MPH, with the radiology department at the University of Washington in Seattle, and colleagues. “As an alternative to manual extraction, natural language processing (NLP) has been successfully used to harvest specific findings and conditions from unstructured radiology reports with high accuracy.”

A total of 413 x-ray and 458 MR lumbar spine radiology reports were analyzed from four integrated health systems (Kaiser Permanente of Washington, Kaiser Permanente of Northern California, Henry Ford Health System in Detroit and Mayo Clinic Health System in the upper Midwest).

A team of spine disorder specialists used standardized criteria to pick out 26 LBP-related findings. Of the overall data set, 80 percent was used to train the NLP system and 20 percent was used to test it.
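For readers unfamiliar with the setup, an 80/20 split can be sketched as follows. This is a minimal illustration, not the study's procedure: the article does not say whether the split was random or stratified, so the shuffling, the seed and the function name split_reports are assumptions, and the report IDs are placeholders.

```python
import random


def split_reports(report_ids, train_fraction=0.8, seed=0):
    """Shuffle report IDs and split them into training and test lists.

    Hypothetical helper: random shuffling is an assumption, since the
    article does not describe how the 80/20 split was made.
    """
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# 413 x-ray + 458 MR reports, per the article; the integer IDs are placeholders.
report_ids = range(413 + 458)
train_ids, test_ids = split_reports(report_ids)
print(len(train_ids), len(test_ids))
```

The test reports are held back so that the system is scored on reports it did not see during training.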

Results were as follows:

The rule-based and machine-learning approaches were comparable in average specificity, at 0.97 and 0.95, respectively.
The machine-learning approach achieved an average sensitivity of 0.94, compared with 0.83 for the rule-based method.
The machine-learning approach also scored a higher area under the receiver operating characteristic curve (AUC), at 0.98 compared with 0.90 for the rule-based method.
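As a reminder of what these metrics measure, sensitivity is the share of reports that truly contain a finding that the system flags, and specificity is the share of reports without the finding that it correctly leaves unflagged. The sketch below computes both for a single finding. The labels are made up for illustration and are not data from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for one binary finding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of reports with the finding that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of reports without the finding that were not flagged."""
    return tn / (tn + fp)


# Made-up labels for eight reports (1 = finding present), NOT study data.
expert_labels = [1, 1, 1, 0, 0, 0, 0, 1]   # reference-standard annotation
system_labels = [1, 0, 1, 0, 0, 1, 0, 1]   # what an extraction system returned

tp, fn, tn, fp = confusion_counts(expert_labels, system_labels)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

In the study, figures like these were averaged over the 26 findings, which is why a system can gain substantially in sensitivity while giving up a little specificity.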

“Our NLP system performed well in identifying the 26 lumbar spine findings, as benchmarked by reference-standard annotation by medical experts,” wrote Jarvik et al. “Machine-learned models provided substantial gains in model sensitivity with slight loss of specificity, and overall higher AUC.”

Matt O'Connor

Matt joined Chicago’s TriMed team in 2018 covering all areas of health imaging after two years reporting on the hospital field. He holds a bachelor’s in English from UIC, and enjoys a good cup of coffee and an interesting documentary.
