Neural network stages knee osteoarthritis on x-rays better than MSK radiologists

An automated model can stage the severity of knee osteoarthritis from x-rays just as accurately as radiologists. And experts believe it may be a boon for clinicians and patients alike.

To obtain their results, researchers from the U.S. and the Netherlands trained their neural network on more than 32,000 images, publishing the findings March 18 in Radiology: Artificial Intelligence. When they pitted the new model against top musculoskeletal imaging physicians, the algorithm came out on top.

Kevin A. Thomas, MD, PhD, with Stanford University, and colleagues see enormous potential for their approach, ranging from more efficient knee osteoarthritis clinical trials to improved patient care.

“A time- and cost-efficient approach would accelerate clinical trials, which are often slowed by their reliance on experts to first screen large populations with radiography to identify patients with the appropriate level of OA severity to be included,” the authors added. “In clinical practice, having a consistent, automated mechanism for evaluating sequential radiographic examinations of individual patients would enable better tracking of their disease progression.”

Achieving even a fraction of these goals would be an accomplishment, given that knee osteoarthritis is a widespread cause of disability in older adults. The authors noted that some 20% of patients who opt for joint replacement surgery to relieve their symptoms are dissatisfied with the results.

Given these statistics, Thomas and colleagues trained their model on 32,116 radiographs taken from the Osteoarthritis Initiative. These were previously graded by a group of experts according to the commonly used Kellgren-Lawrence system. They further tuned the network using 4,074 images and evaluated it on another 4,090 exams.

After that was completed, Thomas et al. had the AI model and two MSK radiologists each interpret the same subset of 50 images. The model edged out both physicians: each radiologist posted an F1 score and accuracy of 0.60, while the model recorded an accuracy of 0.66 and an F1 score of 0.64.
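For readers unfamiliar with the two metrics reported above, the following sketch shows how accuracy and a macro-averaged F1 score can be computed for Kellgren-Lawrence (KL) grades 0 through 4. This is purely illustrative and not the study's code; the grade lists at the bottom are made up.

```python
def accuracy(true, pred):
    # Fraction of knees whose predicted KL grade matches the reference grade.
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def macro_f1(true, pred, labels=range(5)):
    # F1 computed per KL grade (0-4), then averaged so each grade
    # counts equally regardless of how common it is.
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        if tp == 0:
            f1s.append(0.0)
            continue
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        f1s.append(2 * prec * rec / (prec + rec))
    return sum(f1s) / len(f1s)

# Hypothetical reference and predicted KL grades for eight knees:
true = [0, 1, 2, 2, 3, 4, 1, 0]
pred = [0, 1, 2, 3, 3, 4, 2, 0]
print(accuracy(true, pred))            # 0.75
print(round(macro_f1(true, pred), 3))  # 0.767
```

Macro averaging matters here because severe KL grades are rarer than mild ones; a model that ignored grade 4 entirely could still post a high plain accuracy.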

“These findings suggest that our algorithm approaches the upper bound of possible performance of an experienced radiologist,” the authors wrote. “The model could empower individual radiologists to achieve committee-quality evaluation by providing a second assessment, thereby reducing the noise in KL scores,” they added later.

What’s more, the investigators said incorporating the model into research or clinical workflows would not add a lot of extra time or labor.

They did note, however, that more validation studies are needed to test the model on OA images from other institutions before it can be used to guide clinical decision-making.

The entire study is available in Radiology: Artificial Intelligence, and the authors have made the neural network freely available for download.