AI rivals radiologists at classifying common hip arthritis—with a few caveats

Researchers have trained a deep learning model to accurately assess one of the most common forms of arthritis by analyzing hip x-rays, according to a new report.

More than 230 million people worldwide are affected by osteoarthritis, and its prevalence has increased in the United States as the population has grown older, the authors wrote Feb. 4 in Radiology. Developed and tested using radiographs from more than 4,000 individuals, their algorithm performed well, even rivaling the accuracy of trained radiologists.

The multitask learning approach, described as a “main future trend” in radiology, can help imaging experts evaluate the numerous individual features of osteoarthritis on radiographs, a task that is “time consuming” and requires expertise, Claudio E. von Schacky, with the University of California San Francisco, and colleagues wrote.

Using AI may also lend a hand to inexperienced or untrained readers who often are unable to reliably reproduce accurate assessments, the authors added.

Von Schacky and colleagues from Germany trained their model on more than 15,000 hip joints, assessing five features of hip osteoarthritis on each radiograph. Two musculoskeletal radiologists read all radiographs, grading them for femoral osteophytes, acetabular osteophytes, and joint-space narrowing, among other features. The 4,368 participants were split into training (80%), validation (10%) and testing (10%) groups.
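
For readers curious how an 80/10/10 split of this kind is typically set up, here is a minimal Python sketch. It is not the authors' code: the function name `split_participants`, the fixed random seed, and the choice to split by participant ID (so that all hip joints from one person land in the same group) are assumptions made for illustration, and the report does not describe how the study assigned participants.

```python
import random


def split_participants(participant_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly assign participant IDs to train/validation/test groups.

    Hypothetical helper: splitting at the participant level is an
    assumption, chosen so no person appears in more than one group.
    """
    ids = sorted(set(participant_ids))  # de-duplicate, fix a stable order
    rng = random.Random(seed)           # seeded for reproducibility
    rng.shuffle(ids)

    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)

    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains, roughly 10%
    }


# Placeholder IDs 0..4367 stand in for the 4,368 participants.
splits = split_participants(range(4368))
for name, members in splits.items():
    print(name, len(members))
```

Grouping by participant rather than by individual hip joint is a common precaution against the same person's images appearing in both training and test data, which can inflate measured accuracy.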

Overall, the deep learning technique reliably assessed the five individual features with accuracy similar to that of the radiologists. That accuracy, however, varied by feature.

For example, the model was 97% accurate at assessing subchondral cysts, but its accuracy fell to 76% for acetabular osteophytes.

“Overall, these findings indicated that some osteoarthritis imaging features are likely more challenging to evaluate for both deep learning models and radiologists,” the researchers wrote. “However, the lower performance of the deep learning model could potentially be related to less reliable radiology readings for specific features because these served as ground truth.”

More studies are needed to validate these results, but von Schacky et al. believe their model can be particularly useful in large epidemiologic studies assessing the features of hip osteoarthritis.