Radiologist bests machine-learning algorithms at diagnosing thyroid cancer

In developing algorithms to differentiate between suspicious nodules in the thyroid gland, researchers in China have found that their machine-learning models separate malignant from benign nodules more accurately than an inexperienced radiologist, but not as accurately as the experienced radiologist whose know-how was used to create the algorithms.

Their research appears in the October issue of the American Journal of Roentgenology.

Dr. Hongxun Wu of Jiangyuan Hospital in the province of Jiangsu and colleagues worked with 970 histopathologically proven thyroid nodules in 970 patients.

They had two radiologists retrospectively review ultrasound images of the nodules, grading them according to a five-tier scoring system.

One of the radiologists, the one whose clinical interpretations would feed the computations, had 17 years of experience. The other had three.

The team then identified statistically significant nodule variables based on the experienced radiologist’s observations. From these data they built several algorithmic models for predicting malignancy.

Using receiver-operating-characteristic curve analysis, the team next compared their algorithms’ performance with that of the radiologists.
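For readers unfamiliar with receiver-operating-characteristic analysis, the idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the scores and labels below are invented, and the function names `roc_points` and `auc` are hypothetical. Each possible cutoff on the five-tier score yields one pair of true-positive and false-positive rates, and the area under the resulting curve summarizes how well the scores separate malignant from benign cases.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep cutoffs from the highest score down; a nodule is called
    # malignant when its score is at or above the cutoff.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented five-tier scores (1 = probably benign ... 5 = probably malignant)
# and pathology labels (1 = malignant, 0 = benign). Not data from the study.
scores = [5, 4, 4, 3, 3, 2, 2, 1, 5, 1]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

An area of 1.0 would mean the scores rank every malignant nodule above every benign one; 0.5 would be no better than chance.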

Wu and colleagues found that the highly experienced radiologist’s diagnosis topped the field for predictive accuracy. This radiologist achieved 88.66 percent accuracy, with a sensitivity of 91.54 percent and a specificity of 85.33 percent.

The less experienced radiologist had a prediction accuracy of 81.03 percent, with a sensitivity of 85.38 percent and a specificity of 76.0 percent.
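The three figures reported for each reader are related, and a short sketch shows how. The counts below are made up for illustration and are not the study's data; `diagnostic_metrics` is a hypothetical helper name. Sensitivity is the share of malignant nodules correctly flagged, specificity is the share of benign nodules correctly cleared, and accuracy is the share of all calls that were right.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules correctly called malignant
        "specificity": tn / (tn + fp),   # benign nodules correctly called benign
        "accuracy": (tp + tn) / total,   # all correct calls
    }


# Illustrative counts only: 100 malignant and 100 benign nodules.
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)

# Accuracy is a prevalence-weighted mix of sensitivity and specificity.
prevalence = (90 + 10) / 200
weighted = (metrics["sensitivity"] * prevalence
            + metrics["specificity"] * (1 - prevalence))
print(weighted)
```

Because accuracy blends the other two measures according to how common malignancy is in the sample, a reader or model can lead on sensitivity and still trail on overall accuracy if its specificity is lower.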

The best-performing algorithm, a radial basis function neural network, achieved the highest sensitivity (92.31 percent) but scored lower than the experienced radiologist on predictive accuracy (84.74 percent) and specificity (76.0 percent).
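As a rough picture of what a radial basis function network does, the sketch below scores a nodule by how close its features lie to a set of prototype "centers." Everything in it is an assumption made for illustration: the three binary features, the centers, the weights, the width and the sigmoid output are invented here and do not come from the paper, and `rbf_network_probability` is a hypothetical name.

```python
import math


def rbf_network_probability(features, centers, width, weights, bias):
    """Forward pass of a tiny RBF network with a sigmoid output."""
    activations = []
    for center in centers:
        # Each hidden unit responds most strongly when the input is
        # close to its center (Gaussian of the squared distance).
        sq_dist = sum((f - c) ** 2 for f, c in zip(features, center))
        activations.append(math.exp(-sq_dist / (2 * width ** 2)))
    # Weighted sum of hidden activations, squashed into the range (0, 1).
    z = bias + sum(w * a for w, a in zip(weights, activations))
    return 1 / (1 + math.exp(-z))


# Hypothetical binary ultrasound features (1 = present, 0 = absent).
# The choice of three features is an assumption, not the study's variable list.
centers = [[1, 1, 1], [0, 0, 0]]   # a "suspicious" and an "unremarkable" prototype
weights = [2.0, -2.0]              # made-up output weights
p_high = rbf_network_probability([1, 1, 1], centers, 1.0, weights, 0.0)
p_low = rbf_network_probability([0, 0, 0], centers, 1.0, weights, 0.0)
print(p_high, p_low)
```

In a trained network the centers and weights would be fitted to labeled cases rather than set by hand as they are here.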

The authors note several limitations to their study, including its drawing from a selected preoperative population. This, they write, probably accounts for the study’s relatively high malignancy rate among thyroid nodules (52.3 percent) compared with rates in other studies (45 to 51 percent).

In their discussion, Wu et al. reiterate that the primary aim of the study was to design computerized classifier models to aid in diagnosing thyroid nodules.

“The goal of thyroid nodule evaluation is to determine whether a nodule is benign or malignant to choose the most appropriate management. As a cost-effective tool, ultrasound has a high sensitivity for detecting thyroid nodules,” they write.

The authors point out that thyroid cancer presents in varying ways, depending on its histopathologic subtype.

Because of this, predictive values “are extremely variable between studies,” they write. “In addition, variations in radiologists’ perceptions and the lack of standard definitions of the features observed in the images also contribute to variability in thyroid nodule diagnosis.”

The authors add that they have already started work creating a visual interface builder to advance their development of classifier models based on machine-learning algorithms.

“By inputting the ultrasound features given by an experienced radiologist,” they write, “a malignancy risk estimation system for thyroid nodule based on the developed classifier model will provide a real-time calculation of the probability for malignancy, which will play a valuable role for management decision in clinical practice.”