AI challenges radiologists at recommending thyroid nodule biopsies

A deep learning algorithm trained to recommend thyroid nodule biopsy performed similarly to trained radiologists, according to a new study published in Radiology.

The algorithm did, however, improve the specificity of such recommendations, beating seven of nine radiologists. With more research, it could aid decision-making when assessing thyroid nodules, suggested Mateusz Buda, of Duke University School of Medicine’s Department of Radiology, and colleagues.

“In our study, we show that deep learning maintains or provides improvement in specificity compared with radiologists who use ACR TI-RADS, which suggests that the proposed algorithm offers performance markedly higher than radiologists who do not use ACR TI-RADS,” Buda et al. explained.

The ACR Thyroid Imaging Reporting and Data System (ACR TI-RADS) was published in 2017 to help radiologists improve consistency when managing thyroid nodules on ultrasound (US). Many studies have found it has done just that, but high interobserver variability and a labor-intensive process for evaluating multiple nodules have hindered its adoption.

Buda and colleagues retrospectively reviewed 1,230 patients who were referred for US with subsequent fine-needle aspiration from August 2006 to May 2010. In total, 1,377 thyroid nodules were included along with conclusive cytologic or histologic diagnoses.

The algorithm was evaluated using 10-fold cross-validation followed by internal validation on an independent set of 99 consecutive nodules. Results achieved by the algorithm were compared with those of three ACR TI-RADS committee experts and nine radiologists with experience interpreting thyroid US images.
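
For readers unfamiliar with 10-fold cross-validation, the idea can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function name `k_fold_indices`, the sample count of 50 and the random seed are all hypothetical, and the sketch splits by item index without grouping nodules by patient, which a real evaluation may need to do.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the held-out one is used for training.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Toy run on 50 made-up items: each item is held out exactly once.
splits = list(k_fold_indices(50, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each item is used for testing in exactly one fold and for training in the other nine, so every prediction being scored comes from a model that never saw that item.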

For identifying malignant versus benign nodules, deep learning recorded an area under the receiver operating characteristic curve (AUC) of 0.87, comparable to that of the committee experts (0.91) and the mean score for nine radiologists (0.82).
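
AUC can be read as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. The sketch below computes it that way on made-up scores; the function name `auc` and the toy labels and scores are assumptions for illustration and have no connection to the study's data.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    y_true: 1 = malignant, 0 = benign. scores: higher means more suspicious.
    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 3 malignant and 4 benign nodules.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect ranking, which is the scale on which the article's reported values sit.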

When tasked with recommending biopsy for thyroid nodules, deep learning achieved 52% specificity and 87% sensitivity, similar to the experts’ 51% specificity and 87% sensitivity. Its specificity was higher than that of seven of the nine radiologists, whose mean specificity and sensitivity were 48% and 83%, respectively.
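
Sensitivity here is the share of malignant nodules for which biopsy was recommended, and specificity is the share of benign nodules for which it was not. A minimal sketch of that arithmetic follows; the function name `sensitivity_specificity` and the ten toy nodules are hypothetical and are not drawn from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary biopsy recommendations.

    y_true: 1 = malignant, 0 = benign.
    y_pred: 1 = biopsy recommended, 0 = no biopsy recommended.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, biopsied
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, spared biopsy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, biopsied
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 4 malignant and 6 benign nodules.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_recs = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_truth, toy_recs)
print(sens, spec)
```

Higher specificity at the same sensitivity means fewer benign nodules are sent for biopsy without missing more cancers, which is the improvement the article describes.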

According to the researchers, their algorithm could impact clinical practice in two specific areas. First, they wrote, it would provide the same prediction for a given image, eliminating the “substantial” interreader variability seen among radiologists using ACR TI-RADS. Second, “the algorithm could reduce the time required for interpretation of thyroid nodules, which puts some strain on radiology departments.”