Small-scale study shows excellent reader consistency assessing BI-RADS breast density

It’s been two years since the American College of Radiology (ACR) released the fifth edition of the BI-RADS atlas. How consistently is the big book’s breast-density rating scale being used by individual radiologists, and how consistently do different radiologists apply it compared with one another?

Quite consistently, if a small study by a team of researchers from Australia and Nigeria is a reasonably accurate indicator.

Their findings are running in the May edition of the American Journal of Roentgenology.

Lead author Ernest U. Ekpo of the University of Sydney and the University of Calabar in Nigeria and colleagues analyzed 1,000 breast-density assessments made by five radiologists (100 cases each, read twice).

The rads had recently undergone retraining in mammographic breast density classification based on the fifth edition of BI-RADS.

The readers assigned BI-RADS breast density categories from A to D on 100 cases, then repeated the assessment one month later.

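The study team used a weighted kappa calculation, which corrects for agreement expected by chance and, on an ordered scale like A to D, counts disagreements as more serious the further apart the two categories are, to gauge both intrareader and interreader agreement.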
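For readers curious how a weighted kappa works, here is a minimal sketch in Python. It is not the study team's code: the function name `weighted_kappa` is hypothetical, and the choice of linear weights by default is an assumption, since the article does not say whether linear or quadratic weights were used. The statistic compares how much the two sets of ratings disagree with how much they would be expected to disagree by chance, given how often each set uses each category.

```python
# Minimal sketch: weighted Cohen's kappa for two sets of ordinal ratings
# (e.g., two readers, or one reader's two sessions) on the BI-RADS A-D scale.
# Assumption: linear weights by default; the study's weighting is not stated.

CATEGORIES = ["A", "B", "C", "D"]  # ordered BI-RADS density categories


def weighted_kappa(r1, r2, categories=CATEGORIES, weights="linear"):
    """Return weighted kappa = 1 - observed disagreement / chance disagreement."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions: observed[i][j] = share of cases rated i by r1 and j by r2
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[idx[a]][idx[b]] += 1 / n

    # Marginal proportions for each rating set (used for chance agreement)
    m1 = [sum(observed[i]) for i in range(k)]
    m2 = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 for exact agreement, 1 for A vs. D
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * m1[i] * m2[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


# Hypothetical ratings: the readers differ on one case, by one category (C vs. D)
reader_1 = ["A", "B", "C", "D"]
reader_2 = ["A", "B", "D", "D"]
print(round(weighted_kappa(reader_1, reader_2), 3))  # 0.818
```

The weighting is the point of the design: calling a breast C when a colleague called it D costs far less than calling it A, which an unweighted kappa would treat as the same kind of miss.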

They found that each radiologist’s agreement with his or her own earlier assessments ranged from a kappa value of 0.86 to 0.89 on a four-category scale (A to D) and from 0.89 to 0.94 on a two-category scale (A and B vs. C and D).
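The two-category scale simply pools the four density grades into non-dense (A, B) and dense (C, D). A minimal sketch of that collapse follows, with hypothetical helper names `to_binary` and `cohen_kappa` and made-up ratings; it is an illustration, not the study's procedure. With only two categories, linear and quadratic weighted kappa both reduce to the ordinary unweighted Cohen's kappa computed here.

```python
# Minimal sketch: collapse BI-RADS density A-D into the study's two-category
# split (A-B vs. C-D) and compute unweighted Cohen's kappa. With only two
# categories, linear and quadratic weighted kappa reduce to this same value.

DENSE = {"C", "D"}  # heterogeneously dense or extremely dense


def to_binary(ratings):
    """Map A/B to 'non-dense' and C/D to 'dense'."""
    return ["dense" if r in DENSE else "non-dense" for r in ratings]


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa: (p_obs - p_exp) / (1 - p_exp)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rating set's marginal proportions
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in set(r1) | set(r2))
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings from one reader's two sessions, a month apart
session_1 = ["A", "B", "C", "D", "B", "C"]
session_2 = ["B", "B", "C", "C", "C", "D"]
print(round(cohen_kappa(to_binary(session_1), to_binary(session_2)), 3))  # 0.667
```

In this toy example the two sessions match exactly on only 2 of 6 cases on the four-category scale but on 5 of 6 once A/B and C/D are pooled, which is why agreement tends to rise on the two-category scale.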

Agreement between different readers, a tougher test than agreement with oneself, was also strong: interreader agreement ranged from substantial (0.76) to almost perfect (0.87) on the four-category scale, with an overall weighted kappa value of 0.79.
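Labels such as "substantial" and "almost perfect" appear to follow the widely used Landis and Koch (1977) bands for kappa; the article does not name its scale, so that mapping is an assumption. A small sketch of the lookup, using a hypothetical function name `landis_koch`:

```python
# Minimal sketch: label a kappa value using the Landis and Koch (1977) bands,
# assumed here to be the scale behind "substantial" and "almost perfect".

BANDS = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]


def landis_koch(kappa):
    """Return the Landis-Koch agreement label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    return "almost perfect"  # 0.81 to 1.00


# Interreader values reported for the four-category scale
for k in (0.76, 0.79, 0.87):
    print(k, landis_koch(k))
```

On these bands the intrareader values of 0.86 to 0.89 all fall in the "almost perfect" range, while the overall interreader kappa of 0.79 sits near the top of "substantial."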

On a two-category scale, interreader agreement ranged from a kappa value of 0.85 to 0.91, and the overall weighted kappa was 0.88.

Ekpo et al. acknowledge as major limitations the small number of readers and the limited generalizability of the results, because the readers trained and worked in close physical proximity.

“Conversely, the strength of the present study was its large sample size, which, to our knowledge, is the largest used to date for the assessment of interreader agreement,” they write, adding that the assessment of intrareader agreement “revealed the consistency of the assessment of each reader and added to the strengths of the present study.”

In conclusion, they write, the discrepancies they found in BI-RADS classification on the four-category scale are considerable, but the differences are encouragingly minimal on the two-category scale used in clinical decision-making.