Augmented datasets can improve accuracy of neural networks

Deep convolutional neural networks (DCNNs) can better classify chest x-rays when trained on augmented datasets, according to a new study published in Clinical Radiology.

Augmenting data involves processing images to increase the number of samples included in a set. For example, AlexNet, a popular DCNN, was trained on a dataset augmented with methods such as horizontal flipping to improve classification.

“Image classification performance using DCNNs is dependent on the data available in the training datasets, with large and diverse datasets providing the best results; however, collecting large datasets is usually time and effort intensive,” explained R. Ogawa, with Saiseikai Matsuyama Hospital’s Department of Radiology in Japan, and colleagues. “Furthermore, the availability of medical images to train DCNNs is limited.”

The researchers applied single and combined augmentation techniques, including rotation, horizontal and vertical flipping, Gaussian blur and brightness variation, to grey-scale chest radiographs, then compared the resulting DCNNs to determine which best detected abnormal chest x-rays.
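The augmentation techniques named above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' pipeline: the function names, the 90-degree rotation step, the blur width and the brightness factors are all assumptions, since the article does not report the parameters used in the study. Images are assumed to be 2-D grey-scale arrays scaled to the range 0 to 1.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]


def vertical_flip(img):
    """Mirror the image top to bottom."""
    return img[::-1, :]


def rotate_90(img, k=1):
    """Rotate by k * 90 degrees (a stand-in; the study's angles are not given here)."""
    return np.rot90(img, k)


def adjust_brightness(img, factor):
    """Scale pixel intensities, clipping to the assumed [0, 1] range."""
    return np.clip(img * factor, 0.0, 1.0)


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: convolve rows, then columns, with a 1-D kernel."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Pad with edge values so the output keeps the input's shape.
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def augment(img):
    """Return several augmented copies of one image (hypothetical helper)."""
    return {
        "hflip": horizontal_flip(img),
        "vflip": vertical_flip(img),
        "rot90": rotate_90(img),
        "bright": adjust_brightness(img, 1.2),
        "blur": gaussian_blur(img, sigma=1.0),
    }
```

Each call to `augment` turns one radiograph into five extra samples; combining transforms, for instance `horizontal_flip(rotate_90(img))`, multiplies the count further, which is how a few hundred images can grow into several thousand.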

A total of 288 abnormal and 447 normal x-rays were divided into training and validation sets. That data was augmented to create 12,789 training and validation images.

Results showed that DCNNs trained on augmented data were typically more accurate than the one trained on non-augmented data. Rotation combined with horizontal flipping achieved the highest accuracy, at 0.91. Gaussian blur, however, limited the DCNN’s accuracy.

Ogawa and colleagues also tested the highest-performing DCNN on 150 x-rays taken from a publicly available National Institutes of Health dataset. On that data, the DCNN achieved an accuracy of 0.83.
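For readers unfamiliar with the metric, accuracy in a binary normal-versus-abnormal task is the fraction of images the model labels correctly. A minimal sketch follows; the function name and the toy labels are made up for illustration and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: 1 = abnormal, 0 = normal (hypothetical labels).
truth = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
preds = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(accuracy(truth, preds))  # 0.8
```

Here the model gets 8 of 10 images right, one miss being a false negative and the other a false positive. Accuracy alone does not distinguish between the two kinds of error.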

One important limitation of this study was the researchers’ inability to determine the root cause of false positives identified by the DCNN. They argued that a heat map could depict the areas of the image that helped the model produce its results.

“In conclusion, augmentation of training datasets was useful for the binary classification of chest radiographs using a DCNN,” the authors wrote. “Classification performance was highly dependent on the type of augmentation techniques employed.”