Natural language processing tool data mines unstructured neuroradiology reports
Diagnostic radiology generates a lot of data. Although the practice’s images can be parsed and relevant information extracted with DICOM tools, free-text radiology reports have long languished in silos of unconnected data. A natural language processing (NLP) application developed at Massachusetts General Hospital (MGH) in Boston holds the promise of letting researchers data mine unstructured radiology reports.

Pragya Dang and colleagues at MGH’s department of radiology recently put the software, Leximer, through its paces, classifying unstructured electronic neuroradiology reports. Dang presented the results of her group’s study last month in Chicago at the 93rd scientific assembly and annual meeting of the Radiological Society of North America (RSNA).

“We validated the accuracy of the NLP program and an online analytic processing engine (OLAP) in classifying unstructured electronic neuroradiology reports with relevant neurology findings (RNF) according to pathology of the RNF,” she said. “In addition, trends of different pathology findings in over 200,000 neuroradiology reports were analyzed.”

Dang and her team selected 120 neuroradiology reports generated between 2005 and 2006, with and without RNF, from an unstructured report database. The reports were then randomized and independently classified by a radiologist at the institution and by Leximer on the basis of the RNF pathology.

The group then assessed 276,105 neuroradiology reports generated between 1995 and 2006, with RNF, to determine the trends for different pathology findings, Dang said.

“Statistical analysis using appropriate statistical tests was performed to determine the specificity, sensitivity, accuracy, positive and negative predictive values of NLP program for classifying the neuroradiology reports with RNF according to their pathology,” she said.

Dang reported that the researchers’ analysis found the Leximer application to be accurate in classifying pathology findings in RNF, with a sensitivity of 99 percent, a specificity of 76.5 percent, and an accuracy of 95.8 percent. In addition, the software demonstrated a positive predictive value of 96.2 percent and a negative predictive value of 92.8 percent.
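For readers less familiar with these measures, the following is a minimal sketch of how sensitivity, specificity, accuracy and the two predictive values are conventionally derived from a table of agreements and disagreements between a classifier and a reference reader. The counts below are hypothetical: they were chosen only so that they total 120 reports and land close to the percentages reported above, and they are not taken from the study, whose actual breakdown is not given here. The names `ConfusionCounts` and `diagnostic_metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Agreement counts between a classifier and a reference reader."""
    tp: int  # classifier positive, reference positive
    fp: int  # classifier positive, reference negative
    tn: int  # classifier negative, reference negative
    fn: int  # classifier negative, reference positive


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic-accuracy measures as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "accuracy": (c.tp + c.tn) / total,    # overall agreement
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# HYPOTHETICAL counts for illustration only; not the study's data.
hypothetical = ConfusionCounts(tp=102, fp=4, tn=13, fn=1)
metrics = diagnostic_metrics(hypothetical)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With a small number of negative cases, as in this hypothetical split, specificity and negative predictive value move by several percentage points when a single report is reclassified, which is worth keeping in mind when reading figures from a 120-report sample.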

Their data mining of the 276,105 neuroradiology reports found that vascular pathologies were the most frequently observed findings, appearing in about 75 percent of the cases. These included detailed pathologies such as hemorrhage, which the researchers were able to subdivide further into acute, subacute, chronic, extravascular collection and generic hemorrhage. They also were able to classify the vascular pathologies as dilation, infarction, obstructive process and ischemia, the last of which was further subdivided into acute and generic.

“NLP also classified the reports by other specific pathologies including atrophy, neoplasms, infection, hydrocephalus, meningioma, microadenoma and multiple sclerosis,” Dang said.

The application has the potential to identify disease trends that may assist in focusing the development of preventative education as well as the allocation of healthcare resources.

“Data mining with NLP program and OLAP can help in the classification of pathology findings, which can be used for assessing clinical referral trends and research,” Dang noted.