SIIM: Big data about complexity, not just size

LONG BEACH, CALIF. - While the numbers describing the amount of data being generated in healthcare are eye-popping, for Eliot Siegel, MD, of the University of Maryland School of Medicine and VA Maryland Health Care System, it is high dimensionality and complexity, not just volume, that epitomize “big data.”

Speaking on May 16 at the annual meeting of the Society for Imaging Informatics in Medicine (SIIM), Siegel suggested that imaging might have the tallest order when it comes to taming big data. Because of the huge variety of parameters involved, imaging data is more complex than even genomic data. Even on volume alone, imaging is a huge challenge, making up at least 90 percent of storage in healthcare systems.

There are hidden insights in the pixel data of every image, said Siegel, who likened them to dark matter, which cannot be directly observed with current technology despite being hypothesized to make up more of the universe than ordinary, observable matter.

For example, a CT pulmonary angiogram performed to diagnose pulmonary embolism theoretically contains thousands of typically unreported parameters, including bone density, calcium score, and cardiac chamber size.

“When I’m talking about big data, I’m not just talking about the number of bytes and bits, I’m talking really about the incredible complexity associated with the image,” said Siegel.

He also made the distinction between “little questions” and “big questions” that can be asked of data. Little questions concern things like turnaround time or which referring physician sends the most cases to a given practice. Big questions, on the other hand, require big data and have a broader scope, such as what impact CT pulmonary angiography has on patient mortality and morbidity or whether CMS should reimburse CT screening exams for smokers over 55.

To get at these big questions, radiology will need to evolve past the traditional statistical techniques most often used in clinical trials, such as multiple regression and correlation. These will not be applicable to high-dimensional datasets, and techniques such as latent variable analysis, principal component analysis, cluster analysis and other data mining methods will need to come to imaging to achieve the goal of truly data-driven practice, said Siegel.

“I think we’re going to be seeing a new science, and new approaches, to high dimensional datasets that we have not applied.”