SIIM: Big data analysis unveils powerful insights

LONG BEACH, CALIF.—Big data is the buzzword at this year’s annual meeting of the Society for Imaging Informatics in Medicine (SIIM), and Katherine P. Andriole, PhD, used the honorary 2014 Dwyer Lecture on May 15 to talk about the future of data management, both the challenges and the potential.

Andriole, director of imaging informatics at Brigham & Women’s Hospital in Boston and professor at Harvard Medical School, broke the challenge of big data down into components, from acquiring and managing data to storing and then analyzing it.

Big data management platforms can provide a major assist. Apache Hadoop is an open-source, Java-based software framework for storage and large-scale processing of data on clusters of commodity hardware, explained Andriole. Its associated distributed file system module stores data across the machines in a cluster, which provides very high aggregate bandwidth.
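To illustrate why spreading data across a cluster raises aggregate bandwidth, here is a minimal Python sketch. It is not Hadoop's actual code or API: the function names, the round-robin placement and the per-node throughput figure are all assumptions made for illustration, and a real distributed file system uses a more sophisticated, topology-aware placement policy.

```python
def place_blocks(file_size_mb, block_size_mb, nodes, replication=3):
    """Toy placement: split a file into fixed-size blocks and assign each
    block to `replication` distinct nodes, round-robin (hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


def aggregate_bandwidth(nodes, per_node_mb_s):
    """Idealized upper bound: every node serves its blocks in parallel."""
    return len(nodes) * per_node_mb_s


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4"]      # hypothetical node names
    layout = place_blocks(1000, 128, cluster)
    for block, holders in layout.items():
        print(f"block {block}: {holders}")
    # Assumed 100 MB/s per node, chosen only for the example.
    print("aggregate MB/s:", aggregate_bandwidth(cluster, 100))
```

Because each block lives on several machines, many readers can pull different blocks at once, so total throughput scales roughly with the number of nodes rather than being capped by a single disk or server.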

Another project highlighted by Andriole is the Substitutable Medical Apps & Reusable Technology (SMART) big data management platform. This project, spearheaded by Ken Mandl, MD, MPH, and Zac Kohane, MD, PhD, of Boston Children’s Hospital, aims to provide a standard language and flexible information infrastructure to facilitate innovation, with the ultimate goal of transforming healthcare into a data-driven enterprise. “The idea is to create a healthcare app store,” said Andriole. A SMART-enabled server, she said, would know how to retrieve, reconcile and aggregate data.

Of course, storage is still a hurdle that looms large. “The media has gotten cheaper and cheaper, but we’re growing our information, so storage will always be an issue.”

Other issues that must be reconciled in order to make the most of big data, according to Andriole, include:

  • Data integrity;
  • Data curation;
  • Data normalization;
  • Data analysis;
  • Data visualization;
  • Missing, sparse or unstructured data; and
  • Longitudinal data.

Looking ahead, the future of big data will lean heavily on cloud storage and distributed computing with increased input/output. Andriole also noted the potential for content searches of pixel data, saying we’ve only hit the tip of the iceberg on pixel data.

Data analysis could then be leveraged to write decision support programs and provide evidence-based patient management. One example Andriole cited of a data analysis project leading to personalized diagnosis and treatment protocols came from MIT researchers John Guttag, PhD, and Collin Stultz, PhD. They built a computer model to analyze EKGs from heart attack patients that would normally have been discarded. With data mining and machine learning, they were able to analyze the massive amounts of data and reveal three electrical abnormalities that can identify which patients had double or triple the risk of dying from a second heart attack within one year.

A person could spend an entire lifetime writing algorithms to analyze data, said Andriole. “It’s going to be a very good time for informaticists and data scientists. We will be king and we will be needed going forward.”

In closing, Andriole urged attendees to be like Sam Dwyer III, PhD, the “father of PACS,” for whom the lecture is named. By this she meant working across disciplines, understanding the application environment, engaging with industry and collaborating. Above all, she said, enjoy the journey.