MIT publishes dataset of 350K chest x-rays to help develop AI models

The Massachusetts Institute of Technology (MIT)’s Laboratory for Computational Physiology has published its MIMIC-Chest X-Ray Database (MIMIC-CXR), a collection of more than 350,000 chest x-rays associated with 227,943 imaging studies. The images include both frontal and lateral views and were sourced from Beth Israel Deaconess Medical Center in Boston between 2011 and 2016.

MIMIC-CXR was designed to help academic, clinical and industrial investigators more easily develop AI algorithms that can detect 14 of the most common illnesses from chest radiographs, such as pneumonia, an enlarged heart or a punctured lung.

The project could also support the development of AI models that assist physicians working in underfunded or understaffed hospitals.

In January, researchers from Stanford University’s Machine Learning Group published a similar database, called CheXpert, along with an accompanying AI model competition.

According to an MIT news release, MIT and Stanford collaborated to ensure that interested researchers could use the MIMIC-CXR and CheXpert datasets together with minimal extra work.

“With single center studies, you’re never sure if what you’ve found is true of everyone, or a consequence of the type of patients the hospital sees, or the way it gives its care,” Alistair E. W. Johnson, a research scientist at MIT and co-developer of the dataset, said in the release. “By working with Stanford, we’ve essentially empowered researchers around the world to run their own multicenter trials without having to spend the millions of dollars that typically costs.”

Johnson and colleagues hope to link MIMIC-CXR with MIMIC-III, a dataset they developed previously, to form an even larger database that combines patient ICU data with images.

According to MIT, researchers can access MIMIC-CXR by completing a training course on human subjects research and agreeing to cite the dataset in their published work. MIMIC-CXR was funded by Philips Research.