Medical databases across disparate environments often resemble the aftermath of the construction of the Tower of Babel. That is, each database contains information that is accessible only to its own users and is gibberish to users outside the immediate system. As the delivery of healthcare has spread beyond narrowly confined geographic areas, the demand for a homogeneous method of accessing heterogeneous databases has increased.
A team of developers from Shanghai, People’s Republic of China, has proposed a grid-based model that integrates data resources across different information systems or administrative domains and gives users a uniform access interface. Their work appeared online before print in the Journal of Digital Imaging.
A grid-based model is a data-independent method of accessing information. A grid-based spatial index is constructed that allocates relevant objects to their position in the grid; then an index is created of the object identifiers with their grid-cell identifiers, allowing for quick access to information.
This scheme allows the structure of the index to be created first; data can then be added on an ongoing basis without requiring changes to the index structure. In addition, if a common grid is used by disparate data collecting and indexing activities, as is the case with dispersed medical databases, the indexes from those sources can then be merged.
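The indexing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation; the class name `GridIndex`, the two-dimensional coordinates, the `cell_size` parameter and the object identifiers are all assumptions made for illustration. It shows the grid structure being fixed first, objects being added afterward, and two indexes built on a common grid being merged.

```python
from collections import defaultdict


class GridIndex:
    """Hypothetical grid index: maps object identifiers to grid-cell identifiers."""

    def __init__(self, cell_size):
        # The grid structure (cell size) is fixed before any data is added.
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # cell identifier -> object identifiers
        self.object_cell = {}          # object identifier -> cell identifier

    def cell_id(self, x, y):
        # A cell identifier is the pair of integer cell coordinates.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, object_id, x, y):
        # Adding data never changes the structure of the index.
        cell = self.cell_id(x, y)
        self.cells[cell].add(object_id)
        self.object_cell[object_id] = cell

    def lookup(self, x, y):
        # Return the objects stored in the cell that contains (x, y).
        return sorted(self.cells.get(self.cell_id(x, y), set()))

    def merge(self, other):
        # Indexes can be merged only if they were built on a common grid.
        if other.cell_size != self.cell_size:
            raise ValueError("indexes must share a common grid")
        for cell, object_ids in other.cells.items():
            self.cells[cell] |= object_ids
        self.object_cell.update(other.object_cell)


# Two sources (hypothetical) index their own objects on the same grid.
site_a = GridIndex(cell_size=10.0)
site_a.add("a-1", 3.0, 4.0)
site_b = GridIndex(cell_size=10.0)
site_b.add("b-7", 8.0, 1.0)

site_a.merge(site_b)
merged_hits = site_a.lookup(5.0, 5.0)
```

After the merge, a lookup in the shared cell returns objects from both sources, which is the property that makes a common grid useful for dispersed collections.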
“So, grid can provide medical applications with architecture for easy and transparent access to distributed heterogeneous resources across different organizations and administrative domains,” the authors wrote.
The developers built their grid model with Open Grid Services Architecture-Data Access and Integration (OGSA-DAI), an open-source middleware product that allows data resources, such as relational or Extensible Markup Language (XML) databases, to be accessed via web services. The software also includes a collection of components for querying, transforming and delivering data in different ways.
“Applications can use the core grid definition section (GDS) components directly to access individual data stores or can use a distributed query processor to coordinate access to multiple database services,” the authors wrote.
To test the performance of their system, the team constructed a grid accessing three disparate databases across a single domain. They then designed a query instance to retrieve all the medical records of a patient, given the patient's ID. The team implemented the same query instance on each of the three database management systems and compared their response times with that of the joint query on the model system.
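The shape of that test can be sketched as follows. This is only an illustration under stated assumptions: three in-memory SQLite databases stand in for the three database management systems, a plain Python function stands in for the joint query (it does not use OGSA-DAI), and the table name `records`, its columns and the sample rows are hypothetical.

```python
import sqlite3
import time


def make_database(rows):
    """Create an in-memory database holding (patient_id, note) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (patient_id TEXT, note TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_one(conn, patient_id):
    """Retrieve all records of one patient from a single database."""
    cursor = conn.execute(
        "SELECT patient_id, note FROM records WHERE patient_id = ? ORDER BY note",
        (patient_id,),
    )
    return cursor.fetchall()


def query_all(conns, patient_id):
    """Stand-in for the joint query: ask every database and combine the results."""
    results = []
    for conn in conns:
        results.extend(query_one(conn, patient_id))
    return results


def timed(fn, *args):
    """Return the function's result and its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Three hypothetical sources that each hold part of a patient's history.
databases = [
    make_database([("p1", "radiology report"), ("p2", "radiology report")]),
    make_database([("p1", "lab result")]),
    make_database([("p3", "discharge summary"), ("p1", "prescription")]),
]

records, joint_seconds = timed(query_all, databases, "p1")
single_seconds = [timed(query_one, db, "p1")[1] for db in databases]
```

Comparing `joint_seconds` with the entries of `single_seconds` over many repeated runs and larger record counts would give the kind of response-time comparison the team reports; a single run on toy data like this is not a meaningful benchmark.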
They found that a query of 10,000 records resulted in a standard deviation of 0.18 seconds longer in retrieval time for the grid system, while a query of 200,000 records took 1.13 seconds on the grid system, a standard deviation of 0.23 seconds longer than queries on each of the three databases for the same load.
The team noted three issues that should be taken into account before their model is deployed in a practical medical environment:
- Access and data-transfer methods will need to be optimized to ensure high throughput and fault tolerance.
- Effective connections between original data and dynamic metadata for the grid will be necessary to improve the effectiveness and efficiency of accessing dynamic distributed databases.
- New security mechanisms and access control methods should be developed to meet the special security requirements of medical database integration in a grid environment.
However, the grid model that the developers constructed shows great promise for integrating heterogeneous data sources.
“The result shows that the system can provide an effective way to access underlying medical databases with a comparatively stable performance,” the authors wrote.