By Gillian Law
15 December 2005

The extensibility and scope of OGSA-DAI made it the ideal tool when researchers began looking at better ways to manage the massive amounts of information created by the UK National Health Service's breast screening programme.

The breast screening programme run by the NHS is a highly successful operation, with thousands of lives saved each year by the early detection of cancer.

However, the screening process produces thousands of analogue x-rays per clinic, and these are stored across the country with no centralised system for keeping or analysing the information they contain.

Figure: Inspection of a mammograph.

eDiaMoND, a project run jointly by Oxford University, IBM and CTI Mirada Solutions, set out to look at this problem and see whether screening data could be collected and shared across clinics, allowing greater workflow sharing and improving education about how to detect different forms of cancer.

X-rays were scanned to create digital copies, each about 32MB in size, and patient data was carefully anonymised to avoid any privacy issues. Then the team set about finding a way to share the data.

Figure: close up of a mammograph.

Each participating clinic partner was set up with an IBM P Series 630 server and linked to the others using Globus Toolkit 3.02 middleware, OGSA-DAI and the Apache Tomcat web application server.

To handle the large file sizes of the images, the team decided to use IBM's DB2 Content Manager for Multiplatforms middleware. The indexing data was stored in a separate relational database, allowing it to be searched separately from the image data, reducing the bandwidth used at each search.
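
The bandwidth saving from keeping the index apart from the images can be illustrated with a minimal sketch. It is not the eDiaMoND implementation: an in-memory SQLite table stands in for the relational index, a plain dictionary stands in for the image store, and every name in it (`IMAGE_STORE`, `image_index`, `search_index`, `fetch_image`, the column names) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the image store: image id -> image bytes.
# The real images were about 32MB each; tiny placeholders are used here.
IMAGE_STORE = {
    "img-001": b"\x00" * 32,
    "img-002": b"\x01" * 32,
    "img-003": b"\x02" * 32,
}


def build_index():
    """Create an in-memory index table with illustrative columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_index ("
        "image_id TEXT PRIMARY KEY, clinic TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO image_index VALUES (?, ?, ?)",
        [
            ("img-001", "clinic-a", 2003),
            ("img-002", "clinic-b", 2004),
            ("img-003", "clinic-a", 2004),
        ],
    )
    return conn


def search_index(conn, clinic):
    """Return matching image ids; only small index rows are read."""
    rows = conn.execute(
        "SELECT image_id FROM image_index WHERE clinic = ? ORDER BY image_id",
        (clinic,),
    )
    return [row[0] for row in rows]


def fetch_image(image_id):
    """Fetch the large object only once a specific image is requested."""
    return IMAGE_STORE[image_id]
```

A search calls only `search_index`, so no image bytes move until a caller picks a result and passes its id to `fetch_image`.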

However, the separate databases needed to be connected in such a way that they could be accessed simply and easily without users needing to know where the data was coming from.

For that, the team used OGSA-DAI.

"We used OGSA-DAI for the data virtualisation, essentially to abstract the location of the data," said Alan Knox, Advisory Software Engineer, Emerging Technology Services, IBM.

"We had IBM DB2 relational databases with the index data, and the images in Content Manager, and above both, we put OGSA-DAI. We set up multiple instances of OGSA-DAI at each hospital, pointing to the local versions of both sets of data, and to the remote versions," he said.

"And what we then did was use the OGSA-DAI registry to index which of those services we want. Some logic in the grid application can direct the registry. So, it says, for instance, 'just use the local data'. And the registry comes back with the handles of the local services - or it may decide to use the grid data. Because it's OGSA-DAI, the actual API (application program interface) that the client talks to is identical in each case, so the client doesn't need to know whether the data is local or grid-wide," said Knox.
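
The registry idea Knox describes can be sketched in a few lines. This is a hedged illustration rather than the OGSA-DAI registry API: the `DataService` and `Registry` classes, the `lookup` and `run_query` functions, and the "local" and "grid" scope names are all hypothetical. The point it shows is that the client calls the same `query` method whichever services the registry returns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataService:
    """Hypothetical data service; every instance exposes the same query() call."""
    handle: str
    site: str
    records: Tuple[str, ...]

    def query(self, keyword: str) -> List[str]:
        # Return the records held by this service that mention the keyword.
        return [record for record in self.records if keyword in record]


class Registry:
    """Hypothetical registry that hands back services for a requested scope."""

    def __init__(self) -> None:
        self._services: List[DataService] = []

    def register(self, service: DataService) -> None:
        self._services.append(service)

    def lookup(self, scope: str, site: str) -> List[DataService]:
        # "local" returns only the caller's own site; "grid" returns every site.
        if scope == "local":
            return [s for s in self._services if s.site == site]
        if scope == "grid":
            return list(self._services)
        raise ValueError(f"unknown scope: {scope}")


def run_query(registry: Registry, scope: str, site: str, keyword: str) -> List[str]:
    """Client code: identical whether the data is local or grid-wide."""
    results: List[str] = []
    for service in registry.lookup(scope, site):
        results.extend(service.query(keyword))
    return results
```

Only the `scope` argument changes between a local and a grid-wide request; the loop in `run_query` never inspects where a service lives.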

Manfred Oevers, an IBM IT specialist, used the extension mechanism built into OGSA-DAI to enable access to the Content Manager-stored data.

"The DB2 work was a piece of cake," Oevers said, "because we could just take OGSA-DAI and expose those databases as grid services, and then do whatever we liked with them. And so we wanted to do the same with Content Manager. But the problem with Content Manager is that you don't talk directly to the database. There's an API that makes sure all sorts of things are correct, so that you don't have to understand the structure of the database - but that makes it more difficult to do what we wanted. So I knew that OGSA-DAI had this extensibility mechanism, and I thought maybe I could use that.

"And it worked. Because, ultimately, Content Manager is a database, certain CM programming concepts map one-to-one to OGSA-DAI extensibility points, and so we managed to create what we wanted.

"We really liked working with OGSA-DAI, it allowed us to have a nice abstraction layer on top of the underlying back end systems. And in theory you could use other databases, too, from Oracle, say, so it's easy to extend," Oevers said.
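
The extension approach Oevers describes resembles a familiar adapter pattern: a back end that is reachable only through its own API is wrapped so that it presents the same interface as a plain database. The sketch below assumes that reading and is not the OGSA-DAI extensibility mechanism or the Content Manager API. `DataResource`, `RelationalResource`, `ContentStoreResource`, `FakeContentStore` and `retrieve_item` are hypothetical names, and SQLite stands in for the relational database.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataResource(ABC):
    """Hypothetical common interface that every wrapped back end implements."""

    @abstractmethod
    def fetch(self, key: str) -> bytes:
        """Return the payload stored under key, or raise KeyError."""


class RelationalResource(DataResource):
    """Wraps a database that can be queried directly with SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch(self, key: str) -> bytes:
        row = self._conn.execute(
            "SELECT payload FROM items WHERE item_id = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


class FakeContentStore:
    """Stand-in for a store reachable only through its own API, not SQL."""

    def __init__(self, items: Dict[str, bytes]) -> None:
        self._items = dict(items)

    def retrieve_item(self, item_id: str) -> bytes:
        return self._items[item_id]


class ContentStoreResource(DataResource):
    """Adapter: maps the common fetch() call onto the store's own API."""

    def __init__(self, store: FakeContentStore) -> None:
        self._store = store

    def fetch(self, key: str) -> bytes:
        return self._store.retrieve_item(key)


def fetch_all(resources: Iterable[DataResource], key: str) -> List[bytes]:
    """Client code that treats both kinds of back end identically."""
    found: List[bytes] = []
    for resource in resources:
        try:
            found.append(resource.fetch(key))
        except KeyError:
            continue
    return found
```

Adding another back end in this sketch means writing one more `DataResource` subclass; `fetch_all` and any other client code stay unchanged.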

The second step, Oevers said, was to create a layer of 'business' grid services on top, with OGSA-DAI as the middle abstraction layer.

"We wanted to make things simple for the users. You don't want users to have to understand how to write SQL queries for the database - so we put another layer on top that would expose just these 'business' queries to the user. By which I mean, there was a standard set of queries they could make - name, records, most recent images, and so on. They don't need access to all possible queries, so we created this simple level on top," he said.

More detailed information was, of course, still available to those who needed it. "For example, the doctors might just want to look at images, and so they'll look at the 'business' level, but if people want to do epidemiological studies, then they can go a level deeper and talk to the OGSA-DAI services directly, and so on," Oevers said.
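
A 'business' layer of this kind can be sketched as a fixed catalogue of named, parameterised queries. The sketch is an illustration under stated assumptions, not the project's code: the query names, the table and column names, and the `run_business_query` and `make_demo_db` functions are hypothetical, and SQLite stands in for the underlying database.

```python
import sqlite3

# Hypothetical catalogue: each 'business' query name maps to fixed SQL.
# Callers supply parameters only; they never write SQL themselves.
BUSINESS_QUERIES = {
    "records_by_name": (
        "SELECT record_id FROM records "
        "WHERE patient_name = ? ORDER BY record_id"
    ),
    "most_recent_images": (
        "SELECT image_id FROM images "
        "WHERE record_id = ? ORDER BY taken_on DESC LIMIT ?"
    ),
}


def run_business_query(conn, name, params):
    """Run one of the predefined queries; reject anything not in the catalogue."""
    if name not in BUSINESS_QUERIES:
        raise ValueError(f"unknown business query: {name}")
    return [row[0] for row in conn.execute(BUSINESS_QUERIES[name], params)]


def make_demo_db():
    """Build a small in-memory database with made-up, pseudonymous rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (record_id INTEGER, patient_name TEXT)")
    conn.execute(
        "CREATE TABLE images (image_id TEXT, record_id INTEGER, taken_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(1, "patient-17"), (2, "patient-42")],
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [
            ("img-a", 1, "2004-01-10"),
            ("img-b", 1, "2005-03-02"),
            ("img-c", 2, "2005-06-30"),
        ],
    )
    return conn
```

In this sketch a caller who needs more than the catalogue offers would bypass `run_business_query` and use the connection directly, which mirrors the two levels of access described above.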

"The key thing about OGSA-DAI was the ability to extend it, so that we could extract data from the system that OGSA-DAI wasn't originally built to access," he said.

OGSA-DAI is also relatively easy to work with, said David Power, Post Doctoral Research Officer at Oxford University. "It's very well documented. In fact, to be honest, it was easier to install GT3 (Globus Toolkit 3) using OGSA-DAI rather than GT3's documentation! OGSA-DAI is pretty clear, and easy to set up, as far as grid services go," he said.

A new project, GIMI (Generic Infrastructure for Medical Informatics), is now building on the work done by eDiaMoND, Power said.

"The old eDiaMoND archive of several thousand mammograms lives on, and so GIMI is using the OGSA-DAI layer to talk to that," he said. "OGSA-DAI is allowing us to maintain what we've already got, instead of having to rewrite everything."

Another benefit of using OGSA-DAI is that it simplifies the development process, said Knox.

"The idea of the registry really helps you from the development point of view. Because what you can do is set up OGSA-DAI across machines. For example, I would be working on the relational database, sitting on my machine, and Manfred would be running the Content Manager server on his, as he worked on it. And by having a registry on a third machine, pointing to our two, we could run tests.

"Likewise, further on in the process, you can point the registry at newer, test versions of the application and then back at the operational version, and diagnose where the problems lie.

"The big benefit of that is you can build the thing without having to deploy lots of software and then find it doesn't work. You can build it on distributed machines and make sure you've got the environment right," Knox said.

"There's a bit of a learning curve in getting to understand how all the services fit together, but once you've got to grips with that, OGSA-DAI is extremely flexible in the way that it wraps resources," Knox said.

"You soon start to reap the benefits of the time you spent climbing that learning curve - you get a lot of flexibility in what you can do."

The two-year eDiaMoND project was funded in large part by IBM and Mirada Solutions, a developer of quantitative image analysis products, who between them paid 50 per cent of the costs. The remainder of the funding came from the EPSRC (Engineering and Physical Sciences Research Council), the MRC (Medical Research Council) and the DTI (Department of Trade and Industry).

The aim of the OGSA-DAI project is to develop middleware to assist with access and integration of data from separate sources via the grid. The project was conceived by the UK Database Task Force and is working closely with the Global Grid Forum DAIS-WG, the OMII and the Globus team.