By Gillian Law
06 October 2006
Business Search Technologies Corporation (BST) is a private company based in Tokyo. Founded in February 2004, it carries out software research and development in three technical areas: search technology, internationalisation (I18N), and grid computing. BST has developed a security-enhanced, multilingual commercial search engine package called WiSE (Worldwide Internet & Intranet Search Engine), and a software internationalisation tool called "World Wide Navi".
In grid computing, BST has worked closely with GTRC/AIST (the Grid Technology Research Center of the National Institute of Advanced Industrial Science and Technology), a Japanese government research institute based in Tsukuba and Tokyo.
BST has been involved in two joint research efforts with GTRC: GridASP (see www.gridasp.org) and the data grid.
In November 2005, BST was contacted by CBIE (Clinical and BioInformatic Engineering), a division of the Graduate School of Medicine at the University of Tokyo, which asked whether BST would be interested in a project using OGSA-WebDB in the fight against cancer. The researchers were looking for a way to integrate information from various databases, including public websites, in their search for ways to prevent cancer.
BST agreed, and joined a project team in partnership with CBIE and GTRC/AIST.
The aim of the project was to develop a secure, integrated query interface to medical information spread across multiple databases, both public and private. These databases would include JSNP (Japanese Single Nucleotide Polymorphism), a database of common gene variations in the Japanese population; OMIM (Online Mendelian Inheritance in Man), a database of human genes and genetic disorders developed by staff at Johns Hopkins University in Baltimore, Maryland; PubMed, a system developed by the National Center for Biotechnology Information (NCBI) at the US National Institutes of Health that provides access to citations from the biomedical literature; and PharmGKB, a pharmacogenetics and pharmacogenomics knowledge base.
To do this, the team used two main technologies: OGSA-DAI middleware, to grid-enable the data resources, and OGSA-WebDB, to integrate the web databases with the OGSA-DAI environment.
"We chose to use the OGSA-DAI technologies as the basis of our OGSA-WebDB because they are open software, and also because we knew they could help us retrieve the data we wanted from the web sites," said Isao Kojima, Senior Research Scientist and Leader of the Data Grid Team at GTRC.
"I worked closely with the OGSA-DAI team in Edinburgh, and had a lot of communication with them," said Mirza Phalevi Said, a research staff member of the Data Grid Team at GTRC/AIST. "They were very kind, and gave us a lot of advice."
"We positioned OGSA-DAI and OGSA-WebDB as the framework for building GSCP," said Jun-ichi Okamura, BST's architect for the GSCP (Grid Search for Cancer Prevention) system. "As far as I know, the combination of OGSA-DAI and OGSA-WebDB is the only technology available that can provide access to all the web and relational databases we wanted, and then allow you to process the data on the grid."
A single query on the team's system accesses all of the databases, and the results are combined on a single GridSearch screen so that they can be viewed as if they came from local data sources. Wrappers for the four databases extract data and insert it into proxy databases, which are then accessed through the OGSA-WebDB and OGSA-DAI interfaces. Integrating outside web databases in this way also gave the team some important feedback.
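The wrapper-and-proxy pattern described above can be illustrated with a minimal sketch. It is not the GSCP implementation: Python's built-in `sqlite3` stands in for both the proxy databases and the OGSA-WebDB/OGSA-DAI access layer, the wrappers are replaced by hard-coded placeholder rows, and the names `SCRAPED`, `load_proxies` and `federated_query` are hypothetical.

```python
import sqlite3

# Hypothetical wrapper output: placeholder rows standing in for data that real
# wrappers would extract from JSNP, OMIM, PubMed and PharmGKB.
SCRAPED = {
    "JSNP": [("GENE_A", "JSNP variant entry 1")],
    "OMIM": [("GENE_A", "OMIM disorder entry 1"), ("GENE_B", "OMIM disorder entry 2")],
    "PubMed": [("GENE_A", "PubMed citation 1")],
    "PharmGKB": [("GENE_C", "PharmGKB annotation 1")],
}


def load_proxies(conn: sqlite3.Connection, scraped: dict) -> None:
    """Wrapper step: insert each source's extracted rows into its own proxy table."""
    for source, rows in scraped.items():
        table = f"proxy_{source.lower()}"
        conn.execute(f"CREATE TABLE {table} (gene TEXT, record TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def federated_query(conn: sqlite3.Connection, sources: list, gene: str) -> list:
    """Run one logical query against every proxy table and merge the results,
    tagging each row with the source it came from."""
    results = []
    for source in sources:
        table = f"proxy_{source.lower()}"
        for (record,) in conn.execute(f"SELECT record FROM {table} WHERE gene = ?", (gene,)):
            results.append((source, record))
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_proxies(conn, SCRAPED)
    for source, record in federated_query(conn, list(SCRAPED), "GENE_A"):
        print(f"{source}: {record}")
```

Because every source sits behind a local proxy table, the caller issues one query and receives merged, source-tagged rows, which is roughly the "as if local" view the GridSearch screen presents.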
"We were dealing with outside databases, outside our control. And some of those have long response times, and there's nothing you can do about that. We had to enhance WebDB so that it can let us control the access time as needed." As a result, WebDB was extended with several adjustable parameters, such as "connection retry number" and "maximum number of query results".
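How a retry count and a result cap bound the time spent on a slow outside source can be shown with a small sketch. The parameter names `connection_retries` and `max_results`, the `AccessLimits` and `query_remote` helpers, and the use of `TimeoutError` to signal a slow site are all hypothetical illustrations; they are not WebDB's actual configuration keys or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AccessLimits:
    """Hypothetical counterparts of the "connection retry number" and
    "maximum number of query results" settings (names are illustrative)."""
    connection_retries: int = 2
    max_results: int = 100


def query_remote(fetch: Callable[[str], List[str]], query: str, limits: AccessLimits) -> List[str]:
    """Call an unreliable remote source with bounded retries and a result cap."""
    last_error: Optional[Exception] = None
    attempts = 1 + limits.connection_retries  # one initial try plus the retries
    for _ in range(attempts):
        try:
            rows = fetch(query)
        except TimeoutError as err:  # assumed signal for a too-slow remote site
            last_error = err
            continue
        # Cap the number of rows handed on to the rest of the grid query.
        return rows[: limits.max_results]
    raise ConnectionError(f"remote source failed after {attempts} attempts") from last_error


class FlakySource:
    """Test double: times out on the first `failures` calls, then returns rows."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self, query: str) -> List[str]:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("remote site too slow")
        return [f"{query}-row{i}" for i in range(10)]


if __name__ == "__main__":
    source = FlakySource(failures=2)
    print(query_remote(source, "q", AccessLimits(connection_retries=2, max_results=3)))
```

Raising the retry count trades longer worst-case waits for fewer spurious failures, while the result cap keeps a single verbose source from dominating the combined result set.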
The first version of Grid Search for Cancer Prevention, V1.0, was released in March 2006. In future, the team hopes to access more resources, with secure access to private data, and to achieve more integration at the syntax level.
"One issue is that from time to time the outside sites change their interface - and that's beyond our control, we can't prevent it. So an end user of our system discovered one day that queries were failing, and he couldn't imagine why. So he called us up and we investigated, and that's when we realised the site had changed. Currently we don't have an automated method of following and spotting that sort of change - we would have to call the managers of the sites to ask about their plans. We don't really have a good idea yet of how to automate the process, but that's what we want to do," said Ryuichi Yoshida, a manager at BST's Research and Development Centre.