OGSA-DAI is:
The OGSA-DAI project started in February 2002. It has, to date, undergone three main phases.
Phase 1 was a collaboration between EPCC[2], NeSC[3], IBM, Oracle, NEReSC[4] and eSNW[5] with 18 months of funding of 3 million pounds from the UK DTI/EPSRC via the UK e-Science Grid Core Programme. Three major and three interim releases (point releases) of the OGSA-DAI software occurred during this time frame.
Phase 2 was a collaboration between EPCC, NeSC, IBM, NEReSC and eSNW with 24 months of funding of 1.5 million pounds from the UK DTI/EPSRC via the UK e-Science Grid Core Programme 2 as part of the OMII-UK project. Four major releases of the OGSA-DAI software occurred during this time frame.
In this section we describe various options for sharing data in a grid and motivate OGSA-DAI as one solution for this.
The Grid is all about sharing resources, e.g. computational, data and even people! OGSA-DAI facilitates the sharing of structured data resources. By structured data we mean data which can be queried and from which meaningful subsets can be extracted.
The simplest way to share data in a grid would be just to ZIP up the corresponding data and make this available on a web site. This has the following advantages:
However, the disadvantages are:
An alternative solution is to provide consumers with direct access to the data, e.g. by providing them with the URL of a database and giving them a username and password. This has the following advantage:
However the disadvantages are:
Another solution is therefore to provide application-specific web services, in which data is accessed and manipulated through application-specific operations, e.g.
    Book findByISBN(ISBN)
    List[Book] findByAuthor(Author)
    List[Book] findByKeyword(Word)
The advantages of this include:
Consumers may be able to run findBooksByAuthor, but not findBooksByKeyword. Or they may be able to run findBooksByAuthor, but only if the author they specify is one of a set.
However the disadvantages are:
Using generic, non-application-specific, web services (such as those provided by OGSA-DAI) addresses the problems of utilising application-specific web services in generic data access and integration tools. Consumers see the data in its 'raw' format, e.g. tables, columns, rows for relational data; collections, documents and elements for XML data. Consumers can obtain the schema of the data and send queries in the appropriate query language, e.g. SQL, XPath.
Using generic web services still incurs the overhead of converting data to and from XML prior to transferring it via SOAP over HTTP. So how can this overhead be reduced or eliminated altogether? One way to reduce overhead is to reduce the number of times a client interacts with the web service. This can be achieved with the use of workflows. A workflow can be specified by a client and supplied to a web service. The web service then executes this workflow and so performs, in the scope of one web service operation, any number of data-related actions (e.g. running queries, transforming data, performing updates).
In the example below, notice how the data goes over SOAP three times in the first deployment scenario, where OGSA-DAI is being used solely for data access. In the second scenario, by contrast, we exploit OGSA-DAI's support for workflows to perform data access, transformation and delivery actions and so avoid any transfers of data via SOAP.
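The contrast between the two scenarios can be made concrete with a toy model. The sketch below is not OGSA-DAI code: `ToyDataService`, its methods and the step names are hypothetical, and the counter simply records how often row data crosses the client/server boundary under the assumptions stated in the comments.

```python
class ToyDataService:
    """Hypothetical stand-in for a data service (not the OGSA-DAI API)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.payload_transfers = 0  # times row data crosses the client/server boundary
        self.delivered = None

    # Access-only style: every operation ships data to or from the client.
    def run_query(self):
        self.payload_transfers += 1  # rows travel server -> client
        return list(self._rows)

    def transform(self, rows):
        self.payload_transfers += 2  # rows travel client -> server -> client
        return [row.upper() for row in rows]

    # Workflow style: one call, and the data never leaves the server.
    def run_workflow(self, steps):
        data = None
        for step in steps:
            if step == "query":
                data = list(self._rows)
            elif step == "transform":
                data = [row.upper() for row in data]
            elif step == "deliver":
                self.delivered = data  # e.g. written to a file or an FTP server
            else:
                raise ValueError(f"unknown step: {step}")
        return "COMPLETED"  # only a small status message is returned


access_only = ToyDataService(["a", "b"])
rows = access_only.transform(access_only.run_query())

with_workflow = ToyDataService(["a", "b"])
outcome = with_workflow.run_workflow(["query", "transform", "deliver"])
```

In this model the access-only client moves the rows three times, while the workflow client moves none and receives only a status value.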
Before we describe OGSA-DAI workflows in detail a simple example is shown below.
Important points to note are:
These examples show how workflows in OGSA-DAI can be used to integrate data from multiple data sources. In the first example two SQL queries are executed on two separate databases. The results of the first query are then transformed in some way. This transformed data is then joined in some way with the results of the second query. The joined data is then delivered by some means.
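The shape of this first example (query, transform, join, deliver) can be sketched with plain Python generators. This is an illustration only: the function names, the in-memory "databases" and the choice of an equality join on `id` are all assumptions, not OGSA-DAI activities.

```python
def sql_query(table):
    """Stand-in for a query activity: yields one row (a dict) at a time."""
    for row in table:
        yield row


def transform(rows):
    """Stand-in for a transformation: tidy up the 'name' column."""
    for row in rows:
        yield {**row, "name": row["name"].title()}


def join(left, right, key):
    """Simple hash join: index the right-hand rows, then stream the left."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


def deliver(rows):
    """Stand-in for a delivery activity: here we just collect the rows."""
    return list(rows)


# Two hypothetical databases.
db1 = [{"id": 1, "name": "ada lovelace"}, {"id": 2, "name": "alan turing"}]
db2 = [{"id": 1, "city": "London"}, {"id": 3, "city": "Paris"}]

result = deliver(join(transform(sql_query(db1)), sql_query(db2), key="id"))
```

Only the row with `id` 1 appears in both tables, so it is the only row delivered.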
In this second example we see how data exposed by multiple OGSA-DAI servers can be transported and integrated. A workflow is sent to an OGSA-DAI server. This runs an SQL query and makes the data available to other OGSA-DAI servers. A second workflow is then sent which pulls the data from the first OGSA-DAI server while simultaneously running an SQL query. The pulled data and the SQL query results are then joined in some way and delivered by some means.
This final example shows the use of OGSA-DAI in conjunction with a related tool called OGSA-DQP (distributed query processing). The client submits some workflow which interacts with a "virtual database". As far as the client is concerned this is an actual database upon which they can run queries. In fact, however, this database is a federation of a number of actual physical databases exposed via OGSA-DAI servers. When a client submits queries, OGSA-DQP parses them and forwards them to the other OGSA-DAI servers. OGSA-DQP also marshals and joins the results of these queries into a single unified result for the client.
Note: OGSA-DQP is compliant with OGSA-DAI 2.2. A version of OGSA-DQP compliant with OGSA-DAI 3.0 is intended for release in the near future.
An activity is a well-defined workflow unit with a specific name. Activities are pluggable in that they can be dropped into OGSA-DAI without requiring any recompilation or recoding of OGSA-DAI.
Example activities include:
The OGSA-DAI team have defined a comprehensive and consistent standard activity set for data access and integration:
Karasavvas, K. Atkinson, M.P. and Hume, A.C. OGSA-DAI - Redesigned and New Activities. http://www.ogsadai.org.uk/documentation/ogsadai3.0/RedesignedAndNewActivitiesV1.9.pdf.
Appendix H, Activities in OGSA-DAI 3.0 provides a reference guide to the activities shipped with OGSA-DAI 3.0.
An activity can have 0 or more named inputs and 0 or more named outputs. Blocks of data flow from an activity's output into another activity's input. Activity inputs may be optional or required.
In an activity, there is no distinction between inputs and parameters. OGSA-DAI workflows use the notion of input literals which are used by clients to provide a parameter. It is up to the client whether an input value to an activity is provided by them or is provided via the output of another activity in the workflow.
All the required inputs of an activity must be connected to the output of another activity or have an associated input literal, and all the outputs of an activity must be connected to the input of another activity.
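A wiring check of this kind might look like the sketch below. The `Activity` class, the activity and input names, and the `check_workflow` function are hypothetical and are not part of OGSA-DAI. The sketch also assumes that every output must feed the input of some other activity.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    required_inputs: tuple = ()
    optional_inputs: tuple = ()
    outputs: tuple = ()
    literals: dict = field(default_factory=dict)  # input name -> literal value


def check_workflow(activities, connections):
    """Return a list of wiring problems (empty if the workflow is acceptable).

    connections: list of ((source activity, output), (target activity, input)).
    """
    fed_inputs = {target for _, target in connections}
    used_outputs = {source for source, _ in connections}
    problems = []
    for activity in activities:
        for name in activity.required_inputs:
            fed = (activity.name, name) in fed_inputs
            if not fed and name not in activity.literals:
                problems.append(f"{activity.name}.{name}: required input has no source")
        for name in activity.outputs:
            if (activity.name, name) not in used_outputs:
                problems.append(f"{activity.name}.{name}: output is not connected")
    return problems


# Hypothetical three-activity workflow: the query text is an input literal.
query = Activity("Query", required_inputs=("expression",), outputs=("data",),
                 literals={"expression": "SELECT * FROM books"})
to_csv = Activity("ToCSV", required_inputs=("data",), outputs=("result",))
deliver = Activity("Deliver", required_inputs=("input",), optional_inputs=("passiveMode",))

wiring = [(("Query", "data"), ("ToCSV", "data")),
          (("ToCSV", "result"), ("Deliver", "input"))]

ok = check_workflow([query, to_csv, deliver], wiring)
broken = check_workflow([query, to_csv, deliver], wiring[:1])
```

Note that the optional `passiveMode` input is left unconnected without causing a problem, whereas dropping the second connection leaves both a dangling output and an unfed required input.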
Activity inputs expect blocks of specific types and activity outputs output blocks of specific types.
The following block types are those used by OGSA-DAI activities (this does not preclude the use of application-specific or other block types):
Object, String, Integer, Long,
Double, Number, Date, Boolean.
char[], byte[], Clob,
Blob.
OGSA-DAI does not attempt to validate the connections between activities to ensure the objects written to an output are compatible with those expected at a connected input. Errors like this will be detected by the activities themselves when the workflow they are involved in is executed.
When executing, an activity iterates. On each iteration it is provided with one data block on each input. So, for example, suppose an activity with inputs FirstName and Surname is provided with the following blocks of input:
| Input | Data Value Stream |
|---|---|
| FirstName | "Ally.txt", "Amy.txt", "Mike.txt" |
| Surname | "Hume.txt", "Krause.txt", "Jackson.txt" |
On the first iteration the activity would receive "Ally.txt" on input FirstName and "Hume.txt" on input Surname. On the second iteration the activity would receive "Amy.txt" on FirstName and "Krause.txt" on Surname.
In the presence of lists iteration becomes more complex. Suppose we have:
| Input | Data Value Stream |
|---|---|
| Members | ListBegin, "Debbie", "Susanna", "Michael", "Vicki", ListEnd, ListBegin, "Sting", "Stewart Copeland", "Andy Summers", ListEnd |
| Band | "The Bangles", "The Police" |
In this case, on the first iteration the activity gets a list iterator for the Members input and the value "The Bangles" for the Band input. On the next iteration, it gets another list iterator for the Members input and the value "The Police" for the Band input.

What an activity does when it gets a list iterator is implementation-specific. So, for example when our activity gets the iterator for the second list from Members and value "The Police" for the Band input it could either:
Activities check their inputs and ensure that they are in sync. For example, in the above, if the list containing the members of The Police arrived on the Members input but no associated value arrived on the Band input, then the activity would raise an error.
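The iteration behaviour described above can be sketched as follows. This is a simplified model, not OGSA-DAI code: `LIST_BEGIN`, `LIST_END`, `units` and `iterations` are hypothetical names, and each list is collected into a Python list rather than handed over as a lazy list iterator.

```python
LIST_BEGIN, LIST_END = object(), object()  # hypothetical list marker blocks


def units(stream):
    """Group a flat block stream into per-iteration units:
    either a single value or one whole ListBegin ... ListEnd list."""
    blocks = iter(stream)
    for block in blocks:
        if block is LIST_BEGIN:
            items = []
            for inner in blocks:
                if inner is LIST_END:
                    break
                items.append(inner)
            else:
                raise ValueError("list was not terminated by ListEnd")
            yield items
        else:
            yield block


def iterations(members, bands):
    """Yield one (band, members) pair per iteration; fail if out of sync."""
    exhausted = object()
    member_units, band_units = units(members), units(bands)
    while True:
        m = next(member_units, exhausted)
        b = next(band_units, exhausted)
        if m is exhausted and b is exhausted:
            return
        if m is exhausted or b is exhausted:
            raise ValueError("inputs Members and Band are out of sync")
        yield b, m


members = [LIST_BEGIN, "Debbie", "Susanna", "Michael", "Vicki", LIST_END,
           LIST_BEGIN, "Sting", "Stewart Copeland", "Andy Summers", LIST_END]

pairs = list(iterations(members, ["The Bangles", "The Police"]))
```

With only "The Bangles" supplied on the Band input, the second iteration in this model would raise an error, mirroring the out-of-sync case described above.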
If a client wants to supply the same value on many iterations, it can be inconvenient to write that value out repeatedly. OGSA-DAI provides a ControlledRepeat activity that repeats values.
As an example of an activity, consider ObtainFromFTP, which pulls a file from an FTP server. The inputs and outputs of this activity, along with their types, are summarised in the following table:
| Inputs/Outputs | Name | Type | Optional |
|---|---|---|---|
| Input | filename | String | no - we cannot obtain a file unless we know its name |
| Input | host | String | no - we cannot obtain a file unless we know where to get it from |
| Input | passiveMode | Boolean | yes - a default value of false can be used unless the client wants to override it |
| Output | data | [ byte[] ] | no - no activity outputs can be optional |
When the ObtainFromFTP activity starts, it validates that input blocks obtained for each input are instances of the required type and raises an error if this is not the case.
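A check of this kind could be sketched as below. The input names, types and default come from the table above, but `INPUT_SPEC`, `check_inputs` and the choice of exceptions are assumptions made for illustration, and nothing here contacts an FTP server.

```python
# Input name -> (expected Python type, required?), following the table above.
INPUT_SPEC = {
    "filename": (str, True),
    "host": (str, True),
    "passiveMode": (bool, False),
}
DEFAULTS = {"passiveMode": False}


def check_inputs(blocks):
    """Validate one iteration's input blocks and apply defaults."""
    checked = dict(DEFAULTS)
    for name, (expected, required) in INPUT_SPEC.items():
        if name not in blocks:
            if required:
                raise ValueError(f"missing required input: {name}")
            continue  # optional input: keep the default
        value = blocks[name]
        if not isinstance(value, expected):
            raise TypeError(
                f"input {name} expects {expected.__name__}, got {type(value).__name__}"
            )
        checked[name] = value
    return checked


good = check_inputs({"filename": "Ally.txt", "host": "my.ftp.host"})
```

Passing, say, an integer as `filename` would raise a `TypeError` in this sketch before any transfer is attempted.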
Note how the FTP activity outputs a list of byte arrays. This allows a file to be streamed in smaller arrays rather than in one massive array holding the entire file contents. The use of a list allows these smaller arrays to be kept together and their relationship preserved. This allows downstream activities to operate on one of the byte arrays already received while the ObtainFromFTP activity is still receiving the rest of the file from the FTP server.
As an example suppose the ObtainFromFTP activity received the following inputs:
| Input | Data Value Stream |
|---|---|
| filename | "Ally.txt", "Amy.txt", "Mike.txt" |
| host | "my.ftp.host", "your.ftp.host", "their.ftp.host" |
then it would iterate three times. On each iteration it would obtain one file from one FTP server and output the file as a list of byte arrays. Across the three iterations three lists of byte arrays would result, one for each filename and host pair. The output would be as follows:
| Output | Data Value Stream |
|---|---|
| data | ListBegin, byte[], byte[], ..., ListEnd, ListBegin, byte[], byte[], ..., ListEnd, ListBegin, byte[], byte[], ..., ListEnd |
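The idea of emitting each file as a list of small byte arrays can be sketched with a generator. In this illustration in-memory byte streams stand in for files fetched over FTP, and the marker strings, function names and tiny chunk size are assumptions rather than OGSA-DAI behaviour.

```python
import io

LIST_BEGIN, LIST_END = "ListBegin", "ListEnd"  # stand-ins for the list markers


def stream_file(fileobj, chunk_size=4):
    """Yield one file as ListBegin, chunk, chunk, ..., ListEnd."""
    yield LIST_BEGIN
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk  # a downstream consumer can start work on this chunk now
    yield LIST_END


def obtain(files, chunk_size=4):
    """One iteration per file: each file becomes its own list of chunks."""
    for fileobj in files:
        yield from stream_file(fileobj, chunk_size)


one_file = list(stream_file(io.BytesIO(b"abcdefghij")))
three_files = list(obtain([io.BytesIO(b"aaaa"), io.BytesIO(b"bbbbb"), io.BytesIO(b"")]))
```

Because the generator yields each chunk as soon as it is read, a consumer never needs the whole file in memory, and the list markers keep the chunks of different files apart.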
Some activities can be targeted at OGSA-DAI resources. These are termed resource-specific activities, and each interacts with its target resource. The most common resources with which activities interact are data resources. OGSA-DAI data resources are components which abstract actual databases (or other data resources) into an OGSA-DAI compliant form. Examples of activities that interact with data resources include the SQLQuery activity, which interacts with relational data resources, and the XMLListCollections activity, which interacts with XMLDB data resources.
Activities can be defined to interact with any type of OGSA-DAI resource, e.g. there are activities for populating OGSA-DAI data sources (WriteToDataSource) or dumping state to or retrieving state from OGSA-DAI sessions (e.g. ObtainFromSession and DeliverToSession).
Not all activities need to be targeted at a resource however. For a wide variety of transformation, factory and delivery activities there will be no target resource. Examples of such generic activities include: TupleToCSV, ObtainFromFTP, DeliverToFTP, CreateResourceGroup, XSLTransform and many more.
When a workflow is executed by OGSA-DAI the activities in the workflow execute in parallel. Data streams through the activities in a pipeline-like way, through fixed-size buffers termed pipes, and each activity operates on a different portion of a data stream (if the activities are well defined) at the same time.
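The general technique (one thread per activity, connected by bounded buffers) can be sketched with Python's standard `queue` and `threading` modules. This is an analogy rather than a description of OGSA-DAI's internals: `run_stage`, `run_pipeline`, the `END` marker and the pipe size are all hypothetical.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def run_stage(func, inbox, outbox):
    """One activity: read a block, process it, pass the result downstream."""
    while True:
        block = inbox.get()
        if block is END:
            outbox.put(END)
            return
        outbox.put(func(block))  # blocks while the downstream pipe is full


def run_pipeline(source_blocks, funcs, pipe_size=2):
    # One fixed-size "pipe" before each stage and one after the last stage.
    pipes = [queue.Queue(maxsize=pipe_size) for _ in range(len(funcs) + 1)]

    def feed():
        for block in source_blocks:
            pipes[0].put(block)
        pipes[0].put(END)

    threads = [threading.Thread(target=feed)]
    for i, func in enumerate(funcs):
        threads.append(
            threading.Thread(target=run_stage, args=(func, pipes[i], pipes[i + 1]))
        )
    for thread in threads:
        thread.start()

    results = []
    while True:  # the caller drains the final pipe
        block = pipes[-1].get()
        if block is END:
            break
        results.append(block)
    for thread in threads:
        thread.join()
    return results


output = run_pipeline(range(10), [lambda x: x + 1, lambda x: x * 2])
```

The bounded queues provide back-pressure: a fast producer stalls when its pipe is full, so all stages work on different blocks of the same stream at once without unbounded buffering.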
A set of connected activities is termed a pipeline and a valid pipeline satisfies the following:
Clients can control when activities are executed by using different types of workflow. OGSA-DAI supports three types of workflow.

Figure 6.15. Sequence workflow
Clients execute workflows using OGSA-DAI as follows: the client submits their workflow (or request) to a data request execution service (DRES). This is a web service which provides access to a data request execution resource (DRER).
The data request execution resource is OGSA-DAI's workflow execution component. It:
See Section K.1, “Request execution” for a formal description.
The request status that is returned by a DRER contains the following:
See Section K.3, “Request status” for a formal description.
When a client submits a workflow they can specify one of two modes of execution:
Asynchronous execution is the recommended mode of operation as this gives a client more control over the execution of the workflow as we describe shortly. Synchronous execution can be useful for workflows that are very simple and quick to execute.
What are OGSA-DAI resources? These are components which are named and which can be accessed or referred to by clients. All clients who execute workflows will make use of a data request execution resource since this is the component that handles workflow execution in OGSA-DAI. Clients will also typically make use of data resources - the OGSA-DAI abstraction of databases or other data resources - and will cite these in their workflows. However, there are other types of OGSA-DAI resource. In this section all six resource types are summarised.
Please see Appendix J, OGSA-DAI resources specification for a detailed specification.
We already described this in Section 6.4.9, “Executing workflows”. Any client executing a workflow will use a DRER.
We already introduced these in Section 6.4.7, “Activities and resources”. OGSA-DAI data resources are components which abstract actual databases (or other data resources or anything really) into an OGSA-DAI compliant form. Any client executing a workflow that accesses or updates data will use one or more data resources in their workflows.
A data source is an OGSA-DAI resource which exposes a set of data on the OGSA-DAI server. This data can then be pulled from the OGSA-DAI server via a data source service. Data sources are one way of supporting asynchronous data delivery in OGSA-DAI. A client executes a workflow to create a data source and populate it with data. The client (or another client) can then stream data back from the data source.
A data sink is an OGSA-DAI resource which receives data into the OGSA-DAI server. Data can be pushed from a client to the OGSA-DAI server via a data sink service. Data sinks are one way of supporting asynchronous data delivery in OGSA-DAI. A client executes a workflow to create a data sink. The client (or another client) can then stream data into this data sink. A workflow can be submitted to the OGSA-DAI server to get the data that has been deposited in the data sink and then forward this to other activities.
A session is an OGSA-DAI resource which acts as a state container associated with a sequence of workflows. Sessions can be created by clients who wish to share state across workflows, e.g. lodge some state during the execution of one workflow then retrieve it during the execution of a subsequent workflow. A client can express the creation of a session via a request to a DRER.
A request resource is an OGSA-DAI resource which is associated with a workflow submitted to a DRER. If a client specifies that a workflow is to be executed asynchronously then a request resource is created. This request resource provides a means by which the client can monitor the status of execution of their asynchronous workflow and so can determine when it's finished and when any data is available. It also provides a means by which clients can terminate workflows if required.
Basically, the request resource provides access to the request status as already described in Section 6.4.9, “Executing workflows” as well as a high-level request execution status - a value summarising the execution status (see Section K.2, “Request execution status” for a formal description). It is recommended best practice that clients only read the request status once and that this is done after the request is known to have finished - progress can be tracked using the simple request execution status value.
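The recommended pattern (submit asynchronously, poll the lightweight execution status, then read the full request status once) can be sketched with Python's `concurrent.futures`. Everything here is a toy model: `ToyDRER`, `ToyRequestResource` and the status strings are hypothetical and do not reproduce the values defined in Section K.2.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyRequestResource:
    """Hypothetical handle on an asynchronously executing workflow."""

    def __init__(self, future):
        self._future = future

    def execution_status(self):
        """Cheap summary value; the names are made up for this sketch."""
        if self._future.cancelled():
            return "TERMINATED"
        if not self._future.done():
            return "EXECUTING"
        return "ERROR" if self._future.exception() else "COMPLETED"

    def request_status(self):
        """Full (possibly large) status; best read once, after completion."""
        return self._future.result()

    def terminate(self):
        # In this toy model only a workflow that has not started can be cancelled.
        return self._future.cancel()


class ToyDRER:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)

    def execute(self, workflow, synchronous=False):
        if synchronous:
            return workflow()  # the caller waits and gets the status directly
        return ToyRequestResource(self._pool.submit(workflow))

    def close(self):
        self._pool.shutdown()


def workflow():
    time.sleep(0.05)  # pretend to run queries and deliver data
    return {"result": [1, 2, 3]}


drer = ToyDRER()
request = drer.execute(workflow)                 # asynchronous
while request.execution_status() == "EXECUTING":
    time.sleep(0.01)                             # poll the cheap value only
final_state = request.execution_status()
status = request.request_status()                # read the full status once
sync_status = drer.execute(workflow, synchronous=True)
drer.close()
```

Polling only the summary value keeps the monitoring traffic small; the potentially large request status is transferred a single time at the end.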
As we first mentioned in Section 6.4.7, “Activities and resources” activities can be written to interact with any type of resource. Some examples follow:
In addition, activities can be written that create resources. For example:
Typically OGSA-DAI is accessed via a web services presentation layer. OGSA-DAI web services expose OGSA-DAI resources. Clients specify the resource of interest when interacting with the web service. There are six types of OGSA-DAI web service which correspond to the six resource types:
Please see Appendix L, OGSA-DAI services specification for a detailed specification.
An overview of the potential components involved in an OGSA-DAI distribution is shown schematically in the figure below.
This section outlines the ways in which OGSA-DAI can be extended.
Application developers can extend OGSA-DAI by writing activities that perform different types of functionality. These can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI as a whole works.
Activities could be written to support:
Application developers can extend OGSA-DAI by writing OGSA-DAI data resources that allow application-specific data resources to be used in conjunction with activities in workflows. These can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI as a whole works.
A data resource can be anything:
For example:
Application developers can extend OGSA-DAI by writing application-specific presentation layers. OGSA-DAI functionality can then be hidden behind application-specific web services. These web services could map their operations to "template" OGSA-DAI workflows. When an operation is invoked by a client, its arguments are used to populate the workflow, which is then executed.
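A minimal sketch of the template idea follows. The workflow is modelled as plain Python data with `$name` placeholders; this representation, the activity and input names, and the `populate` and `find_by_author` helpers are all assumptions for illustration and do not reflect how OGSA-DAI encodes workflows.

```python
# Hypothetical workflow template: values starting with "$" are placeholders.
FIND_BY_AUTHOR_TEMPLATE = [
    {"activity": "Query",
     "inputs": {"expression": "SELECT title, isbn FROM books WHERE author = ?",
                "parameters": ["$author"]}},
    {"activity": "ToCSV", "inputs": {}},
    {"activity": "Deliver", "inputs": {}},
]


def populate(template, arguments):
    """Return a copy of the template with placeholders replaced by arguments."""
    def fill(value):
        if isinstance(value, str) and value.startswith("$"):
            return arguments[value[1:]]
        if isinstance(value, list):
            return [fill(item) for item in value]
        if isinstance(value, dict):
            return {key: fill(item) for key, item in value.items()}
        return value

    return fill(template)


def find_by_author(author, execute):
    """Application-specific operation: fill in the template, then run it."""
    return execute(populate(FIND_BY_AUTHOR_TEMPLATE, {"author": author}))


# Stand-in executor that just echoes the populated workflow back.
populated = find_by_author("Iain Banks", execute=lambda workflow: workflow)
```

The author value is passed as a query parameter rather than spliced into the SQL text, which keeps a client-supplied argument from altering the query itself.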
In OGSA-DAI it can be important to control such issues as who can access data resources (or, indeed, any OGSA-DAI resources) and what they can do with these, for example:
In OGSA-DAI, authentication and authorization are the responsibility of the presentation layer. Application developers can implement application-specific authentication and authorization, depending upon the OGSA-DAI version and the OGSA-DAI administrator (i.e. the person or project who has deployed an OGSA-DAI server).
One consideration when writing data resources that allow databases to be exposed in OGSA-DAI is how to map presentation layer security information (e.g. security credentials) to database usernames and passwords.
OGSA-DAI has the notion of a "login provider" which developers can use and customise. This is used by OGSA-DAI's relational and XMLDB data resources for example. By default, the OGSA-DAI login provider stores mappings from security credentials to database usernames and passwords in a file located on the OGSA-DAI server. However, application developers could change the login provider to perhaps get this information from a database or via a call-out to a remote mapping service.
Login providers can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI works as a whole. OGSA-DAI deployers have the choice of using one login provider for all data resources on an OGSA-DAI server (the default configuration) or one login provider per data resource. Application developers, when implementing support for their own databases, can use the login provider API or implement their own solution if desired.
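The login provider concept can be sketched as a small interface with interchangeable implementations. The class and method names below are hypothetical and are not the OGSA-DAI login provider API; the in-memory mapping stands in for the server-side file, and a different subclass could consult a database or a remote mapping service instead.

```python
from abc import ABC, abstractmethod


class LoginProvider(ABC):
    """Hypothetical interface: map a caller's credential to a database login."""

    @abstractmethod
    def get_login(self, credential, resource_id):
        """Return (username, password) or raise PermissionError."""


class MappingLoginProvider(LoginProvider):
    """Looks logins up in a mapping, e.g. one loaded from a file on the server."""

    def __init__(self, mappings):
        self._mappings = dict(mappings)

    def get_login(self, credential, resource_id):
        try:
            return self._mappings[(credential, resource_id)]
        except KeyError:
            raise PermissionError(
                f"no database login for {credential!r} on {resource_id!r}"
            ) from None


# Made-up credential (a certificate distinguished name) and resource name.
provider = MappingLoginProvider({
    ("/C=UK/O=eScience/CN=ally", "BooksResource"): ("books_reader", "s3cret"),
})
login = provider.get_login("/C=UK/O=eScience/CN=ally", "BooksResource")
```

Because a data resource would depend only on the interface, one provider could serve every data resource on a server or each resource could be given its own.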
The OGSA-DAI client toolkit is designed to ease the construction of OGSA-DAI clients by providing client-side abstractions of activities, workflows, resources and services. It provides components which contact OGSA-DAI web services, submit the workflows to OGSA-DAI, and parse the request status and data after a workflow has been executed.
This allows application developers to focus on the construction of valid workflows via the connection of objects representing the activities in these workflows and the handling of resulting data which is provided to the application in the form of useful objects (e.g. Java ResultSets for relational data). More generally therefore developers can focus on constructing their applications rather than on the mechanics of interacting with OGSA-DAI.
The SEE-GEO (SEcurE access to GEOspatial services) project (http://edina.ac.uk/projects/seesaw/seegeo/index.html) used many of the features of OGSA-DAI when developing a service to provide access to geo-spatial information on a grid.
As part of a geo-linking interoperability experiment, the project addressed a scenario which involved accessing and joining data from two existing data resources and then rendering this data in a graphical format. The two data resources are as follows:
By joining data from these data resources questions such as "how does the cost of a loaf of bread vary across different regions in Scotland" can be answered.
The project used OGSA-DAI to develop a geo-linking service which would get data from each of the two data resources and execute a join across these. It would then use the joined data, in conjunction with a feature portrayal service (FPS), to get an image representation of the joined data which could then be returned to the client.
The solution uses a number of features of OGSA-DAI:
OGSA-DAI gave SEE-GEO many advantages including:
There are many reasons why OGSA-DAI may be suitable for your data access and integration requirements in a grid environment. These include:
OGSA-DAI has been misused, misunderstood and misrepresented. Some users have found that OGSA-DAI has made things worse for their particular applications. In particular, a common comment has been that "OGSA-DAI is not as fast as JDBC".
OGSA-DAI is not intended to be a complete solution to every data-related problem or a replacement for or competitor to JDBC (indeed OGSA-DAI uses JDBC!). The problems in using OGSA-DAI include:
In particular, OGSA-DAI may not be suitable if:
[2] EPCC, http://www.epcc.ed.ac.uk.
[3] National e-Science Centre, http://www.nesc.ac.uk.
[4] North-East Regional e-Science Centre, http://www.neresc.ac.uk.
[5] The e-Science North-West Centre, http://www.esnw.ac.uk.