Chapter 6. Overview of OGSA-DAI

6.1. What is OGSA-DAI
6.2. The OGSA-DAI project
6.2.1. Phase 1 - OGSA-DAI - February 2002 to July 2003
6.2.2. Phase 2 - DAIT (OGSA-DAI 2) - October 2003 to October 2005
6.2.3. Phase 3 - OGSA-DAI: an OMII-UK Node - November 2005 to October 2008
6.3. Sharing data in a Grid
6.3.1. Motivation
6.3.2. Sharing data via web site download
6.3.3. Sharing data via direct access
6.3.4. Application-specific web services
6.3.5. OGSA-DAI generic web services
6.3.6. Getting away from SOAP - workflows
6.4. Data-centric workflows
6.4.1. A simple example
6.4.2. Data integration and workflows
6.4.3. Activities
6.4.4. Activity inputs, outputs and blocks
6.4.5. Activities and iteration
6.4.6. An example activity - ObtainFromFTP
6.4.7. Activities and resources
6.4.8. Activities and workflows execution order
6.4.9. Executing workflows
6.5. Resources
6.5.1. Data request execution resource (DRER)
6.5.2. Data resource
6.5.3. Data source
6.5.4. Data sink
6.5.5. Session
6.5.6. Request
6.6. Resources and activities revisited
6.7. Services
6.8. OGSA-DAI components
6.9. Extending OGSA-DAI
6.9.1. Extending OGSA-DAI - writing activities
6.9.2. Extending OGSA-DAI - writing data resources
6.9.3. Extending OGSA-DAI - presentation layers
6.9.4. Extending OGSA-DAI - security
6.9.5. Extending OGSA-DAI - database access
6.10. OGSA-DAI client toolkit
6.11. OGSA-DAI in action - SEE-GEO
6.12. Why use OGSA-DAI?
6.13. Why might OGSA-DAI NOT be suitable?

6.1. What is OGSA-DAI

OGSA-DAI is:

  • An extensible framework
  • accessed via web services
  • that executes data-centric workflows
  • involving heterogeneous data resources
  • for the purposes of data access, integration, transformation and delivery within a grid
  • and is intended as a toolkit for building higher-level application-specific data services

6.2. The OGSA-DAI project

The OGSA-DAI project started in February 2002. It has, to date, undergone three main phases.

6.2.1. Phase 1 - OGSA-DAI - February 2002 to July 2003

Phase 1 was a collaboration between EPCC[2], NeSC[3], IBM, Oracle, NEReSC[4] and eSNW[5] with 18 months of funding of 3 million pounds from the UK DTI/EPSRC via the UK e-Science Grid Core Programme. Three major and three interim releases (point releases) of the OGSA-DAI software occurred during this time frame.

6.2.2. Phase 2 - DAIT (OGSA-DAI 2) - October 2003 to October 2005

Phase 2 was a collaboration between EPCC, NeSC, IBM, NEReSC and eSNW with 24 months of funding of 1.5 million pounds from the UK DTI/EPSRC via the UK e-Science Grid Core Programme 2 as part of the OMII-UK project. Four major releases of the OGSA-DAI software occurred during this time frame.

6.2.3. Phase 3 - OGSA-DAI: an OMII-UK Node - November 2005 to October 2008

Phase 3 is a collaboration between EPCC and NeSC with 24 months of funding of 1.9 million pounds from the EPSRC. Three releases of the OGSA-DAI software have occurred to date in this time frame.

6.3. Sharing data in a Grid

In this section we describe various options for sharing data in a grid and motivate OGSA-DAI as one solution for this.

6.3.1. Motivation

The Grid is all about sharing resources, e.g. computational, data and even people! OGSA-DAI facilitates the sharing of structured data resources. By structured data we mean data which can be queried and from which meaningful subsets can be extracted.

Clients sharing different types of data resource.

Figure 6.1. Sharing data in a grid


6.3.2. Sharing data via web site download

The simplest way to share data in a grid would be just to ZIP up the corresponding data and make this available on a web site. This has the following advantages:

  • Easy distribution for providers.
  • Easy access for consumers.

However, the disadvantages are:

  • Consumers have to download all the data even if only interested in a small subset.
  • Consumers have to load data into local databases to use it.
  • It is a static snapshot of the data.
  • How do you secure that data and provide access to it?

6.3.3. Sharing data via direct access

An alternative solution is to provide consumers with direct access to the data, e.g. by providing them with the URL of a database and giving them a username and password. This has the following advantage:

  • Consumers have direct access.

However the disadvantages are:

  • Firewall issues and open ports - at both the server and also at client to allow outgoing connections. In addition, there may be different default ports across database products.
  • User and password management is hard. Do you assign a username/password per user? Per company? What if someone leaves a company?
  • No consistent security model - security authentication/authorization is product-specific. Ideally one would use standard Grid security mechanisms and map these onto the product-specific ones.
  • Hard to use in grid/web service workflows.
  • No server-side layer in which to hide or standardize database heterogeneities, e.g. the fact that different databases have different query syntax especially for table creation. OGSA-DAI does not currently do this but there is a layer where it can be done.
  • Many drivers are needed, depending upon the database product and version as well as the programming language. What if a data provider specifies a new one? Or a bug-fixed one? All consumers need to update. This does not scale to the grid. Some databases now support embedded web services though.
  • Different APIs across different data types.
    • Relational databases and JDBC.
    • XML databases and XMLDB.
    • Indexed files and Lucene.

6.3.4. Application-specific web services

Another solution therefore is application-specific web services in which data is accessed and manipulated using application-specific operations on these, e.g.

Book findByISBN(ISBN)
List[Book] findByAuthor(Author)
List[Book] findByKeyword(Word)

The advantages of this include:

  • Fits with grid/web service approach and can use grid security infrastructure.
  • A whole variety of back end databases with the data in completely different formats can be easily abstracted to this interface.
  • Web services are programming language neutral - one client can talk to any services that have the same WSDL description.
  • Operations are likely to map well to authorization policies. For example, a user may be allowed to run findByAuthor but not findByKeyword. Or they may be able to run findByAuthor, but only if the author they specify is one of a set.

However the disadvantages are:

  • Slower than direct access as access to the data has to go through the web service layer. There is also the overhead in converting data to and from XML for transport via SOAP/HTTP.
  • Use of an application-specific API prevents use of generic data exploration, mining and manipulation tools. For example there can't be a generic application that interacts with a Books dataset and mines it for occurrences of "rabbit" and then mines a Genes dataset for occurrences of "rabbit". This means that it isn't possible to use generic tools to link datasets to find matches, e.g. to run a join across employees and books databases and then switch out books and switch in cancer deaths (for example to see the effect of a chemical leak in a University on the publications by staff).
A generic data linking application which can query a Books database, a Cancer database and a University Employees database and performs joins across these.

Figure 6.2. A generic data linking application


6.3.5. OGSA-DAI generic web services

Using generic, non-application-specific, web services (such as those provided by OGSA-DAI) addresses the problems of utilising application-specific web services in generic data access and integration tools. Consumers see the data in its 'raw' format, e.g. tables, columns, rows for relational data; collections, documents and elements for XML data. Consumers can obtain the schema of the data and send queries in the appropriate query language, e.g. SQL, XPath.

An OGSA-DAI web service providing access to a relational, XML and indexed file resource.

Figure 6.3. OGSA-DAI's generic web services


6.3.6. Getting away from SOAP - workflows

Using generic web services still incurs the overhead of converting data to and from XML prior to transferring it via SOAP over HTTP. So how can this overhead be reduced or eliminated altogether? One way to reduce overhead is to reduce the number of times a client interacts with the web service. This can be achieved with the use of workflows. A workflow can be specified by a client and supplied to a web service. The web service then executes this workflow and so performs in the scope of one web service operation any number of data-related actions (e.g. running queries, transforming data, performing updates).

In the example below, notice how the data goes over SOAP three times in the first deployment scenario where OGSA-DAI is being used solely for data access. Whereas in the second scenario we exploit OGSA-DAI's support for workflows to perform data access, transformation and delivery actions and so avoid any transfers of data via SOAP.

A client accesses an OGSA-DAI web server, gets data, submits this to a transform web service, gets the result and puts this on an FTP server. This can be replaced by submission of an OGSA-DAI workflow and OGSA-DAI doing the transformation. SOAP transfers of data are reduced.

Figure 6.4. Using OGSA-DAI workflows to reduce service invocations


6.4. Data-centric workflows

6.4.1. A simple example

Before we describe OGSA-DAI workflows in detail a simple example is shown below.

An OGSA-DAI workflow which runs an SQL query and converts the results to WebRowSet. It also gets an XSL style-sheet via HTTP. It then applies an XSL transform using the style-sheet to the WebRowSet and delivers the results to a URL.

Figure 6.5. An example OGSA-DAI workflow


Important points to note are:

  • A workflow consists of a number of units called activities. An activity is an individual unit of work in the workflow and performs a well-defined data-related task such as running an SQL query, performing a data transformation or delivering data.
  • Activities are connected. The outputs of activities connect to the inputs of other activities.
  • Data flows from activities to other activities and this is in one direction only.
  • Different activities may output data in different formats and may expect their input data in different formats. Transformation activities can transform data between these formats.
  • Workflows, and therefore OGSA-DAI itself, do not just carry out data access, but also data updates, transformations and delivery.
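
The points above can be illustrated with a minimal sketch in plain Java. It does not use the OGSA-DAI API: WorkflowSketch, SketchActivity, QUERY and ROWS_TO_CSV are hypothetical names invented for this illustration, and the activities run one after another here rather than in parallel as they would in an OGSA-DAI server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of a workflow as chained activities. Not the OGSA-DAI API. */
final class WorkflowSketch {

    /** An activity consumes a sequence of blocks and produces a sequence of blocks. */
    interface SketchActivity extends Function<List<Object>, List<Object>> { }

    /** Stand-in for a query activity: ignores its input and emits two rows. */
    static final SketchActivity QUERY = in -> Arrays.<Object>asList(
            new String[] {"1", "rabbit"},
            new String[] {"2", "hare"});

    /** Stand-in for a transformation activity: turns each row into a CSV line. */
    static final SketchActivity ROWS_TO_CSV = in -> {
        List<Object> out = new ArrayList<>();
        for (Object block : in) {
            out.add(String.join(",", (String[]) block));
        }
        return out;
    };

    /** Feeds the output of each activity into the input of the next, in one direction only. */
    static List<Object> run(SketchActivity... activities) {
        List<Object> blocks = new ArrayList<>();
        for (SketchActivity activity : activities) {
            blocks = activity.apply(blocks);
        }
        return blocks;
    }
}
```

Calling WorkflowSketch.run(WorkflowSketch.QUERY, WorkflowSketch.ROWS_TO_CSV) yields the two CSV lines "1,rabbit" and "2,hare". The transformation activity exists only to bridge the format produced by the query and the format wanted downstream.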

6.4.2. Data integration and workflows

These examples show how workflows in OGSA-DAI can be used to integrate data from multiple data sources. In the first example two SQL queries are executed on two separate databases. The results of the first query are then transformed in some way. This transformed data is then joined in some way with the results of the second query. The joined data is then delivered by some means.

A workflow that runs two SQL queries, transforms the data from one, then joins this with the data from the second and delivers the joined data.

Figure 6.6. Data integration using a single workflow and OGSA-DAI server


In this second example we see how data exposed by multiple OGSA-DAI servers can be transported and integrated. A workflow is sent to an OGSA-DAI server. This runs an SQL query and makes the data available to other OGSA-DAI servers. A second workflow is then sent which pulls the data from the first OGSA-DAI server while simultaneously running an SQL query. The pulled data and the SQL query results are then joined in some way and delivered by some means.

Two workflows sent to two OGSA-DAI servers. The first runs an SQL query and makes the data available. The second pulls this data from the first server, runs its own SQL query, joins the two and delivers the joined data.

Figure 6.7. Data integration using two workflows and two OGSA-DAI servers


This final example shows the use of OGSA-DAI in conjunction with a related tool called OGSA-DQP (distributed query processing). The client submits some workflow which interacts with a "virtual database". As far as the client is concerned this is an actual database upon which they can run queries. In fact, however, this database is a federation of a number of actual physical databases exposed via OGSA-DAI servers. When a client submits their queries OGSA-DQP parses them and forwards them to the other OGSA-DAI servers. OGSA-DQP also marshals and joins the results of these queries into a single unified result for the client.

OGSA-DAI uses OGSA-DQP to parse a query, query a number of remote OGSA-DAI services and marshal the results. The client is unaware of this.

Figure 6.8. Using OGSA-DAI and OGSA-DQP


Note
OGSA-DQP is compliant with OGSA-DAI 2.2. It is intended to release a version of OGSA-DQP compliant with OGSA-DAI 3.0 in the near future.

6.4.3. Activities

An activity is a well-defined workflow unit with a specific name. Activities are pluggable in that they can be dropped into OGSA-DAI without requiring any recompilation or recoding of OGSA-DAI.

Example activities include:

  • SQLQuery - Execute an SQL query on a relational database.
  • ListDirectory - List the files in a directory.
  • XSLTransform - Execute an XSL transform on an XML document.
  • DeliverToFTP - Deliver data to an FTP server.

The OGSA-DAI team have defined a comprehensive and consistent standard activity set for data access and integration:

Karasavvas, K., Atkinson, M.P. and Hume, A.C. OGSA-DAI - Redesigned and New Activities. http://www.ogsadai.org.uk/documentation/ogsadai3.0/RedesignedAndNewActivitiesV1.9.pdf.

Appendix H, Activities in OGSA-DAI 3.0 provides a reference guide to the activities shipped with OGSA-DAI 3.0.

6.4.4. Activity inputs, outputs and blocks

An activity can have 0 or more named inputs and 0 or more named outputs. Blocks of data flow from an activity's output into another activity's input. Activity inputs may be optional or required.

SQLQuery activity with an output connected to a TupleToCSV activity's input.

Figure 6.9. Activity inputs and outputs


In an activity, there is no distinction between inputs and parameters. OGSA-DAI workflows use the notion of input literals which are used by clients to provide a parameter. It is up to the client whether an input value to an activity is provided by them or is provided via the output of another activity in the workflow.

All the required inputs of an activity must be connected to the output of another activity or have an associated input literal, and all the outputs of an activity must be connected to the input of another activity.

SQLQuery activity with an input literal providing the SQL query expression and its output connected to a TupleToCSV activity's input.

Figure 6.10. Activity inputs, input literals and outputs
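
The rule for required inputs can be sketched as a simple check. This is a hedged illustration rather than OGSA-DAI's own validation code: ActivityInputsSketch and its methods are hypothetical, as are the input names "expression" and "data" used in the usage note.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical description of one activity's inputs. Not the OGSA-DAI API. */
final class ActivityInputsSketch {
    private final Set<String> requiredInputs = new TreeSet<>();
    private final Map<String, Object> literals = new HashMap<>();
    /** Maps an input name to the upstream output that feeds it. */
    private final Map<String, String> connections = new HashMap<>();

    ActivityInputsSketch require(String input) {
        requiredInputs.add(input);
        return this;
    }

    /** The client supplies the value directly as an input literal. */
    ActivityInputsSketch literal(String input, Object value) {
        literals.put(input, value);
        return this;
    }

    /** The value arrives from the output of another activity. */
    ActivityInputsSketch connect(String input, String upstreamOutput) {
        connections.put(input, upstreamOutput);
        return this;
    }

    /** Returns the required inputs that have neither a literal nor a connection. */
    List<String> unsatisfiedInputs() {
        List<String> missing = new ArrayList<>();
        for (String input : requiredInputs) {
            if (!literals.containsKey(input) && !connections.containsKey(input)) {
                missing.add(input);
            }
        }
        return missing;
    }
}
```

For example, new ActivityInputsSketch().require("expression").literal("expression", "SELECT * FROM books") has no unsatisfied inputs, whereas an activity that requires "data" but is given neither a literal nor a connection reports "data" as missing.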


Activity inputs expect blocks of specific types and activity outputs output blocks of specific types.

SQLQuery activity with an input literal providing the SQL query expression and its output connected to a TupleToCSV activity's input.

Figure 6.11. Activity input and output types


The following block types are those used by OGSA-DAI activities (this does not preclude the use of application-specific or other block types):

  • Java's basic types - Object, String, Integer, Long, Double, Number, Date, Boolean.
  • Binary types - char[], byte[], Clob, Blob.
    • BLOBs obtained from databases are stored as BLOB objects within Tuples and references to entire BLOBs are passed between activities.
    • Byte arrays are typically data obtained from FTP.
    • All binary data processing activities provided in OGSA-DAI can handle both representations.
  • Tuple - OGSA-DAI representation of a row of relational data which has one element per column in the row.
  • MetadataWrapper - OGSA-DAI wrapper for any object to be treated as metadata. This allows the use of application-specific metadata within OGSA-DAI - individual activities handle metadata blocks as they see fit.
  • List - a list groups related blocks together. Special blocks are used to mark the beginning and the end of a list.
    • Lists are useful for keeping related blocks together.
    • Lists are not a single block like a Tuple or a MetadataWrapper. Rather, they consist of a ListBegin block, a sequence of related blocks, and a ListEnd block. Multiple activities could therefore operate on different parts of the list simultaneously.
    • For example SQLQuery can dynamically take any number of SQL query expressions as input. Lists allow differentiation between the results of each query by grouping the Tuples belonging to each query result together.
    • Activities define the granularity of their inputs and outputs.

OGSA-DAI does not attempt to validate the connections between activities to ensure the objects written to an output are compatible with those expected at a connected input. Errors like this will be detected by the activities themselves when the workflow they are involved in is executed.

SQLQuery activity with an input providing multiple SQL query expressions and its output a sequence of lists of tuples, connected to a TupleToCSV activity's input.

Figure 6.12. Activity inputs and outputs and lists
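
To make the role of the list marker blocks concrete, the following sketch regroups a flat stream of blocks into one group per list. It is an illustration under stated assumptions: LIST_BEGIN and LIST_END are hypothetical marker objects defined here, not OGSA-DAI's own classes, and for simplicity the sketch accepts only flat (non-nested) lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handling of list marker blocks. Not the OGSA-DAI API. */
final class ListBlockSketch {
    static final Object LIST_BEGIN = new Object() {
        @Override public String toString() { return "ListBegin"; }
    };
    static final Object LIST_END = new Object() {
        @Override public String toString() { return "ListEnd"; }
    };

    /** Splits a flat block stream into one group per ListBegin ... ListEnd pair. */
    static List<List<Object>> group(List<Object> stream) {
        List<List<Object>> groups = new ArrayList<>();
        List<Object> current = null; // null while we are outside any list
        for (Object block : stream) {
            if (block == LIST_BEGIN) {
                if (current != null) {
                    throw new IllegalStateException("this sketch does not handle nested lists");
                }
                current = new ArrayList<>();
            } else if (block == LIST_END) {
                if (current == null) {
                    throw new IllegalStateException("ListEnd without ListBegin");
                }
                groups.add(current);
                current = null;
            } else {
                if (current == null) {
                    throw new IllegalStateException("block outside a list");
                }
                current.add(block);
            }
        }
        if (current != null) {
            throw new IllegalStateException("missing ListEnd");
        }
        return groups;
    }
}
```

Given the stream ListBegin, t1, t2, ListEnd, ListBegin, t3, ListEnd, the method returns two groups, which is how the results of two query expressions could be told apart downstream.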


6.4.5. Activities and iteration

When executing, an activity iterates. On each iteration it is provided with one data block on each input. So, for example, suppose an activity with inputs FirstName and Surname is provided with the following blocks of input:

Input      Data Value Stream
FirstName  "Ally.txt", "Amy.txt", "Mike.txt"
Surname    "Hume.txt", "Krause.txt", "Jackson.txt"

On the first iteration the activity would receive "Ally.txt" on input FirstName and "Hume.txt" on input Surname. On the second iteration the activity would receive "Amy.txt" on FirstName and "Krause.txt" on Surname.

In the presence of lists iteration becomes more complex. Suppose we have:

Input    Data Value Stream
Members  ListBegin, "Debbie", "Susanna", "Michael", "Vicki", ListEnd, ListBegin, "Sting", "Stewart Copeland", "Andy Summers", ListEnd
Band     "The Bangles", "The Police"

In this case, on the first iteration the activity gets a list iterator for the Members input and the value "The Bangles" for the Band input. On the next iteration, it gets another list iterator for the Members input and the value "The Police" for the Band input.

What an activity does when it gets a list iterator is implementation-specific. So, for example when our activity gets the iterator for the second list from Members and value "The Police" for the Band input it could either:

  • Get every value in the Members list and do something with this and the Band value, e.g. dump to a file the value "Sting, Stewart Copeland, Andy Summers are in The Police".
  • Get each element in the Members list in turn and do something with this and the Band value, e.g. dump to a file the value "Sting is in The Police" then "Stewart Copeland is in The Police" then "Andy Summers is in The Police". This is the more typical approach to handling lists.

Activities check their inputs and will ensure they are in sync. For example, in the above if the list containing the members of The Police arrived on the Members input but no associated value arrived on the Band input then the activity would raise an error.
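
The iteration behaviour described above can be sketched as follows. This is not OGSA-DAI code: IterationSketch is a hypothetical class, each inner list stands for one ListBegin ... ListEnd group on the Members input, and the sketch takes the second, more typical approach of handling each list element in turn.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical activity iteration over a list input and a plain input. */
final class IterationSketch {

    /**
     * On each iteration takes one list from members and one value from bands,
     * then emits one line per list element.
     */
    static List<String> iterate(List<List<String>> members, List<String> bands) {
        List<String> output = new ArrayList<>();
        Iterator<List<String>> memberLists = members.iterator();
        Iterator<String> bandValues = bands.iterator();
        while (memberLists.hasNext() || bandValues.hasNext()) {
            // Both inputs must supply a block on every iteration.
            if (!memberLists.hasNext() || !bandValues.hasNext()) {
                throw new IllegalStateException("inputs Members and Band are out of sync");
            }
            String band = bandValues.next();
            for (String member : memberLists.next()) {
                output.add(member + " is in " + band);
            }
        }
        return output;
    }
}
```

With the Members and Band streams shown earlier, the second iteration contributes "Sting is in The Police", "Stewart Copeland is in The Police" and "Andy Summers is in The Police". If the Band stream held only "The Bangles", the second iteration would fail with the out-of-sync error.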

If the client wants to repeat the same value multiple times it can be inconvenient to repeat the value. OGSA-DAI provides a ControlledRepeat activity that repeats values.

Workflow portion showing use of ControlledRepeat.

Figure 6.13. Workflow portion showing use of ControlledRepeat


6.4.6. An example activity - ObtainFromFTP

As an example of an activity, consider ObtainFromFTP which pulls a file from an FTP server. The inputs and outputs of this activity along with their types are summarised in the following table:

Inputs/Outputs  Name         Type        Optional
Input           filename     String      no - we cannot obtain a file unless we know its name
Input           host         String      no - we cannot obtain a file unless we know where to get it from
Input           passiveMode  Boolean     yes - a default value of false can be used unless the client wants to override it
Output          data         [ byte[] ]  no - no activity outputs can be optional

When the ObtainFromFTP activity starts, it validates that input blocks obtained for each input are instances of the required type and raises an error if this is not the case.

Note how the FTP activity outputs a list of byte arrays. This allows a file to be streamed in smaller arrays rather than in one massive array holding the entire file contents. The use of a list allows these smaller arrays to be kept together and their relationship preserved. This allows downstream activities to operate on one of the byte arrays already received while the ObtainFromFTP activity is still receiving the rest of the file from the FTP server.

As an example suppose the ObtainFromFTP activity received the following inputs:

Input     Data Value Stream
filename  "Ally.txt", "Amy.txt", "Mike.txt"
host      "my.ftp.host", "your.ftp.host", "their.ftp.host"

then it would iterate three times. On each iteration it would obtain one file from one FTP server and output the file as a list of byte arrays. Across the three iterations three lists of byte arrays would result, one for each filename and host pair. The output would be as follows:

Output  Data Value Stream
data    ListBegin, byte[], byte[], ..., ListEnd, ListBegin, byte[], byte[], ..., ListEnd, ListBegin, byte[], byte[], ..., ListEnd
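
The way one file becomes a list of byte arrays can be sketched as below. This is an assumption-laden illustration, not the ObtainFromFTP implementation: ChunkSketch is hypothetical, the file content is passed in as an in-memory array rather than read from an FTP server, the chunk size is a free parameter, and plain strings stand in for the list marker blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical chunking of file content into a list of byte arrays. */
final class ChunkSketch {
    // Strings stand in for OGSA-DAI's list marker blocks in this sketch.
    static final String LIST_BEGIN = "ListBegin";
    static final String LIST_END = "ListEnd";

    /** Emits ListBegin, then the content in arrays of at most chunkSize bytes, then ListEnd. */
    static List<Object> toBlockList(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Object> blocks = new ArrayList<>();
        blocks.add(LIST_BEGIN);
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(content.length, offset + chunkSize);
            blocks.add(Arrays.copyOfRange(content, offset, end));
        }
        blocks.add(LIST_END);
        return blocks;
    }
}
```

A 10-byte file with a chunk size of 4 produces five blocks: ListBegin, arrays of 4, 4 and 2 bytes, and ListEnd. A downstream activity can start work on the first array before the last one exists.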

6.4.7. Activities and resources

Some activities can be targeted at OGSA-DAI resources. These are termed resource-specific activities. The activity interacts with the resource. The resources with which activities most commonly interact are data resources. OGSA-DAI data resources are components which abstract actual databases (or other data resources) into an OGSA-DAI compliant form. Examples of activities that interact with data resources include the SQLQuery activity which interacts with relational data resources and the XMLListCollections activity which interacts with XMLDB data resources.

Activities can be defined to interact with any type of OGSA-DAI resource, e.g. there are activities for populating OGSA-DAI data sources (WriteToDataSource) or dumping state to or retrieving state from OGSA-DAI sessions (e.g. ObtainFromSession and DeliverToSession).

SQLQuery activity with an input literal providing the SQL query expression and its output connected to a TupleToCSV activity's input. SQLQuery is targeted at a MySQL data resource.

Figure 6.14. Activities and resources


Not all activities need to be targeted at a resource however. For a wide variety of transformation, factory and delivery activities there will be no target resource. Examples of such generic activities include: TupleToCSV, ObtainFromFTP, DeliverToFTP, CreateResourceGroup, XSLTransform and many more.

6.4.8. Activities and workflows execution order

When a workflow is executed by OGSA-DAI the activities in the workflow execute in parallel. Data streams through activities in a pipeline-like way through fixed-size buffers termed pipes and each activity operates on a different portion of a data stream (if the activities are well defined) at the same time.
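
The idea of activities running in parallel and exchanging blocks through a fixed-size pipe can be sketched with standard Java concurrency classes. This is a hedged model, not OGSA-DAI's pipe implementation: PipeSketch and its END marker are hypothetical, and a java.util.concurrent.ArrayBlockingQueue plays the role of the pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical two-activity pipeline joined by a bounded buffer. */
final class PipeSketch {
    private static final Object END = new Object(); // hypothetical end-of-stream marker

    static List<String> run(List<String> rows, int pipeCapacity) {
        BlockingQueue<Object> pipe = new ArrayBlockingQueue<>(pipeCapacity);
        List<String> delivered = new ArrayList<>();

        // Upstream activity: writes blocks, waiting whenever the pipe is full.
        Thread producer = new Thread(() -> {
            try {
                for (String row : rows) {
                    pipe.put(row);
                }
                pipe.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Downstream activity: transforms each block as soon as it arrives.
        Thread consumer = new Thread(() -> {
            try {
                for (Object block = pipe.take(); block != END; block = pipe.take()) {
                    delivered.add(((String) block).toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return delivered;
    }
}
```

With a capacity of 1 the producer can never run more than one block ahead of the consumer, so memory use stays bounded however large the data stream is.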

A set of connected activities is termed a pipeline and a valid pipeline satisfies the following:

  • There must be a path of connected inputs and outputs (possibly via intermediate activities) between each pair of activities in the pipeline.
  • No activity in the pipeline can be connected to an activity outside the pipeline.
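
The two conditions can be read as a graph check, sketched below. This is one interpretation rather than OGSA-DAI's own validation: PipelineCheckSketch is hypothetical, each connection is a pair of activity names (output side, input side), and "a path between each pair of activities" is assumed to ignore the direction of the connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validity check for a pipeline. Not the OGSA-DAI API. */
final class PipelineCheckSketch {

    static boolean isValidPipeline(Set<String> activities, List<String[]> connections) {
        if (activities.isEmpty()) {
            return false;
        }
        Map<String, Set<String>> neighbours = new HashMap<>();
        for (String activity : activities) {
            neighbours.put(activity, new HashSet<>());
        }
        for (String[] connection : connections) {
            // Second condition: no connection may leave the pipeline.
            if (!activities.contains(connection[0]) || !activities.contains(connection[1])) {
                return false;
            }
            neighbours.get(connection[0]).add(connection[1]);
            neighbours.get(connection[1]).add(connection[0]);
        }
        // First condition: every activity is reachable from an arbitrary start.
        Deque<String> pending = new ArrayDeque<>();
        Set<String> reached = new HashSet<>();
        String start = activities.iterator().next();
        pending.push(start);
        reached.add(start);
        while (!pending.isEmpty()) {
            for (String next : neighbours.get(pending.pop())) {
                if (reached.add(next)) {
                    pending.push(next);
                }
            }
        }
        return reached.size() == activities.size();
    }
}
```

For the activities SQLQuery, TupleToCSV and DeliverToFTP connected in a chain the check succeeds. It fails if DeliverToFTP is left unconnected, or if any connection names an activity outside the set.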

Clients can control when activities are executed by using different types of workflow. OGSA-DAI supports three types of workflow.

  • Pipeline workflow - set of chained activities executed in parallel with data flowing between the activities.
  • Sequence workflow - a set of sub-workflows each executed in sequence. For example a sequence workflow could be defined with two sub-workflows, one to create a database table and the second to bulk load data into this table. The execution of the sequence workflow ensures that the second sub-workflow, loading the data, does not start until the first, creating the table, has completed.
  • Parallel workflow - set of sub-workflows executed in parallel.
Sequence workflow.
Workflow containing two pipelines (white boxes) that are within a sequence box (lilac) indicating that the pipelines are to be executed in sequence. This will allow the first pipeline to create a table in the database before the second pipeline writes to it.

Figure 6.15. Sequence workflow
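
The difference between sequence and parallel workflows can be sketched with plain Java threads. This is an illustration only: WorkflowTypesSketch is hypothetical and each sub-workflow is modelled as a Runnable rather than as an OGSA-DAI workflow object.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical execution of sequence and parallel workflows. */
final class WorkflowTypesSketch {

    /** Sequence: each sub-workflow starts only when the previous one has completed. */
    static void sequence(List<Runnable> subWorkflows) {
        for (Runnable subWorkflow : subWorkflows) {
            subWorkflow.run();
        }
    }

    /** Parallel: all sub-workflows are started together, then we wait for all of them. */
    static void parallel(List<Runnable> subWorkflows) {
        List<Thread> threads = new ArrayList<>();
        for (Runnable subWorkflow : subWorkflows) {
            Thread thread = new Thread(subWorkflow);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Passing a "create table" step followed by a "bulk load" step to sequence guarantees the table exists before loading begins. Passing the same two steps to parallel gives no such guarantee, which is why the table-loading example needs a sequence workflow.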


6.4.9. Executing workflows

Clients execute workflows using OGSA-DAI as follows: the client submits their workflow (or request) to a data request execution service (DRES). This is a web service which provides access to a data request execution resource (DRER).

The data request execution resource is OGSA-DAI's workflow execution component. It:

  • Parses the workflow.
  • Creates the activities specified in the workflow.
  • Provides activities with their target resources (if any).
  • Executes the workflow.
  • Builds a request status.
  • Returns the request status to the client (via the data request execution service).
  • Contains a handler for session creation, for clients that want to execute related workflows and share state between these.
  • Executes a number of workflows concurrently and can also queue a number more.

See Section K.1, “Request execution” for a formal description.

A client communicates with a data request execution resource exposed via a data request execution service. The workflow submitted by the client cites a data resource also known to the OGSA-DAI server.

Figure 6.16. Executing a workflow


The request status that is returned by a DRER contains the following:

  • Status of execution of each activity in the workflow, i.e. did it complete or did it run into an error?
  • Status of execution of whole workflow which is derived from status of individual activities, i.e. did they complete or did they all run into errors, was the workflow prematurely terminated by the client?
  • Data - depending upon the activities in the workflow and whether the workflow was executed synchronously or asynchronously. OGSA-DAI provides a DeliverToRequestStatus activity which ensures that any data it receives is added to the request status.

See Section K.3, “Request status” for a formal description.

When a client submits a workflow they can specify one of two modes of execution:

  • Synchronous execution - the data request execution service returns a request status to the client only when the workflow has completed execution.
  • Asynchronous execution - the data request execution service returns a request status to the client as soon as the workflow starts executing. Along with this will be the ID of a request resource which the client can use to monitor the request status - we return to this in the discussion of request resources and request management services shortly.

Asynchronous execution is the recommended mode of operation as this gives a client more control over the execution of the workflow as we describe shortly. Synchronous execution can be useful for workflows that are very simple and quick to execute.
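
The two execution modes can be contrasted in a short sketch. It models the behaviour only and is not the OGSA-DAI client API: ExecutionModeSketch, RequestHandle and Status are hypothetical names, a Supplier stands in for the workflow, and java.util.concurrent.CompletableFuture stands in for the server-side execution.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical model of synchronous and asynchronous workflow execution. */
final class ExecutionModeSketch {
    enum Status { EXECUTING, COMPLETED, ERROR }

    /** Stand-in for a request resource that the client can use to monitor execution. */
    static final class RequestHandle {
        private final CompletableFuture<String> result;

        RequestHandle(CompletableFuture<String> result) {
            this.result = result;
        }

        /** Cheap status check, suitable for polling. */
        Status status() {
            if (!result.isDone()) {
                return Status.EXECUTING;
            }
            return result.isCompletedExceptionally() ? Status.ERROR : Status.COMPLETED;
        }

        /** Blocks until the workflow has finished and returns its data. */
        String awaitData() {
            return result.join();
        }
    }

    /** Synchronous: the caller gets control back only once the workflow has completed. */
    static String executeSync(Supplier<String> workflow) {
        return workflow.get();
    }

    /** Asynchronous: the caller gets a handle back straight away. */
    static RequestHandle executeAsync(Supplier<String> workflow) {
        return new RequestHandle(CompletableFuture.supplyAsync(workflow));
    }
}
```

A client using executeAsync can poll status() until it is no longer EXECUTING and only then fetch the data, whereas a client using executeSync is blocked for the whole run.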

6.5. Resources

What are OGSA-DAI resources? These are components which are named and which can be accessed or referred to by clients. All clients who execute workflows will make use of a data request execution resource since this is the component that handles workflow execution in OGSA-DAI. Clients will typically also make use of data resources - the OGSA-DAI abstraction of databases or other data resources - and will cite these in their workflows. However, there are other types of OGSA-DAI resource. In this section all six resource types are summarised.

Please see Appendix J, OGSA-DAI resources specification for a detailed specification.

6.5.1. Data request execution resource (DRER)

We already described this in Section 6.4.9, “Executing workflows”. Any client executing a workflow will use a DRER.

6.5.2. Data resource

We already introduced these in Section 6.4.7, “Activities and resources”. OGSA-DAI data resources are components which abstract actual databases (or other data resources or anything really) into an OGSA-DAI compliant form. Any client executing a workflow that accesses or updates data will use one or more data resources in their workflows.

6.5.3. Data source

A data source is an OGSA-DAI resource which exposes a set of data on the OGSA-DAI server. This data can then be pulled from the OGSA-DAI server via a data source service. Data sources are one way of supporting asynchronous data delivery in OGSA-DAI. A client executes a workflow to create a data source and populate it with data. The client (or another client) can then stream data back from the data source.

6.5.4. Data sink

A data sink is an OGSA-DAI resource which receives data into the OGSA-DAI server. Data can be pushed from a client to the OGSA-DAI server via a data sink service. Data sinks are one way of supporting asynchronous data delivery in OGSA-DAI. A client executes a workflow to create a data sink. The client (or another client) can then stream data into this data sink. A workflow can be submitted to the OGSA-DAI server to get the data that has been deposited in the data sink and then forward this to other activities.

6.5.5. Session

A session is an OGSA-DAI resource which acts as a state container associated with a sequence of workflows. Sessions can be created by clients who wish to share state across workflows, e.g. lodge some state during the execution of one workflow then retrieve it during the execution of a subsequent workflow. A client can express the creation of a session via a request to a DRER.

6.5.6. Request

A request resource is an OGSA-DAI resource which is associated with a workflow submitted to a DRER. If a client specifies that a workflow is to be executed asynchronously then a request resource is created. This request resource provides a means by which the client can monitor the status of execution of their asynchronous workflow and so can determine when it's finished and when any data is available. It also provides a means by which clients can terminate workflows if required.

Basically, the request resource provides access to the request status as already described in Section 6.4.9, “Executing workflows” as well as a high-level request execution status - a value summarising the execution status (see Section K.2, “Request execution status” for a formal description). It is recommended best practice that clients only read the request status once and that this is done after the request is known to have finished - progress can be tracked using the simple request execution status value.

6.6. Resources and activities revisited

As we first mentioned in Section 6.4.7, “Activities and resources” activities can be written to interact with any type of resource. Some examples follow:

  • SQLQuery - run an SQL query on a JDBC data resource.
  • XPathQuery - run an XPath query on an XMLDB data resource.
  • SQLBag - run an SQL query on all the resources listed in a resource group data resource.
  • WriteToDataSource - populate a new data source with its data.
  • ReadDataSink - read from a data sink the data that was pushed into it.
  • ObtainFromSession - get state from a session.
  • DeliverToSession - put state in a session.

In addition, activities can be written that create resources. For example:

  • CreateDataSource
  • CreateDataSink
  • CreateResourceGroup

6.7. Services

Typically OGSA-DAI is accessed via a web services presentation layer. OGSA-DAI web services expose OGSA-DAI resources. Clients specify the resource of interest when interacting with the web service. There are six types of OGSA-DAI web service which correspond to the six resource types:

  • Data request execution service: clients use this to submit workflows, create sessions, get the request status of synchronous requests.
  • Data resource information service: clients can use this to query information about a data resource, e.g. product name, vendor, version.
  • Data source service: clients can use this to pull data from data sources.
  • Data sink service: clients can use this to push data to data sinks.
  • Session management service: clients can use this to manage the lifetime of sessions.
  • Request management service: clients can use this to query request execution status and get data associated with asynchronous requests.

Please see Appendix L, OGSA-DAI services specification for a detailed specification.

6.8. OGSA-DAI components

An overview of the potential components involved in an OGSA-DAI distribution is shown schematically in the figure below.

A block diagram showing the main OGSA-DAI components including different presentation layers, the core, data resources, activities and the persistence and configuration components.

Figure 6.17. OGSA-DAI components


6.9. Extending OGSA-DAI

This section outlines the ways in which OGSA-DAI can be extended.

6.9.1. Extending OGSA-DAI - writing activities

Application developers can extend OGSA-DAI by writing activities that perform different types of functionality. These can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI as a whole works.

Activities could be written to support:

  • Additional generic functionality, e.g. DeliverToMessageQueue.
  • Additional resource-specific functionality, e.g. SQLStoredProcedure.
  • Application-specific functionality, e.g. TransformToFasta.
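To give a feel for the shape of an application-specific activity, here is a minimal sketch of what something like TransformToFasta might do: consume blocks from an input and write transformed blocks to an output. Everything in it is an assumption made for illustration. The BlockActivity interface, the SequenceRecord type and the TransformToFastaSketch class are hypothetical and are not the OGSA-DAI activity API. Only the FASTA layout itself (a ">" header line followed by wrapped sequence lines) is standard.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical activity contract: read blocks from an input, write blocks to an output. */
interface BlockActivity<I, O> {
    void process(Iterator<I> input, Consumer<O> output);
}

/** Hypothetical input block: one named sequence. */
final class SequenceRecord {
    final String id;
    final String residues;

    SequenceRecord(String id, String residues) {
        this.id = id;
        this.residues = residues;
    }
}

/** Illustrative activity that emits one FASTA entry per input block. */
final class TransformToFastaSketch implements BlockActivity<SequenceRecord, String> {
    private final int lineWidth;

    TransformToFastaSketch(int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        this.lineWidth = lineWidth;
    }

    @Override
    public void process(Iterator<SequenceRecord> input, Consumer<String> output) {
        // Iterate over the input blocks, producing one output block for each.
        while (input.hasNext()) {
            SequenceRecord record = input.next();
            StringBuilder entry = new StringBuilder(">").append(record.id).append('\n');
            // Wrap the sequence at a fixed line width.
            for (int start = 0; start < record.residues.length(); start += lineWidth) {
                int end = Math.min(start + lineWidth, record.residues.length());
                entry.append(record.residues, start, end).append('\n');
            }
            output.accept(entry.toString());
        }
    }
}
```

The activity knows nothing about where its blocks come from or go to, which is what allows it to be connected to other activities in a workflow.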

6.9.2. Extending OGSA-DAI - writing data resources

Application developers can extend OGSA-DAI by writing OGSA-DAI data resources that allow application-specific data resources to be used in conjunction with activities in workflows. These can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI as a whole works.

A data resource can be anything:

  • Local or remote.
  • Real or virtual.
  • Persistent or in-memory.

For example:

  • A view onto a relational database.
  • A list of IDs of related data resources on one or more OGSA-DAI servers.
  • A new XML database.
  • Open Geospatial Consortium (OGC) data access services.
  • An application-specific web service.

6.9.3. Extending OGSA-DAI - presentation layers

Application developers can extend OGSA-DAI by writing application-specific presentation layers. OGSA-DAI functionality can then be hidden behind application-specific web services. These web services could map their operations to "template" OGSA-DAI workflows. When an operation is invoked by a client, its arguments are used to populate the workflow, which is then executed.
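The "template" workflow idea can be sketched as below. This is an illustration only: the WorkflowTemplate and BreadPriceOperation classes, the ${name} placeholder notation, the workflow text and the resource ID are all hypothetical, and the workflow string is not OGSA-DAI's actual workflow syntax. The sketch shows an application-specific operation validating its argument, populating the template, and returning the workflow that would then be submitted for execution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical workflow template containing ${name} placeholders. */
final class WorkflowTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");
    private final String text;

    WorkflowTemplate(String text) {
        this.text = text;
    }

    /** Replaces every placeholder with the matching argument value. */
    String populate(Map<String, String> arguments) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer populated = new StringBuffer();
        while (matcher.find()) {
            String value = arguments.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing argument: " + matcher.group(1));
            }
            matcher.appendReplacement(populated, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(populated);
        return populated.toString();
    }
}

/** Hypothetical application-specific operation that hides the workflow from its caller. */
final class BreadPriceOperation {
    // Illustrative notation only; this is not OGSA-DAI's workflow syntax.
    private static final WorkflowTemplate TEMPLATE =
            new WorkflowTemplate("SQLQuery(resource=${resourceId}, region=${region})");

    private BreadPriceOperation() {}

    /** Validates the client's argument and returns the workflow that would be submitted. */
    static String buildWorkflow(String region) {
        if (!region.matches("[A-Za-z ]+")) {
            throw new IllegalArgumentException("Invalid region: " + region);
        }
        Map<String, String> arguments = new HashMap<>();
        arguments.put("resourceId", "CensusResource"); // chosen by the service; hypothetical ID
        arguments.put("region", region);
        return TEMPLATE.populate(arguments);
    }
}
```

Because the client supplies only the operation's arguments, the service retains control over which resources and activities a workflow may use.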

6.9.4. Extending OGSA-DAI - security

In OGSA-DAI it can be important to control such issues as who can access data resources (or, indeed, any OGSA-DAI resources) and what they can do with these, for example:

  • No access, read-only or read-write access.
  • Access to the whole resource or a subset of data therein.
  • Activities that can be executed on the resource.

In OGSA-DAI, authentication and authorization are the responsibility of the presentation layer. Application developers and, depending upon the OGSA-DAI version, the OGSA-DAI administrator (i.e. the person or project who has deployed an OGSA-DAI server) can implement application-specific authentication and authorization.
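As an illustration of the kind of decision an application-specific authorization layer might make, the sketch below combines two of the controls listed above: an access level and a set of permitted activities. The AccessLevel and ResourcePolicy types are hypothetical and are not part of OGSA-DAI. Restricting access to a subset of the data is omitted for brevity.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical access levels mirroring the list above. */
enum AccessLevel { NONE, READ_ONLY, READ_WRITE }

/** Hypothetical policy for one user on one resource. */
final class ResourcePolicy {
    private final AccessLevel level;
    private final Set<String> permittedActivities;

    ResourcePolicy(AccessLevel level, Set<String> permittedActivities) {
        this.level = level;
        this.permittedActivities =
                Collections.unmodifiableSet(new HashSet<>(permittedActivities));
    }

    /**
     * Decides whether an activity may run on the resource.
     *
     * @param activityName name of the activity in the workflow
     * @param writesData   whether the activity modifies the resource
     */
    boolean permits(String activityName, boolean writesData) {
        if (level == AccessLevel.NONE) {
            return false;
        }
        if (writesData && level != AccessLevel.READ_WRITE) {
            return false;
        }
        return permittedActivities.contains(activityName);
    }
}
```

A presentation layer could consult such a policy for each activity in a submitted workflow and reject the workflow if any check fails.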

6.9.5. Extending OGSA-DAI - database access

One consideration when writing data resources that allow databases to be exposed in OGSA-DAI is how to map presentation layer security information (e.g. security credentials) to database usernames and passwords.

OGSA-DAI has the notion of a "login provider" which developers can use and customise. This is used by OGSA-DAI's relational and XMLDB data resources for example. By default, the OGSA-DAI login provider stores mappings from security credentials to database usernames and passwords in a file located on the OGSA-DAI server. However, application developers could change the login provider to perhaps get this information from a database or via a call-out to a remote mapping service.

Login providers can be dropped into an OGSA-DAI server and used without the need to rebuild OGSA-DAI or understand how OGSA-DAI works as a whole. OGSA-DAI deployers have the choice of using one login provider for all data resources on an OGSA-DAI server (the default configuration) or one login provider per data resource. Application developers, when implementing support for their own databases, can use the login provider API or implement their own solution if desired.
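The login provider concept, including the choice between one provider per server and one per data resource, can be sketched as follows. The names here (LoginProviderSketch, DatabaseLogin, InMemoryLoginProvider, LoginProviderRegistry) are hypothetical and do not reproduce OGSA-DAI's login provider API. The in-memory map simply stands in for the file, database or remote mapping service mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Database username and password resolved for a client. */
final class DatabaseLogin {
    final String username;
    final String password;

    DatabaseLogin(String username, String password) {
        this.username = username;
        this.password = password;
    }
}

/** Hypothetical interface; OGSA-DAI's actual login provider API may differ. */
interface LoginProviderSketch {
    Optional<DatabaseLogin> lookup(String resourceId, String credential);
}

/** Stand-in for a provider backed by a file, database or remote mapping service. */
final class InMemoryLoginProvider implements LoginProviderSketch {
    private final Map<String, DatabaseLogin> logins = new HashMap<>();

    void add(String resourceId, String credential, DatabaseLogin login) {
        logins.put(key(resourceId, credential), login);
    }

    @Override
    public Optional<DatabaseLogin> lookup(String resourceId, String credential) {
        return Optional.ofNullable(logins.get(key(resourceId, credential)));
    }

    private static String key(String resourceId, String credential) {
        return resourceId + "\n" + credential;
    }
}

/** Uses a per-resource provider when one is configured, else the server-wide default. */
final class LoginProviderRegistry {
    private final LoginProviderSketch defaultProvider;
    private final Map<String, LoginProviderSketch> perResource = new HashMap<>();

    LoginProviderRegistry(LoginProviderSketch defaultProvider) {
        this.defaultProvider = defaultProvider;
    }

    void setProvider(String resourceId, LoginProviderSketch provider) {
        perResource.put(resourceId, provider);
    }

    Optional<DatabaseLogin> lookup(String resourceId, String credential) {
        return perResource.getOrDefault(resourceId, defaultProvider)
                .lookup(resourceId, credential);
    }
}
```

Swapping in a different LoginProviderSketch implementation changes where the mappings come from without touching the data resources that use them.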

6.10. OGSA-DAI client toolkit

The OGSA-DAI client toolkit is designed to ease the construction of OGSA-DAI clients by providing client-side abstractions of activities, workflows, resources and services. It provides components which contact OGSA-DAI web services, submit the workflows to OGSA-DAI, and parse the request status and data after a workflow has been executed.

This allows application developers to focus on the construction of valid workflows via the connection of objects representing the activities in these workflows and the handling of resulting data which is provided to the application in the form of useful objects (e.g. Java ResultSets for relational data). More generally therefore developers can focus on constructing their applications rather than on the mechanics of interacting with OGSA-DAI.

6.11. OGSA-DAI in action - SEE-GEO

The SEE-GEO (SEcurE access to GEOspatial services) project (http://edina.ac.uk/projects/seesaw/seegeo/index.html) used many of the features of OGSA-DAI when developing a service to provide access to geo-spatial information on a grid.

As part of a geo-linking interoperability experiment, the project addressed a scenario which involved accessing and joining data from two existing data resources and then rendering this data in a graphical format. The two data resources are as follows:

  • Census statistics: this contains attributes about a region, e.g. the cost of a loaf of bread, and is accessed via a Geo-data access service (GDAS).
  • Borders data: this is data on unique geographical regions encoded as polygons and is accessed via a web feature service (WFS).

By joining data from these data resources questions such as "how does the cost of a loaf of bread vary across different regions in Scotland" can be answered.
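Conceptually, this is a join on a shared region identifier. The sketch below shows one simple way to express such a join in memory. It is not taken from the SEE-GEO implementation: the row types, field names and the hash-join approach are assumptions made for illustration, and any values passed in would be the caller's own data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical census row: an attribute value for a region. */
final class CensusRow {
    final String regionId;
    final double breadPrice;

    CensusRow(String regionId, double breadPrice) {
        this.regionId = regionId;
        this.breadPrice = breadPrice;
    }
}

/** Hypothetical borders row: a region's boundary encoded as polygon text. */
final class BorderRow {
    final String regionId;
    final String polygon;

    BorderRow(String regionId, String polygon) {
        this.regionId = regionId;
        this.polygon = polygon;
    }
}

/** Result row combining the attribute with the geometry it applies to. */
final class JoinedRow {
    final String regionId;
    final double breadPrice;
    final String polygon;

    JoinedRow(String regionId, double breadPrice, String polygon) {
        this.regionId = regionId;
        this.breadPrice = breadPrice;
        this.polygon = polygon;
    }
}

final class GeoLinkJoin {
    private GeoLinkJoin() {}

    /** Inner join on regionId: index the borders, then probe with each census row. */
    static List<JoinedRow> join(List<CensusRow> census, List<BorderRow> borders) {
        Map<String, BorderRow> bordersByRegion = new HashMap<>();
        for (BorderRow border : borders) {
            bordersByRegion.put(border.regionId, border);
        }
        List<JoinedRow> joined = new ArrayList<>();
        for (CensusRow row : census) {
            BorderRow border = bordersByRegion.get(row.regionId);
            if (border != null) { // regions without a border are dropped
                joined.add(new JoinedRow(row.regionId, row.breadPrice, border.polygon));
            }
        }
        return joined;
    }
}
```

Each joined row pairs a statistic with the polygon it belongs to, which is the information needed to render the statistic on a map.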

The project used OGSA-DAI to develop a geo-linking service which would get data from each of the two data resources and execute a join across these. It would then use the joined data, in conjunction with a feature portrayal service (FPS), to get an image representation of the joined data which could then be returned to the client.

SEE-GEO geo-linking service showing utilisation of application-specific data resources, services and activities within the OGSA-DAI framework. Clients sharing different types of data resource.

Figure 6.18. SEE-GEO geo-linking service constructed using OGSA-DAI


The solution uses a number of features of OGSA-DAI:

  • OGSA-DAI's data resource extensibility points are used to enable the GDAS (and so the census data) and the WFS (and so the borders data) to be accessed within OGSA-DAI workflows.
  • OGSA-DAI's support for application-specific activities allowed activities to be developed and executed for querying the above data resources, executing the join across them, and converting the joined data to an image via the FPS.
  • OGSA-DAI's support for different types of delivery mechanism allowed the image to be delivered via FTP rather than via the less efficient SOAP over HTTP.

OGSA-DAI gave SEE-GEO many advantages including:

  • The leveraging of OGSA-DAI's workflow execution functionality and out-of-the-box activities for data delivery and interacting with other grid technologies (e.g. GridFTP).
  • The ability to enforce additional levels of security.
  • The ability to utilise other Grid technologies via OGSA-DAI.
  • A toolkit to develop application-specific activities and support for application-specific data resources.
  • A framework that allowed them to choose data resources dynamically with very little effort.

6.12. Why use OGSA-DAI?

There are many reasons why OGSA-DAI may be suitable for your data access and integration requirements in a grid environment. These include:

  • Fits with grid/web service model.
  • Workflows encapsulate multiple potential Web service interactions into a single interaction.
  • Out-of-the-box solution to many data access, transformation, delivery and integration scenarios without the need to develop application-specific activities.
  • Extensive base functionality for data queries, updates, transformation and delivery.
  • Extensible and versatile framework - developers can add or customise capabilities.
  • Platform independence. Runs on any platform that supports Java.
  • Additional security layers can be provided if required. For example authorization can be done at the web service level and/or at the resource level.
  • Supports transparent and opaque data federation.
  • Transparency of database locations and product types.
  • Programming language neutral. OGSA-DAI web services can be accessed from clients written in any language that supports interaction with web services.
  • Plays nicely with other Grid middleware, for example Axis/Tomcat and Globus Toolkit.

6.13. Why might OGSA-DAI NOT be suitable?

OGSA-DAI has been misused, misunderstood and misrepresented. Some users have found that OGSA-DAI has made things worse for their particular applications. In particular, a common comment has been that "OGSA-DAI is not as fast as JDBC".

OGSA-DAI is not intended to be a complete solution to every data-related problem or a replacement for or competitor to JDBC (indeed OGSA-DAI uses JDBC!). The problems in using OGSA-DAI include:

  • Additional indirection to data.
  • The overhead of parsing and executing workflows and of going via a web service-based presentation layer, which makes access to data slower than via either a direct connection to a database or a dedicated application-specific web service.

In particular, OGSA-DAI may not be suitable if:

  • You have a single data resource that is not going to change.
  • You have no data transformation requirements.
  • You want rapid access to data in a single data resource.


[3] National e-Science Centre, http://www.nesc.ac.uk.

[4] North-East Regional e-Science Centre, http://www.neresc.ac.uk.

[5] The e-Science North-West Centre, http://www.esnw.ac.uk.