Chapter 7. OGSA-DAI data access and integration scenarios

7.1. Introduction
7.2. Building block activities
7.3. Targetting an SQL query at multiple data resources using a resource group
7.4. SQL Query with delivery to an FTP server
7.5. SQL query returning BLOBs with third party delivery
7.6. Relational database to database copy using SQL updates
7.7. Relational database to database copy using SQL bulk load
7.8. Inter-FTP server transfer

7.1. Introduction

This page introduces standard data access and integration scenarios and provides examples of how they can be realised using OGSA-DAI workflows. This is not a complete set of scenarios that can be realized but are intended to provide a general idea as to what problems OGSA-DAI can solve.

7.2. Building block activities

The following activities can prove to be useful in realizing data access and integration scenarios when using OGSA-DAI.

The ListRemove activity takes in a list of values and outputs the members of the list.

The ControlledRepeat activity repeats one of its inputs for each value received on its other input.

ListRemove activity which removes ListBegin and ListEnd block markers and so reduces the granularity of a list. ControlledRepeat activity which repeats inputs.

Figure 7.1. ListRemove and ControlledRepeat activities


TupleSplit splits a tuple into its constituent column values. The number of outputs of TupleSplit is equal to the number of columns in the input tuples.

Tee clones its inputs.

TupleSplit activity which slices tuples into their constituent members. Tee activity which clones its inputs.

Figure 7.2. TupleSplit and Tee activities


DeliverToRequestStatus streams data from another activity into the request status. A client can then extract this data from the request status.

DeliverToRequestStatus activity which streams data into the request status.

Figure 7.3. DeliverToRequestStatus activity


7.3.  Targetting an SQL query at multiple data resources using a resource group

This scenario explains how to simultaneously execute an SQL query across multiple databases using a resource group. A resource group is an OGSA-DAI data resource which represents a set of child resources which are usually hidden from the client to whom the group appears as a single resource. Two activities are currently supported by a resource group: SQLBag and SQLResilient. These assume that the resources in the group are OGSA-DAI relational data resources. Each execute the same SQL query across the data resources inthe group. This will only be successful provided the databases have similar schemas. For example, a SELECT query expects the same table and column names to be present in the database schema of each database in the group.

[Note]Note
The input SQL expression is executed verbatim on each database and no schema transformations are applied.

The same SQL query is executed on each database in the group and the results are then collected. The SQLBag activity collects all available query results into a single result set (which may include duplicate rows). SQLResilient returns only one set of results, as provided by the database which is first to respond. Both activities are robust with respect to databases being down and will not fail if any databases in the group are not accessible at the time of the query.

To execute the scenario, a resource group must be created which contains all resources that the client wishes to query. A resource group is created using the CreateResourceGroup activity - which takes the IDS of the member resources and the ID of the resource group to be created as inputs. Once the resource group has been created it can be used within a workflow. It behaves like any OGSA-DAI data resource and can be targetted by the activities that it supports.

To a client a resource group will look like a single resource and they may or may not be aware of the child resources contained in the group - this depends on whether the client executed the CreateResourceGroup activity or the resource group was created by some other client.

Both the SQLBag and the SQLResilient activity take the SQL expression as a parameter which is then executed across all resources in the group and the results processed as specified in the request.

The following figure shows both the CreateResourceGroup activity and also the SQLBag activity.

CreateResourceGroup creates a resource group. SQLBag uses a resource group to submit an SQL query to each resource in the resource group.

Figure 7.4. Creating and using a resource group


7.4. SQL Query with delivery to an FTP server

In this scenario an SQL query is executed, the results from the query are converted into CSV format and delivered to an FTP server. An SQLQuery activity takes an SQL query expression as a parameter and produces a list of tuples as output, containing the results of the database query.

SQLQuery queries a database and the tuples are converted via TupleToCSV to CSV values before being delivered to an FTP server via DeliverToFTP.

Figure 7.5.  Workflow which delivers the results of an SQL query in CSV format to an FTP server.


The output of the SQLQuery activity is streamed to a TupleToCSV activity. TupleToCSV writes the contents of each tuple to a string of comma-separated values. The user may provide a string value for NULL data fields.

The last activity in the workflow, DeliverToFTP, delivers character or binary data to an FTP server. The input parameters to this activity are the URL of the remote FTP server (including the username and password if required) and the filename of the remote data file to be written. Its data input - from which the contents of the file are streamed - is connected to the output of the TupleToCSV activity. In this example, DeliverToFTP writes a file myFile to the FTP server at my.ftp.server:21.

7.5. SQL query returning BLOBs with third party delivery

In this scenario an SQL query is executed that returns BLOBs (binary large objects) from a database and their corresponding labels. Each BLOB is delivered to a specified FTP server and stored using the label as the filename.

The SQLQuery activity produces a list of tuples where each tuple contains a BLOB and a label. Each tuple needs to be split into its constituent parts, i.e. into the label names and the BLOBs. Therefore the output tuples from the SQLQuery activity are passed through a TupleSplit activity which splits tuples into their corresponding elements. The output of this activity is a list of label names and a list of BLOBs. The lists are removed by streaming these through the ListRemove activity and the outputs (i.e. each label and BLOB) together with a repeated literal specifying the URL of the FTP server are passed to the DeliverToFTP activity.

SQLQuery pulls BLOBs and labels from a database, TupleSplit and ListRemove are used to extract the individual BLOBs and labels (which and these are passed into DeliverToFTP - each BLOB being stored in a file named after the label. A repeated literal is used to provide the URL of the FTP server host.

Figure 7.6. Querying a database and delivering BLOBs to an FTP server


7.6. Relational database to database copy using SQL updates

In this scenario a data resource is queried and the results of the query are inserted into a second database.

The SQLParameterisedUpdate activity prepares an SQL update statement with parameters (? in SQL), for example INSERT INTO table VALUES (?, ?) and executes this statement multiple times with different parameters. The values to be placed into the placeholders are provided as an input to the activity as tuples in which the n-th object in each tuple is placed into the n-th parameter of the SQL update statement. The SQL update statement is executed for each tuple provided. The number of updates is returned in the request status using the DeliverToRequestStatus activity.

SQLQuery queries a database and the tuples are used as parameters for an SQLParameterisedUpdate statement in conjunction with an SQL insert to insert the tuples into another database.

Figure 7.7. Copying data from one database to another using SQL queries and updates


7.7. Relational database to database copy using SQL bulk load

An alternative scenario for database to database copy is as follows. An SQLBulkLoadTuple activity is used to bulk load the tuples from an SQLQuery activity into a named table of another database. The number of updates is returned in the request status using the DeliverToRequestStatus activity.

SQLQuery queries a database and the tuples are input into an SQLBulkLoad activity to insert the tuples into another database.

Figure 7.8. Copying data from one database to another via bulk load


7.8. Inter-FTP server transfer

In this scenario files are transferred from one FTP server to another. The names of the files to transfer are accessed from a database via an SQLQuery activity (but one could envisage scenarios where they're provided as input literals in the workflow). A Tee activity is used to clone the file names to pass them to both an ObtainFromFTP activity - which reads the source files - and a DeliverToFTP activity - which writes the destination files.

SQLQuery is used to pull file names from a database. These are passed via a Tee to both an ObtainFromFTP activity to load each file and a DeliverToFTP activity to deliver the files to the destination FTP server.

Figure 7.9. Inter-FTP server data transfer