This page introduces standard data access and integration scenarios and provides examples of how they can be realised using OGSA-DAI workflows. This is not a complete set of scenarios that can be realized but are intended to provide a general idea as to what problems OGSA-DAI can solve.
The following activities can prove to be useful in realizing data access and integration scenarios when using OGSA-DAI.
The ListRemove activity takes in a list of values and outputs the members of the list.
The ControlledRepeat activity repeats one of its inputs for each value received on its other input.
TupleSplit splits a tuple into its constituent column values. The number of outputs of TupleSplit is equal to the number of columns in the input tuples.
Tee clones its inputs.
DeliverToRequestStatus streams data from another activity into the request status. A client can then extract this data from the request status.
This scenario explains how to simultaneously execute an SQL query across multiple databases using a resource group. A resource group is an OGSA-DAI data resource which represents a set of child resources which are usually hidden from the client to whom the group appears as a single resource. Two activities are currently supported by a resource group: SQLBag and SQLResilient. These assume that the resources in the group are OGSA-DAI relational data resources. Each execute the same SQL query across the data resources inthe group. This will only be successful provided the databases have similar schemas. For example, a SELECT query expects the same table and column names to be present in the database schema of each database in the group.
![]() | Note |
|---|---|
| The input SQL expression is executed verbatim on each database and no schema transformations are applied. |
The same SQL query is executed on each database in the group and the results are then collected. The SQLBag activity collects all available query results into a single result set (which may include duplicate rows). SQLResilient returns only one set of results, as provided by the database which is first to respond. Both activities are robust with respect to databases being down and will not fail if any databases in the group are not accessible at the time of the query.
To execute the scenario, a resource group must be created which contains all resources that the client wishes to query. A resource group is created using the CreateResourceGroup activity - which takes the IDS of the member resources and the ID of the resource group to be created as inputs. Once the resource group has been created it can be used within a workflow. It behaves like any OGSA-DAI data resource and can be targetted by the activities that it supports.
To a client a resource group will look like a single resource and they may or may not be aware of the child resources contained in the group - this depends on whether the client executed the CreateResourceGroup activity or the resource group was created by some other client.
Both the SQLBag and the SQLResilient activity take the SQL expression as a parameter which is then executed across all resources in the group and the results processed as specified in the request.
The following figure shows both the CreateResourceGroup activity and also the SQLBag activity.
In this scenario an SQL query is executed, the results from the query are converted into CSV format and delivered to an FTP server. An SQLQuery activity takes an SQL query expression as a parameter and produces a list of tuples as output, containing the results of the database query.
The output of the SQLQuery activity is streamed to a TupleToCSV activity. TupleToCSV writes the contents of each tuple to a string of comma-separated values. The user may provide a string value for NULL data fields.
The last activity in the workflow, DeliverToFTP, delivers character or
binary data to an FTP server. The input parameters to this activity
are the URL of the remote FTP server (including the username and
password if required) and the filename of the remote data file to be
written. Its data input - from which the contents of the file are
streamed - is connected to the output of the TupleToCSV activity. In
this example, DeliverToFTP writes a file
myFile to the FTP server at
my.ftp.server:21.
In this scenario an SQL query is executed that returns BLOBs (binary large objects) from a database and their corresponding labels. Each BLOB is delivered to a specified FTP server and stored using the label as the filename.
The SQLQuery activity produces a list of tuples where each tuple contains a BLOB and a label. Each tuple needs to be split into its constituent parts, i.e. into the label names and the BLOBs. Therefore the output tuples from the SQLQuery activity are passed through a TupleSplit activity which splits tuples into their corresponding elements. The output of this activity is a list of label names and a list of BLOBs. The lists are removed by streaming these through the ListRemove activity and the outputs (i.e. each label and BLOB) together with a repeated literal specifying the URL of the FTP server are passed to the DeliverToFTP activity.
In this scenario a data resource is queried and the results of the query are inserted into a second database.
The SQLParameterisedUpdate activity prepares an SQL update statement
with parameters (? in SQL), for
example INSERT INTO table VALUES (?,
?) and executes this statement multiple times with
different parameters. The values to be placed into the placeholders
are provided as an input to the activity as tuples in which the n-th
object in each tuple is placed into the n-th parameter of the SQL
update statement. The SQL update statement is executed for each tuple
provided. The number of updates is returned in the request status
using the DeliverToRequestStatus activity.
An alternative scenario for database to database copy is as follows. An SQLBulkLoadTuple activity is used to bulk load the tuples from an SQLQuery activity into a named table of another database. The number of updates is returned in the request status using the DeliverToRequestStatus activity.
In this scenario files are transferred from one FTP server to another. The names of the files to transfer are accessed from a database via an SQLQuery activity (but one could envisage scenarios where they're provided as input literals in the workflow). A Tee activity is used to clone the file names to pass them to both an ObtainFromFTP activity - which reads the source files - and a DeliverToFTP activity - which writes the destination files.