Subject: Re: 2 questions with DAI 3.

From: Mike Jackson (michaelj@epcc.ed.ac.uk)
Date: Dec 05, 2007 16:15

Hi Isao,

On Wed, 5 Dec 2007, Isao Kojima wrote:

> We will have a DAI 3.0 hands-on tutorial session for Japanese OGSA-DAI
> users this Friday, and
> we are now preparing some example programming scenarios. However, we
> encountered the following 2 questions.
>
> 1) SQLBulkloadTuple activity?
>
> One of our scenarios involves creating temporary tables and inserting
> data with the Bulkload activity.
> The workflow simply connects the SQLQuery output to the input of the
> Bulkload activity, as in the following code fragment.
> -------
> URL serverURL = new URL("http://localhost:8080/wsrf/services/dai/");
> ResourceID execResourceID = new ResourceID("DataRequestExecutionResource");
> ResourceID dataResourceID = new ResourceID("MySQLDataResource");
> Server server = new ServerProxy();
> server.setDefaultBaseServicesURL(serverURL);
> DataRequestExecutionResource execResource =
>     server.getDataRequestExecutionResource(execResourceID);
> SQLQuery query = new SQLQuery();
> query.setResourceID(dataResourceID);
> query.addExpression("SELECT * FROM sample;");
> SQLBulkLoadTuple loadTuples = new SQLBulkLoadTuple();
> loadTuples.setResourceID(dataResourceID);
> loadTuples.addTableName("result");
> loadTuples.connectDataInput(query.getDataOutput());
> DeliverToRequestStatus deliver = new DeliverToRequestStatus();
> deliver.connectInput(loadTuples.getDataOutput());
> PipelineWorkflow workflow = new PipelineWorkflow();
> workflow.add(query);
> workflow.add(loadTuples);
> workflow.add(deliver);
> RequestResource requestResource =
>     execResource.execute(workflow, RequestExecutionType.SYNCHRONOUS);
> System.out.println(requestResource.getRequestStatus().getExecutionStatus());
> ------
> The output format of SQLQuery is tuples, and the input format of the
> Bulkload activity is also tuples, so I think it is OK
> to connect them. However, it does not work. What are we doing wrong?

What exactly is going wrong? Is there an error log from the server-side
which records the error?

> 2) Data integration between resources without using FTP/GridFTP?
>
> Our next scenario integrates data between two resources.
> In this scenario, we don't use FTP/GridFTP.

> In OGSA-DAI 2.2 we had the outputStream/DeliverFromGDT combination, and
> we want to do the same data processing in the 3.0 framework.
>
> Q) What kind of activity/programming can we use to do the same operation?

You can do the following:

1-Submit a workflow
   CreateDataSource => DeliverToRequestStatus
   which will create a new data source. CreateDataSource can take the
   ID of the new data source e.g. MyDataSource. If not provided then the
   server will auto-generate an ID for you.

For example:

CreateDataSource createDataSource = new CreateDataSource();
// This is optional.
createDataSource.addResourceID(new ResourceID("MyDataSource"));
DeliverToRequestStatus deliverToRequestStatus = new DeliverToRequestStatus();
deliverToRequestStatus.connectInput(createDataSource.getResultOutput());

PipelineWorkflow createWorkflow = new PipelineWorkflow();
createWorkflow.add(createDataSource);
createWorkflow.add(deliverToRequestStatus);

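// mDRER is a DataRequestExecutionResource, obtained from the server
// in the same way as execResource in your code above.
RequestResource requestResource = mDRER.execute(createWorkflow,
     RequestExecutionType.SYNCHRONOUS);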
RequestStatus status = requestResource.getRequestStatus();
// Get the data source ID - this is needed if you didn't
// specify a data source ID above using
// createDataSource.addResourceID().
ResourceID dataSourceID = createDataSource.nextResult();

2-Submit a workflow like
   SQLQuery => TupleToCSV => WriteToDataSource(MyDataSource)
   which basically gets some data and exposes it via the data source.

For example:

SQLQuery query = new SQLQuery();
query.setResourceID(new ResourceID("MySQLResource"));
query.addExpression("SELECT * FROM bands WHERE id = 'bangles';");

TupleToCSV transform = new TupleToCSV();
transform.connectDataInput(query.getDataOutput());

WriteToDataSource delivery = new WriteToDataSource();
// Set ID of data source to populate.
delivery.setResourceID(dataSourceID);
delivery.connectInput(transform.getResultOutput());

PipelineWorkflow pipeline = new PipelineWorkflow();
pipeline.add(query);
pipeline.add(transform);
pipeline.add(delivery);
RequestResource requestResource = mDRER.execute(pipeline,
     RequestExecutionType.SYNCHRONOUS);
RequestStatus status = requestResource.getRequestStatus();

CreateDataSource plus WriteToDataSource together are the 3.0 equivalent of
outputStream.

You can then submit a workflow like:

ObtainFromDataSource(URL, DataSourceID, BLOCK, N) => ...

which contacts the named data source at the given service URL and reads N
blocks or

ObtainFromDataSource(URL, DataSourceID, FULL) => ...

which reads all the data blocks.

ObtainFromDataSource is the 3.0 equivalent of DeliverFromGDT.
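
To make the difference between the two modes concrete, here is a small plain-Java sketch. It does not use the OGSA-DAI client toolkit at all: DataSourceSketch, write, readBlocks and readFull are hypothetical names invented for illustration, and the sketch assumes that a read consumes the blocks it returns, which is an assumption on top of what is described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical stand-in for a data source: a FIFO queue of data blocks.
 * This is NOT an OGSA-DAI class; it only models the BLOCK/FULL idea.
 */
final class DataSourceSketch {
    private final Deque<String> blocks = new ArrayDeque<>();

    /** Models the writing side: append one block to the data source. */
    void write(String block) {
        blocks.addLast(block);
    }

    /** Models BLOCK mode: remove and return up to n blocks, oldest first. */
    List<String> readBlocks(int n) {
        List<String> out = new ArrayList<>();
        while (out.size() < n && !blocks.isEmpty()) {
            out.add(blocks.removeFirst());
        }
        return out;
    }

    /** Models FULL mode: remove and return every remaining block. */
    List<String> readFull() {
        return readBlocks(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        DataSourceSketch ds = new DataSourceSketch();
        for (String row : new String[] {"row1", "row2", "row3"}) {
            ds.write(row);
        }
        System.out.println(ds.readBlocks(2)); // prints [row1, row2]
        System.out.println(ds.readFull());    // prints [row3]
    }
}
```

In this model a BLOCK read with N = 2 leaves the third block behind for a later request, while FULL drains whatever is left.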

For example, the following runs

ObtainFromDataSource("http://localhost:8080/dai/services/DataSourceService",
"MyDataSource", FULL) => DeliverToRequestStatus:

ObtainFromDataSource obtainFromDataSource = new ObtainFromDataSource();
obtainFromDataSource.addURL("http://localhost:8080/dai/services/DataSourceService");
obtainFromDataSource.addResourceID(new ResourceID("MyDataSource"));
obtainFromDataSource.addMode(ModeType.FULL);

DeliverToRequestStatus deliverToStatus = new DeliverToRequestStatus();
deliverToStatus.connectInput(obtainFromDataSource.getDataOutput());

PipelineWorkflow workflow = new PipelineWorkflow();
workflow.add(obtainFromDataSource);
workflow.add(deliverToStatus);
mDRER.execute(workflow, RequestExecutionType.SYNCHRONOUS);

This will also work for cases where the workflow which reads the data from
the data source is executed by the same server which holds the data
source. However the service URL must still be given (in future we hope to
optimise this to exploit the fact that the

Cheers,

mike

