Subject: Re: How to get a bytea field in OGSA-DAI

From: Ally Hume (a.hume@epcc.ed.ac.uk)
Date: Feb 27, 2008 16:59

Hi,

Firstly, you can extract multiple images from a database and send them
to an FTP or GridFTP server with a single workflow sent to OGSA-DAI. The
code below shows how this is done with the client toolkit.

        // Created and printed here, but not used by the workflow below.
        DataSourceResource dataSource = mDRER.createDataSourceResource();
        System.out.println("ResourceID is: " + dataSource.getResourceID());

        // Select the filename and the binary image for each row.
        SQLQuery query = new SQLQuery();
        query.setResourceID(mSystemTestConfig.getRelationalResourceID());
        query.addExpression("select Name, Image from images where idpic < 4");

        // Split the tuples into one output per column.
        TupleSplit tupleSplit = new TupleSplit();
        tupleSplit.connectDataInput(query.getDataOutput());
        tupleSplit.setNumberOfResultOutputs(2);

        // Strip the list markers from the filename output (0) ...
        ListRemove listRemove1 = new ListRemove();
        listRemove1.connectInput(tupleSplit.getResultOutput(0));

        // ... and from the image data output (1).
        ListRemove listRemove2 = new ListRemove();
        listRemove2.connectInput(tupleSplit.getResultOutput(1));

        // Repeat the constant host details once for each filename.
        ControlledRepeat controlledRepeat = new ControlledRepeat();
        controlledRepeat.addRepeatedInput("username:password@ftp.host:21");
        controlledRepeat.connectInput(listRemove1.getOutput());

        // Deliver each image to the FTP server under its filename.
        DeliverToFTP deliverToFTP = new DeliverToFTP();
        deliverToFTP.connectHostInput(controlledRepeat.getRepeatedOutput());
        deliverToFTP.connectFilenameInput(controlledRepeat.getOutput());
        deliverToFTP.connectDataInput(listRemove2.getOutput());

        // Build the workflow
        PipelineWorkflow pipeline = new PipelineWorkflow();
        pipeline.add(query);
        pipeline.add(listRemove1);
        pipeline.add(listRemove2);
        pipeline.add(tupleSplit);
        pipeline.add(controlledRepeat);
        pipeline.add(deliverToFTP);

        RequestResource requestResource =
            mDRER.execute(pipeline, RequestExecutionType.ASYNCHRONOUS);

        requestResource.pollUntilRequestCompleted(500);

        System.out.println("Done");

Here mDRER is a DataRequestExecutionResource object. This workflow
successfully extracts names and images from an images table in my
database. The binary image data is then sent to the FTP server, using
the corresponding entry in the Name column as the name of the
resulting file.

A brief explanation of what each activity is doing:

SQLQuery - the initial SQL query that produces results with two
columns, name in the first column and image data in the second. The
output here is a list of tuples.

TupleSplit - this takes as input a list of tuples. It has one output
for each column of the input tuples. In this case it will have two
outputs: output 0 will contain a list of strings holding the
filenames, and output 1 will contain a list of BLOBs.
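
To make the column split concrete, here is a minimal plain-Java sketch of
the idea. It does not use the OGSA-DAI API; the class and method names
(TupleSplitSketch, split) and the sample filenames are made up for
illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics the idea of splitting tuples by column.
// TupleSplitSketch and split are hypothetical names, not OGSA-DAI classes.
public class TupleSplitSketch {

    /** Returns one list per column, each holding that column's values in row order. */
    public static List<List<Object>> split(List<Object[]> tuples, int columns) {
        List<List<Object>> outputs = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            outputs.add(new ArrayList<>());
        }
        for (Object[] tuple : tuples) {
            for (int c = 0; c < columns; c++) {
                outputs.get(c).add(tuple[c]);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Two rows of (Name, Image), as the query above would produce.
        List<Object[]> tuples = Arrays.asList(
            new Object[] {"cat.jpg", new byte[] {1, 2}},
            new Object[] {"dog.jpg", new byte[] {3}});

        List<List<Object>> outputs = split(tuples, 2);
        System.out.println("First output (filenames): " + outputs.get(0));
        System.out.println("Second output holds " + outputs.get(1).size() + " binary values");
    }
}
```

The first output carries only the filenames and the second only the
binary values, which is what lets the two columns be routed to different
inputs of a later activity.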

ListRemove - In this case we are not interested in the lists so we
pass both outputs through a list remove activity to remove the list
markers.

DeliverToFTP - this activity sends data to an FTP server. It takes as
input the binary data (an OGSA-DAI BLOB object works fine), a filename
and details of the host. It will read one filename, one host and then
one binary stream and will send all the data in that binary stream to
the FTP server. It will then try to read another filename, another
host and another binary stream and will send that stream to the FTP
server. It repeats this until there is no more data.

Because the DeliverToFTP activity needs to read an FTP host for each
file we have to use the ControlledRepeat activity to repeat the
constant FTP host details for each filename.
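
The pairing described above can be sketched in plain Java as follows.
Again this is an illustration of the idea rather than OGSA-DAI code: the
names ControlledRepeatSketch, repeatFor and deliver are hypothetical, and
deliver only records what would be sent instead of talking to a server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: hypothetical names, no OGSA-DAI or FTP calls.
public class ControlledRepeatSketch {

    /** Emits one copy of the constant for every value in the driving input. */
    public static <T> List<String> repeatFor(String constant, List<T> drivingInput) {
        List<String> repeated = new ArrayList<>();
        for (T ignored : drivingInput) {
            repeated.add(constant);
        }
        return repeated;
    }

    /** Reads one host, one filename and one data block per step until any input runs out. */
    public static List<String> deliver(Iterator<String> hosts,
                                       Iterator<String> filenames,
                                       Iterator<byte[]> data) {
        List<String> log = new ArrayList<>();
        while (hosts.hasNext() && filenames.hasNext() && data.hasNext()) {
            String host = hosts.next();
            String filename = filenames.next();
            byte[] block = data.next();
            log.add(filename + " (" + block.length + " bytes) -> " + host);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> filenames = Arrays.asList("cat.jpg", "dog.jpg");
        List<byte[]> images = Arrays.asList(new byte[] {1, 2}, new byte[] {3});

        // Without the repeat there would be a single host value for two files.
        List<String> hosts = repeatFor("username:password@ftp.host:21", filenames);

        for (String line : deliver(hosts.iterator(), filenames.iterator(), images.iterator())) {
            System.out.println(line);
        }
    }
}
```

If the host were supplied only once, the loop in deliver would stop after
the first file, which is the situation the repeat is there to avoid.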

I hope this makes some sort of sense. The workflow gets a little bit
complex, but it shows well the type of thing you can do with OGSA-DAI 3.0.
In previous versions of OGSA-DAI you could not do this in one
workflow.

Please ask any questions if you do not understand what is happening
here. I will try to find time to write this up in more detail and
post it on the OGSA-DAI website.

You should be able to replace the SQLQuery activity with SQLBag and
DeliverToFTP with DeliverToGFTP easily enough.

Now to your second approach of not using GFTP and getting the
response in the SOAP message instead. I think using GFTP is the best
approach, but if you want to get the data back in the SOAP response you
can try converting to WebRowSet rather than CSV format. I think this will
encode the data in base64 format for you. Replace the TupleToCSV
activity with the TupleToWebRowSetCharArrays activity. You should be
able to get the data from the ResultSet object in the same way you
normally get OGSA-DAI relational data results with the client toolkit.
This approach is not really advised for anything other than really
trivial amounts of data.
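
If the binary column does come back as base64 text (an assumption, as
noted above), turning it into a file on the client side could look
roughly like this. The sketch uses only the standard library of a modern
JDK (Java 8 or later); Base64ImageSketch, decode and save are made-up
names, and how you obtain the string from the result set is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

// Illustration only: assumes the image column arrives as a base64 string.
public class Base64ImageSketch {

    /** Decodes base64 text; the MIME decoder tolerates embedded line breaks. */
    public static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    /** Decodes the text and writes the resulting bytes to the target file. */
    public static Path save(String base64Text, Path target) throws IOException {
        return Files.write(target, decode(base64Text));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a value read from the result set.
        byte[] original = {(byte) 0x89, 'P', 'N', 'G'};
        String encoded = Base64.getEncoder().encodeToString(original);

        Path target = Files.createTempFile("image", ".bin");
        save(encoded, target);

        boolean same = Arrays.equals(original, Files.readAllBytes(target));
        System.out.println("Round trip preserved the bytes: " + same);
    }
}
```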

I would recommend you try the first approach using one OGSA-DAI
workflow to send all the images to your GFTP server. Please let me
know how you get on with this as I would be interested to know if this
fixed your problem. I'm hopeful it will.

Regards,

Ally Hume
OGSA-DAI Development Team

On 27/02/2008, Wilson Jr. <wilsonjr@gmail.com> wrote:
> Hi folks,
>
>
> First, I'd like to thank Elias for answering the problem I reported as a
> timeout in SQLBag.
> Sorry for taking so long to reply; I was on vacation and took a break
> from my project, but now I'm returning to it.
>
> At first I thought it was an SQLBag timeout, but it was actually a
> connection timeout between my client and my
> grid service. I changed the timeout in the Globus container and it worked,
> but the long running time is still
> a problem.
>
> Well, let me explain what's happening:
>
> I do a SELECT to fetch some information about a group of people. Associated
> with each person
> there's a bytea field that stores a file of at most ~6 KB.
> I need to bring the file stored in the database to the file system,
> process it, and then,
> based on the results, use only the information for some of the people in the
> group selected before.
>
> In the second Select I do this to bring the files to the file system:
> SQLBag -> TupleSplit -> ListRemove -> DeliverToGFTP
>
> The problem is that fetching the files is too slow. For example,
> I had to fetch 2971.978515625 KB (2.90 MB) and it took 660.647
> seconds (11.01 min), that is, 4.49 KB/sec.
> I don't know exactly why it is so slow. My guess is:
>
> DeliverToGFTP only saves one file at a time, right? So to fetch the
> file for each person
> I had to use a FOR loop and, for each person, call
> query.addExpression(...).
> Does having a SELECT for each person (1023 files in the test above)
> increase the time a lot?
>
> What can I do to decrease this time?
>
> I thought about not using DeliverToGFTP and instead fetching the bytea
> field together with the other information,
> then saving the file to the file system from Java. To do this,
> instead of using two
> phases of SELECT as before (one for the information, and the others to
> fetch the files),
> I added the bytea field to the first SELECT. I did this, but I have
> a problem that I don't
> know how to solve.
>
> The first SELECT is structured this way:
> SQLBag -> TupleToCSV -> DeliverToRequestStatus
>
> The problem is that after executing the SELECT, I read the data like this:
> while( transform.hasNextResult() ){
> BufferedReader bufReader = new BufferedReader(
> transform.nextResult());
>
> String resultado = null;
> while( (resultado = bufReader.readLine()) != null) {
> StringTokenizer tokenizer = new
> StringTokenizer(resultado,"#");
> String rg = tokenizer.nextToken().trim();
> ...
> ...
> ...
>
> The problem is that the field is bytea. How can I obtain the data
> to save the file via Java? This way, obviously, I get only: BLOB(length=XXXXX)
>
> What can I do to get this field in binary form so I can save it as a file
> in the file system?
>
> This is the way I'm thinking of decreasing the time:
> fetch the data into Java and save the file there, instead of using
> DeliverToGFTP (with its many SELECTs).
>
>
> Cheers.
>
>
> --
> "Is this a world in which we must hide our virtues?"
> William Shakespeare
>
>
> ****************
> Wilson Júnior
> ****************
>
