Subject: Re: How to get a bytea field in OGSA-DAI
From: Ally Hume (a.hume@epcc.ed.ac.uk)
Date: Feb 29, 2008 16:01
Hi,
When an activity has an input called X there are two ways of sending data
to this input. If I want to specify a value directly from the client I
can use a method called:
addX(SomeType someObject)
If I want the input to come from the output of another activity I can
connect the input using the method called:
connectXInput(ActivityOutput)
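To make the distinction concrete, here is a small plain-Java model of an activity input that can be fed either way. It is a hypothetical sketch for illustration only: ModelInput and its add/connect/drain methods are invented here and are not part of the OGSA-DAI client toolkit, and the model assumes an input is fed either by client values or by one upstream output, not both.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of an activity input; NOT part of the OGSA-DAI API.
// Assumption: an input is fed either by client-supplied values or by one
// upstream output, never both.
final class ModelInput<T> {
    private final List<T> literals = new ArrayList<>(); // values from add(...)
    private Iterator<T> upstream;                       // set by connect(...)

    // Plays the role of addX(value): the client supplies a value directly.
    void add(T value) {
        literals.add(value);
    }

    // Plays the role of connectXInput(output): values come from another activity.
    void connect(Iterator<T> output) {
        upstream = output;
    }

    // Returns, in order, every value the activity would read from this input.
    List<T> drain() {
        if (upstream == null) {
            return new ArrayList<>(literals);
        }
        List<T> values = new ArrayList<>();
        while (upstream.hasNext()) {
            values.add(upstream.next());
        }
        return values;
    }

    public static void main(String[] args) {
        ModelInput<String> filename = new ModelInput<>();
        filename.add("file1"); // like deliverToFTP.addFilename("file1")
        filename.add("file2");
        System.out.println(filename.drain()); // [file1, file2]

        ModelInput<String> wired = new ModelInput<>();
        wired.connect(List.of("a", "b").iterator()); // like connectFilenameInput(...)
        System.out.println(wired.drain()); // [a, b]
    }
}
```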
So when you specified the filename for the DeliverToFTP activity, you
set the filename input using addFilename(). This works fine when
we are handling a single file with each workflow sent to OGSA-DAI, but
we want the workflow to handle multiple files. If you know at the
client how many files your query will extract, then you could add the
filenames in your client with repeated calls to addFilename(), e.g.
deliverToFTP.addFilename("file1");
deliverToFTP.addFilename("file2");
deliverToFTP.addFilename("file3");
But I was assuming that you do not know how many files your query will
return, so you cannot do this. In my example the filenames were being
extracted from the database as part of the database query. The query
asks for the filename and the BLOB. We then used the TupleSplit
activity to send the filename down one output and the BLOB down another.
The filename then goes through the ListRemove activity to remove the list
markers, and it could then go into the DeliverToFTP activity. But we do not
connect directly to the DeliverToFTP activity because we need to use
the stream of filenames to control how many times we repeat the FTP
hostname. Hence we use a ControlledRepeat activity to repeat the
hostname for each filename.
Hopefully the following example will help.
SQLQuery outputs:
ListBegin, TupleMetadata, Tuple("file1",BLOB1), Tuple("file2",BLOB2), ListEnd
This goes to TupleSplit, which produces two output streams:
output1: ListBegin, "file1", "file2", ListEnd
output2: ListBegin, BLOB1, BLOB2, ListEnd
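The TupleSplit step can be modelled in plain Java as follows. This is a hypothetical sketch of the behaviour described above, not OGSA-DAI code: TupleSplitModel and its Marker enum are invented names, tuples are represented as Object[], and the model simply drops the TupleMetadata item, as the example outputs suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the TupleSplit step; NOT the OGSA-DAI
// implementation. List markers are enum values, tuples are Object[], and
// any other item (the metadata block) is treated as metadata.
final class TupleSplitModel {
    enum Marker { LIST_BEGIN, LIST_END }

    static List<List<Object>> split(List<Object> stream, int columns) {
        List<List<Object>> outputs = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            outputs.add(new ArrayList<>());
        }
        for (Object item : stream) {
            if (item instanceof Marker) {
                // Copy list markers to every output so each output is still a list.
                for (List<Object> out : outputs) {
                    out.add(item);
                }
            } else if (item instanceof Object[]) {
                Object[] tuple = (Object[]) item;
                if (tuple.length != columns) {
                    throw new IllegalArgumentException("tuple has " + tuple.length + " columns");
                }
                // Column i of each tuple goes to output i.
                for (int i = 0; i < columns; i++) {
                    outputs.get(i).add(tuple[i]);
                }
            }
            // Any other item (the tuple metadata here) is dropped by this model.
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Object> sqlQueryOutput = Arrays.asList(
                Marker.LIST_BEGIN,
                "TupleMetadata",
                new Object[] {"file1", "BLOB1"},
                new Object[] {"file2", "BLOB2"},
                Marker.LIST_END);
        List<List<Object>> outputs = split(sqlQueryOutput, 2);
        System.out.println(outputs.get(0)); // [LIST_BEGIN, file1, file2, LIST_END]
        System.out.println(outputs.get(1)); // [LIST_BEGIN, BLOB1, BLOB2, LIST_END]
    }
}
```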
TupleSplit:output1 goes to ListRemove1 which outputs:
output: "file1", "file2"
TupleSplit:output2 then goes into ListRemove2 which outputs:
output: BLOB1, BLOB2
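ListRemove then only has to drop the list markers. Again this is a hypothetical plain-Java model, not the OGSA-DAI implementation; ListRemoveModel and its Marker enum are invented for illustration, and nested lists are not considered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of ListRemove; NOT the OGSA-DAI implementation.
// List begin/end markers are modelled as enum values.
final class ListRemoveModel {
    enum Marker { LIST_BEGIN, LIST_END }

    // Copies every item except the list markers, preserving order.
    static List<Object> removeListMarkers(List<?> stream) {
        List<Object> out = new ArrayList<>();
        for (Object item : stream) {
            if (!(item instanceof Marker)) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> tupleSplitOutput1 =
                Arrays.asList(Marker.LIST_BEGIN, "file1", "file2", Marker.LIST_END);
        System.out.println(removeListMarkers(tupleSplitOutput1)); // [file1, file2]
    }
}
```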
The output from ListRemove1 then goes to ControlledRepeat along with
the input that is the hostname, e.g. "username:password@ftp.host:21".
This produces:
output: "file1", "file2"
repeatedOutput: "username:password@ftp.host:21", "username:password@ftp.host:21"
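ControlledRepeat pairs every item of its controlling input with one copy of the repeated value. A minimal hypothetical model (ControlledRepeatModel is an invented name, not the OGSA-DAI class):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of ControlledRepeat; NOT the OGSA-DAI implementation.
final class ControlledRepeatModel {
    final List<Object> output = new ArrayList<>();         // controlling stream, passed through
    final List<Object> repeatedOutput = new ArrayList<>(); // one copy of the repeated value per item

    static ControlledRepeatModel run(Object repeatedInput, List<?> input) {
        ControlledRepeatModel result = new ControlledRepeatModel();
        for (Object item : input) {
            result.output.add(item);
            result.repeatedOutput.add(repeatedInput);
        }
        return result;
    }

    public static void main(String[] args) {
        ControlledRepeatModel r = run("username:password@ftp.host:21",
                Arrays.asList("file1", "file2"));
        System.out.println(r.output);         // [file1, file2]
        System.out.println(r.repeatedOutput); // [username:password@ftp.host:21, username:password@ftp.host:21]
    }
}
```

Because the number of repeats is driven by the filename stream, the client never needs to know in advance how many rows the query returns.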
Finally we have DeliverToFTP that has the following inputs:
host <- ControlledRepeat:repeatedOutput
filename <- ControlledRepeat:output
data <- ListRemove2:output
It hence gets:
host <- "username:password@ftp.host:21", "username:password@ftp.host:21"
filename <- "file1", "file2"
data <- BLOB1, BLOB2
and hence it sends the data in BLOB1 to FTP server
"username:password@ftp.host:21" to be written as a file called "file1"
and then it sends BLOB2 to FTP server "username:password@ftp.host:21"
to be written as a file called "file2".
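The way DeliverToFTP consumes its three inputs in step can be sketched like this. It is a hypothetical model only: DeliverToFtpModel is an invented name, the "upload" is a put into an in-memory map rather than an FTP transfer, and stopping as soon as any input runs out is an assumption of the model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of how DeliverToFTP consumes its inputs;
// NOT the OGSA-DAI implementation. Uploads go to an in-memory map keyed by
// "host/filename" instead of a real FTP server.
final class DeliverToFtpModel {
    final Map<String, byte[]> uploaded = new LinkedHashMap<>();

    void run(Iterator<String> host, Iterator<String> filename, Iterator<byte[]> data) {
        // One iteration per file. Assumption: stop as soon as any input runs out.
        while (filename.hasNext() && host.hasNext() && data.hasNext()) {
            String name = filename.next();
            String server = host.next();
            byte[] contents = data.next();
            uploaded.put(server + "/" + name, contents); // a real activity would upload here
        }
    }

    public static void main(String[] args) {
        String ftpHost = "username:password@ftp.host:21";
        List<byte[]> blobs = Arrays.asList(
                "BLOB1".getBytes(StandardCharsets.UTF_8),
                "BLOB2".getBytes(StandardCharsets.UTF_8));
        DeliverToFtpModel deliver = new DeliverToFtpModel();
        deliver.run(Arrays.asList(ftpHost, ftpHost).iterator(),
                Arrays.asList("file1", "file2").iterator(),
                blobs.iterator());
        System.out.println(deliver.uploaded.keySet());
        // [username:password@ftp.host:21/file1, username:password@ftp.host:21/file2]
    }
}
```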
Does this help you? Do you get any errors back when you execute the
request? Does the request complete successfully? If the request does
not complete successfully, the line:
requestResource.pollUntilRequestCompleted(500);
will throw an exception, and you can then examine the exception or
interrogate the RequestResource object to determine what actually
happened. There may also be something logged on the server side that
helps.
Ally
On 29/02/2008, Wilson Jr. <wilsonjr@gmail.com> wrote:
> Hi Ally,
>
> thanks, I understand the idea of using these activities.
>
> I've set this up, and my question is: where do I specify the location on
> the GFTP server that the files should be sent to?
>
>
> DeliverToFTP deliverToFTP = new DeliverToFTP();
> deliverToFTP.connectHostInput(controlledRepeat.getRepeatedOutput());
> deliverToFTP.connectFilenameInput(controlledRepeat.getOutput());
> deliverToFTP.connectDataInput(listRemove2.getOutput());
>
>
> The SELECT is being processed, but I can't find where the files are being
> sent on the file system.
>
> Before, I used delivery.addFilename(); now I don't know. Maybe it is
> sending them, and at least it's processing, but I can't find the files.
>
> cheers.
>
>
>
> On Wed, Feb 27, 2008 at 1:59 PM, Ally Hume <a.hume@epcc.ed.ac.uk> wrote:
>
> > Hi,
> >
> > Firstly, you can extract multiple images from a database and send them
> > to an FTP or GridFTP server with one workflow sent to OGSA-DAI. The
> > code here shows how this is done from the client toolkit.
> >
> > DataSourceResource dataSource = mDRER.createDataSourceResource();
> > System.out.println("ResourceID is: " + dataSource.getResourceID());
> >
> > SQLQuery query = new SQLQuery();
> > query.setResourceID(mSystemTestConfig.getRelationalResourceID());
> > query.addExpression("select Name, Image from images where idpic < 4");
> >
> > TupleSplit tupleSplit = new TupleSplit();
> > tupleSplit.connectDataInput(query.getDataOutput());
> > tupleSplit.setNumberOfResultOutputs(2);
> >
> > ListRemove listRemove1 = new ListRemove();
> > listRemove1.connectInput(tupleSplit.getResultOutput(0));
> >
> > ListRemove listRemove2 = new ListRemove();
> > listRemove2.connectInput(tupleSplit.getResultOutput(1));
> >
> > ControlledRepeat controlledRepeat = new ControlledRepeat();
> > controlledRepeat.addRepeatedInput("username:password@ftp.host:21");
> > controlledRepeat.connectInput(listRemove1.getOutput());
> >
> > DeliverToFTP deliverToFTP = new DeliverToFTP();
> > deliverToFTP.connectHostInput(controlledRepeat.getRepeatedOutput());
> > deliverToFTP.connectFilenameInput(controlledRepeat.getOutput());
> > deliverToFTP.connectDataInput(listRemove2.getOutput());
> >
> > // Build the workflow
> > PipelineWorkflow pipeline = new PipelineWorkflow();
> > pipeline.add(query);
> > pipeline.add(listRemove1);
> > pipeline.add(listRemove2);
> > pipeline.add(tupleSplit);
> > pipeline.add(controlledRepeat);
> > pipeline.add(deliverToFTP);
> >
> > RequestResource requestResource =
> > mDRER.execute(pipeline, RequestExecutionType.ASYNCHRONOUS);
> >
> > requestResource.pollUntilRequestCompleted(500);
> >
> > System.out.println("Done");
> >
> > Where mDRER is a DataRequestExecutionResource object. This
> > successfully extracts names and images from an images table in my
> > database. The binary image data is then sent to the ftp server using
> > the corresponding entry in the Name column as the name of the
> > resulting file.
> >
> > A brief explanation of what each activity is doing:
> > SQLQuery - the initial SQL query that produces results with two
> > columns, name in the first column and image data in the second. The
> > output here is a list of tuples.
> >
> > TupleSplit - this takes as input a list of tuples. It has one output
> > for each column of the input tuples. In this case it will have two
> > outputs: output 0 will contain a list of strings containing the
> > filenames, and output 1 will contain a list of BLOBs.
> >
> > ListRemove - In this case we are not interested in the lists so we
> > pass both outputs through a list remove activity to remove the list
> > markers.
> >
> > DeliverToFTP - this activity sends data to an FTP server. It takes as
> > input the binary data (an OGSA-DAI BLOB object works fine), a filename
> > and details of the host. It will read one filename, one host and then
> > one binary stream and will send all the data in that binary stream to
> > the FTP server. It will then try to read another filename and another
> > host and another binary stream and will send that stream to the FTP
> > server. It repeats this until there is no more data.
> >
> > Because the DeliverToFTP activity needs to read an FTP host for each
> > file we have to use the ControlledRepeat activity to repeat the
> > constant FTP host details for each filename.
> >
> > I hope this makes some sort of sense. The workflow gets a little bit
> > complex but shows well the type of thing you can do with OGSA-DAI 3.0.
> > In previous versions of OGSA-DAI you could not do this in one
> > workflow.
> >
> > Please ask any questions if you do not understand what is happening
> > here. I will try to find time to write this up in more detail and
> > post it on the OGSA-DAI website.
> >
> > You should be able to replace the SQLQuery activity with SQLBag and
> > DeliverToFTP with DeliverToGFTP easily enough.
> >
> >
> > Now to your second approach: trying not to use GFTP but to get the
> > response in the SOAP message. I think using GFTP is the best approach
> > but if you want to get the data back in the SOAP response you can try
> > converting to webrowset rather than CSV format. I think this will
> > encode the data in base64 format for you. Replace the TupleToCSV
> > activity with the TupleToWebRowSetCharArrays activity. You should be
> > able to get the data from the ResultSet object in the same way you
> > normally get OGSA-DAI relational data results at the client toolkit.
> > This approach is not really advised for anything other than really
> > trivial amounts of data.
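If you do try the WebRowSet route, the client side still has to turn the encoded text back into bytes before writing a file. The sketch below assumes, as suggested above, that the binary column arrives as base64 text; it does not show how to pull that value out of the WebRowSet or ResultSet, and Base64ColumnSaver is a hypothetical helper name. It uses only the JDK (java.util.Base64 and java.nio.file.Files).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Sketch: turn a base64-encoded column value back into bytes and save it as a file.
// Assumption: the value really is base64; extracting it from the XML is not shown.
final class Base64ColumnSaver {
    // The MIME decoder ignores line breaks, which XML encoders often insert.
    static byte[] decode(String base64Value) {
        return Base64.getMimeDecoder().decode(base64Value);
    }

    static Path save(String base64Value, Path dir, String fileName) throws IOException {
        Path base = dir.toAbsolutePath().normalize();
        Path target = base.resolve(fileName).normalize();
        // Refuse names such as "../x" (e.g. from the Name column) that would escape dir.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("unsafe file name: " + fileName);
        }
        Files.write(target, decode(base64Value));
        return target;
    }

    public static void main(String[] args) throws IOException {
        String encoded = Base64.getEncoder()
                .encodeToString("example image bytes".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("images");
        Path written = save(encoded, dir, "file1");
        System.out.println(written + " (" + Files.size(written) + " bytes)");
    }
}
```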
> >
> > I would recommend you try the first approach using one OGSA-DAI
> > workflow to send all the images to your GFTP server. Please let me
> > know how you get on with this as I would be interested to know if this
> > fixed your problem. I'm hopeful it will.
> >
> > Regards,
> >
> > Ally Hume
> > OGSA-DAI Development Team
> >
> > On 27/02/2008, Wilson Jr. <wilsonjr@gmail.com> wrote:
> > > Hi folks,
> > >
> > >
> > > First, I'd like to thank Elias for answering my question about what I
> > > described as a timeout in SQLBag. Sorry for taking so long to reply; I
> > > was on vacation and took a break from my project, but now I'm getting
> > > back to it.
> > >
> > > In fact, I had thought it was an SQLBag timeout, but it was actually the
> > > connection timeout between my client and my grid service. I changed the
> > > timeout in the Globus container and it worked, but the long running time
> > > is still a problem.
> > >
> > > Well, let me explain what's happening:
> > >
> > > I do a SELECT to fetch some information about a group of people.
> > > Associated with each person there is a bytea field that stores a file of
> > > at most ~6 KB. I need to bring each file stored in the database to the
> > > file system, process it, and then, based on the results, use the
> > > information for only some of the people in the group selected before.
> > >
> > > In the second Select I do this to bring the files to the file system:
> > > SQLBag -> TupleSplit -> ListRemove -> DeliverToGFTP
> > >
> > > The problem is that fetching the files is too slow. For example, I had
> > > to transfer 2971.978515625 KB (2.90 MB) and it took 660.647 seconds
> > > (11.01 min), i.e. 4.49 KB/s.
> > > I don't know exactly why it is so slow. What I suppose is:
> > >
> > > DeliverToGFTP only saves one file at a time, right? So, to fetch the file
> > > for each person, I had to use a FOR loop and, for each person, call
> > > query.addExpression(...).
> > > Does having a separate SELECT for each person (1023 files in the test
> > > above) increase the time a lot?
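A quick check of the figures above (taken as reported: 2971.978515625 KB, 660.647 s, 1023 files) shows that each file is only about 2.9 KB but takes about 0.65 s on average. That suggests the fixed cost of each request, one SELECT and one delivery per person, rather than the amount of data, likely dominates, which is consistent with the advice to use a single workflow. TransferStats is just an illustrative helper.

```java
// Back-of-the-envelope check using the figures reported above:
// 2971.978515625 KB in 660.647 s for 1023 files.
final class TransferStats {
    static double kbPerSecond(double totalKb, double seconds) {
        return totalKb / seconds;
    }

    static double secondsPerFile(double seconds, int files) {
        return seconds / files;
    }

    static double kbPerFile(double totalKb, int files) {
        return totalKb / files;
    }

    public static void main(String[] args) {
        double totalKb = 2971.978515625;
        double seconds = 660.647;
        int files = 1023;
        System.out.printf("throughput:    %.2f KB/s%n", kbPerSecond(totalKb, seconds)); // 4.50
        System.out.printf("time per file: %.3f s%n", secondsPerFile(seconds, files));   // 0.646
        System.out.printf("size per file: %.2f KB%n", kbPerFile(totalKb, files));       // 2.91
    }
}
```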
> > >
> > > What can I do to decrease this time?
> > >
> > > I thought about not using DeliverToGFTP and instead fetching the bytea
> > > field together with the other information, then saving the file to the
> > > file system from Java. To do this, instead of using two phases of SELECT
> > > as before (one for the information and others to fetch the files), I
> > > added the bytea field to the first SELECT. I did this, but I have run
> > > into the following problem, which I don't know how to resolve.
> > >
> > > The first SELECT is structured like this:
> > > SQLBag -> TupleToCSV -> DeliverToRequestStatus
> > >
> > > The problem is that after executing the SELECT, I get the data like this:
> > > while (transform.hasNextResult()) {
> > >     BufferedReader bufReader = new BufferedReader(transform.nextResult());
> > >
> > >     String resultado = null;
> > >     while ((resultado = bufReader.readLine()) != null) {
> > >         StringTokenizer tokenizer = new StringTokenizer(resultado, "#");
> > >         String rg = tokenizer.nextToken().trim();
> > >         ...
> > >         ...
> > >         ...
> > >
> > > The issue is that the field is bytea: how can I obtain the data to save
> > > the file from Java? Doing it this way, obviously I only get:
> > > BLOB(length=XXXXX)
> > >
> > > What can I do to get this field in binary form so I can save it as a
> > > file in the file system?
> > >
> > > The approach I'm considering to decrease the time is this: bring the
> > > data into Java and save the files there, instead of using DeliverToGFTP
> > > (with its many SELECTs).
> > >
> > >
> > > Cheers.
> > >
> > >
> > >
> >
>
>
>
>
> --
>
> "Is this a world in which we must hide our virtues?"
> William Shakespeare
>
>
> ****************
> Wilson Júnior
> ****************
>
--
Ally Hume
Software Architect
EPCC, The University of Edinburgh
Tel: +44 131 651 3397