On Thu, May 28, 2009 at 14:03, Silvia Giuliani
<[email protected]> wrote:
> Dear all
>
> We have constructed a workflow for analysing microarray data using Soaplab
> web services (workflow attached). The idea is that the user supplies a
> number of Affymetrix CEL files (about 30 MB) via Taverna which are then sent
> to the web services for analysis. Our problem though is that we rapidly run
> out of memory (Java heap space) when two or more files are supplied. We have
> assigned more memory to Java, and this helps, but clearly we are a long way
> from our goal of being able to analyse tens of files. The solution would be
> to use references instead of loading the files into memory but we can't find
> anything in the manual that shows us how to do this. Any clues?


Have you tried this workflow in the latest 2.1b1? See
http://www.myexperiment.org/packs/60 - this version should be more
memory efficient as it dumps large data to a database stored on disk.


As to using references, this would require changing the services to
work with references instead of (or in addition to) the full data.

The easiest way to do this is to accept a URL instead of the real data, like
  http://myservice/outputs/ed73c5d4-717f-4ddd-8263-6791aa85c07c.xml

If the service also returns data in this way, pipelining between such
services means that the big data is never transferred back and forth to
Taverna. If the service is clever, it can even recognize that it's one
of its 'own' URLs and just access the file directly without any
downloading. There could be a kind of 'upload' method for getting
started with the first service inputs, but it should check somehow
that the data is real (as expected by the service) to avoid abuse of
the service by people who like to share illegal and obscene stuff.
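
A rough sketch of the 'own URL' trick on the service side, in Java. The
class name ReferenceResolver, the base URL and the output directory are
all made up for illustration; nothing here is part of Soaplab or Taverna.
It assumes the service publishes its results as flat files under a single
base URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReferenceResolver {
    private final String ownBase;   // e.g. "http://myservice/outputs/" (hypothetical)
    private final Path outputDir;   // directory the service writes its results to

    public ReferenceResolver(String ownBaseUrl, Path outputDir) {
        this.ownBase = URI.create(ownBaseUrl).normalize().toString();
        this.outputDir = outputDir.toAbsolutePath().normalize();
    }

    /** Returns the local file behind one of our own URLs, or null for anything else. */
    public Path localFileFor(String reference) {
        // normalize() collapses "../" segments before the prefix check
        String full = URI.create(reference).normalize().toString();
        if (!full.startsWith(ownBase)) {
            return null;
        }
        String name = full.substring(ownBase.length());
        // Accept plain file names only: no sub-paths, queries or fragments
        if (!name.matches("[A-Za-z0-9._-]+")) {
            return null;
        }
        Path candidate = outputDir.resolve(name).normalize();
        return candidate.startsWith(outputDir) ? candidate : null;
    }

    /** Opens the data as a stream: straight from disk if it is ours, otherwise by download. */
    public InputStream open(String reference) throws IOException {
        Path local = localFileFor(reference);
        if (local != null) {
            return Files.newInputStream(local);
        }
        return URI.create(reference).toURL().openStream();
    }
}
```

Either way the data is handed on as a stream, so a 30 MB file does not
have to be held in memory in one piece.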

Instead of a URL you could use some internal identification scheme,
but note that then your references would not work with other services
doing the same trick, and you would have to provide some kind of
download method.


I would recommend making 'mirrored' methods or services for supporting
references in case you imagine clients who would not need the
references.
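
One way to picture such 'mirrored' methods, as a minimal Java sketch.
The names MirroredService, analyseData and analyseReference are invented
for this example, and the byte counting only stands in for the real
analysis. The point is that both entry points share one streaming core:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class MirroredService {

    // Shared core. Placeholder "analysis": counts the bytes in the stream.
    static long analyse(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    /** By-value variant, for clients that send the full data. */
    public static long analyseData(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return analyse(in);
        }
    }

    /** By-reference variant, for clients that send a URL instead. */
    public static long analyseReference(URL reference) throws IOException {
        try (InputStream in = reference.openStream()) {
            return analyse(in);
        }
    }
}
```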


There's unfortunately not any agreed-upon service standard for saying
that a reference is in place of the 'real' data - so currently you
would have to introduce a shim into the workflow that converts the URL
to a T2Reference if you are connecting the output to another service
or want to see it locally. Such a beanshell script can be as easy as:

  output = new URL(input);
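
If you want the shim to be a bit more defensive, the same idea could be
written out in plain Java roughly as below. The class name UrlShim and
the restriction to http/https are my own assumptions for this sketch,
not anything Taverna requires; in a beanshell the body of toUrl would
work on the same 'input' and 'output' ports as the one-liner above.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlShim {

    /** Converts the string returned by a service into a URL, rejecting odd protocols. */
    public static URL toUrl(String input) throws MalformedURLException {
        // Services sometimes return the URL with a trailing newline
        String trimmed = input.trim();
        URL url = new URL(trimmed);
        String protocol = url.getProtocol();
        // Assumption: only web references are expected, so file: and friends are refused
        if (!protocol.equals("http") && !protocol.equals("https")) {
            throw new MalformedURLException("Unsupported protocol: " + protocol);
        }
        return url;
    }
}
```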


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
