On Thu, Sep 23, 2010 at 17:23, Gorissen D. <[email protected]> wrote:

> I came across an old mailing list post while searching for info about whether 
> it is possible to parallelize the workflow execution somehow, preferably 
> internally (parallelize the different activities, vs different execution 
> engines)(*).  In the post there is this snippet(**):
> (..)
> I am aware of Taverna's web-service bias, but assumed it would also work fine 
> for long running processes (think many hours).  This snippet made me wonder 
> if certain design choices or roadmap plans would make Taverna less ideal for 
> such long running workflows?

If you are designing web services that will take a long time to
execute, you are advised to build them in an asynchronous pattern.

The reason for this is that a single SOAP or REST call is made over a
single HTTP connection, which could easily fail for anything taking
longer than, say, 5 minutes (our default timeout). You can tweak
Taverna's SOAP timeout by setting the system property
-Dtaverna.wsdl.timeout=1000 (1000 minutes), but a high value here is
only going to be stable against http://localhost or on a locally
switched network.


See http://www.mygrid.org.uk/dev/wiki/display/scrap/Looping+in+Taverna+2.1
for how to access asynchronous services using looping in Taverna.

The typical pattern is that you expose three operations: createJob(),
checkStatus() and getResults():

createJob(input1, input2, input3) --> jobId=123123

checkStatus(jobId=123123) --> status=pending
checkStatus(jobId=123123) --> status=running
checkStatus(jobId=123123) --> status=running
checkStatus(jobId=123123) --> status=complete

getResults(jobId=123123) --> output1, output2, ...


In Taverna you can then set a looping control on checkStatus to keep
running (with a delay) "until the status is 'complete'" - or if you
have a separate 'failed' condition and no 'pending': "until the status
is not 'running'".
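
To make the three-operation pattern concrete outside of Taverna, here
is a minimal Python sketch of a client that polls until the status is
no longer 'pending' or 'running'. FakeJobService is a hypothetical
in-memory stand-in for a real service, and the names create_job,
check_status, get_results and run_job are assumptions made for this
illustration only; they are not part of Taverna.

```python
import itertools
import time


class FakeJobService:
    """Hypothetical in-memory service with the three operations."""

    def __init__(self, polls_until_complete=3):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._polls_until_complete = polls_until_complete

    def create_job(self, *inputs):
        # Returns immediately with a job id; no work is done in this call.
        job_id = next(self._ids)
        self._jobs[job_id] = {"inputs": inputs, "polls": 0}
        return job_id

    def check_status(self, job_id):
        # Pretend the job finishes after a fixed number of status checks.
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] >= self._polls_until_complete:
            return "complete"
        return "pending" if job["polls"] == 1 else "running"

    def get_results(self, job_id):
        # Only meaningful once check_status has reported 'complete'.
        return [str(value).upper() for value in self._jobs[job_id]["inputs"]]


def run_job(service, inputs, delay=0.01, max_polls=100):
    """Create a job, poll with a delay until it stops running, fetch results."""
    job_id = service.create_job(*inputs)
    for _ in range(max_polls):
        status = service.check_status(job_id)
        if status not in ("pending", "running"):
            break
        time.sleep(delay)
    else:
        raise TimeoutError("job %s did not finish in time" % job_id)
    if status != "complete":
        raise RuntimeError("job %s ended with status %r" % (job_id, status))
    return service.get_results(job_id)
```

Each call here is short, so no single connection has to stay open for
the lifetime of the job; only the loop as a whole is long-running.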

You can then also increase the retries for each of these services to 3,
which should make the workflow more resilient to network glitches.
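
The effect of a retry setting can be sketched in plain Python as
follows. This is only an illustration of the idea, not how Taverna
implements retries; call_with_retries and its parameters are
hypothetical names, and treating OSError (which includes
ConnectionError) as the retryable failure is an assumption.

```python
import time


def call_with_retries(operation, retries=3, delay=0.01):
    """Call operation(); on a network-style error, retry up to `retries` times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except OSError as error:  # assumption: network failures surface as OSError
            last_error = error
            if attempt < retries:
                time.sleep(delay)
    raise last_error
```

Because each of the short calls (create, check, fetch) can be repeated
safely on its own, a brief network failure costs one retry rather than
the whole job.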


A similar pattern can be done for REST services if using the RESTful
plugin for Taverna. You would then typically have:

POST /analysis/
Content-Type: application/xml

<some><inputs/></some>

-->
HTTP/1.1 201 Created
Location: http://host/analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab


GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab

HTTP/1.1 200 OK
Content-Type: application/xml
<analysis>
   <status>running</status>
</analysis>

GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab

HTTP/1.1 200 OK
Content-Type: application/xml
<analysis>
   <status>complete</status>
   <outputs>http://host/analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab/outputs</outputs>
</analysis>

(To loop on this you could use regular-expression matching on the
status, which avoids having to match the whole XML document, or use
nested workflows with splitters.)


GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab/outputs

HTTP/1.1 200 OK
Content-Type: application/xml
<output>
  <blah>..</blah>
  <fish>..</fish>
</output>
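
The regular-expression approach to reading the status out of such a
response can be sketched in Python like this. It assumes the response
body contains a well-formed <status>...</status> element as in the
exchange above; extract_status and should_keep_polling are
hypothetical helper names used only for this illustration.

```python
import re

# Match only the <status> element rather than the whole XML document.
STATUS_PATTERN = re.compile(r"<status>\s*([^<]*?)\s*</status>")


def extract_status(body):
    """Return the text of the first <status> element, or None if absent."""
    match = STATUS_PATTERN.search(body)
    return match.group(1) if match else None


def should_keep_polling(body):
    """True while the job is reported as still pending or running."""
    return extract_status(body) in ("pending", "running")
```

A loop condition built this way keeps working if the service later
adds other elements to the <analysis> document.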



As for services not running through web services, there are different
options available. For instance, the UseCase activity allows you to
execute remote SSH commands, but as this depends on a single SSH
connection you could run into the same timeout issue here, and would
need to split your server-side scripts into asynchronous jobs
(started in the background with &).
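
One way to picture such a split on the server side is the Python
sketch below: one short command starts the job detached and returns at
once, and another short command reports its status. Everything here is
an assumption for illustration (the job directory layout, the file
names output.txt and exitcode, and the function names); it is not part
of the UseCase activity, and start_new_session assumes a POSIX host.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Small wrapper run in the background: executes the real command, then
# records its exit code atomically so status checks never see a partial file.
_WRAPPER = """
import pathlib, subprocess, sys
job_dir = pathlib.Path(sys.argv[1])
with open(job_dir / "output.txt", "wb") as out:
    code = subprocess.call(sys.argv[2:], stdout=out, stderr=subprocess.STDOUT)
tmp = job_dir / "exitcode.tmp"
tmp.write_text(str(code))
tmp.replace(job_dir / "exitcode")
"""


def start_job(command, job_dir):
    """Start `command` detached from the caller and return immediately."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    subprocess.Popen(
        [sys.executable, "-c", _WRAPPER, str(job_dir), *command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # survive the end of the launching session
    )


def check_status(job_dir):
    """Report 'running', 'complete' or 'failed' from the marker file."""
    marker = Path(job_dir) / "exitcode"
    if not marker.exists():
        return "running"
    return "complete" if marker.read_text().strip() == "0" else "failed"


def get_output(job_dir):
    """Return whatever the job wrote to stdout/stderr."""
    return (Path(job_dir) / "output.txt").read_text()
```

Each SSH invocation then only needs to run one of these short steps,
so the long computation no longer depends on a single connection
staying open.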


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-users mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
