On Thu, Sep 23, 2010 at 17:23, Gorissen D. <[email protected]> wrote:
> I came across an old mailing list post while searching for info about whether
> it is possible to parallelize the workflow execution somehow, preferably
> internally (parallelize the different activities, vs different execution
> engines)(*). In the post there is this snippet(**):
> (..)
> I am aware of Taverna's web-service bias, but assumed it would also work fine
> for long running processes (think many hours). This snippet made me wonder
> if certain design choices or roadmap plans would make Taverna less ideal for
> such long running workflows?

If you are designing web services that will take a long time to execute, you are advised to build them in an asynchronous pattern. The reason is that a single SOAP or REST call is made over a single HTTP connection, which could easily fail for anything taking longer than, say, 5 minutes (our default timeout).

You can tweak Taverna's SOAP timeout by setting a system property:

    -Dtaverna.wsdl.timeout=1000

(1000 minutes), but a high value here is only going to be stable on http://localhost or on a locally switched network.

See http://www.mygrid.org.uk/dev/wiki/display/scrap/Looping+in+Taverna+2.1 for how to access asynchronous services using looping in Taverna.

The typical pattern is that you expose three operations, createJob(), checkStatus() and getResults():

    createJob(input1, input2, input3) --> jobId=123123
    checkStatus(jobId=123123) --> status=pending
    checkStatus(jobId=123123) --> status=running
    checkStatus(jobId=123123) --> status=running
    checkStatus(jobId=123123) --> status=complete
    getResults(jobId=123123) --> output1, output2, ...

In Taverna you can then set a looping control on checkStatus to keep running (with a delay) "until the status is 'complete'", or, if you have a separate 'failed' condition and no 'pending', "until the status is not 'running'". You can then also increase the retries for each of these services to 3, which should make the workflow more resilient to network glitches.
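Outside of Taverna, the same create/poll/fetch loop can be sketched in a few lines of Python. This is only an illustration of the pattern, not Taverna code: FakeJobService, with_retries and run_job are hypothetical names, the service is an in-memory stand-in for a real SOAP or REST endpoint, and the 'failed' status is an assumption about what your service might report.

```python
import time


class FakeJobService:
    """Hypothetical in-memory stand-in for the three remote operations."""

    def __init__(self):
        self._inputs = {}
        self._statuses = {}

    def create_job(self, *inputs):
        job_id = len(self._inputs) + 1
        self._inputs[job_id] = inputs
        # Scripted status sequence, mirroring the exchange shown above.
        self._statuses[job_id] = ["pending", "running", "running", "complete"]
        return job_id

    def check_status(self, job_id):
        statuses = self._statuses[job_id]
        # Advance through the script; keep reporting the final status afterwards.
        return statuses.pop(0) if len(statuses) > 1 else statuses[0]

    def get_results(self, job_id):
        # Pretend the long-running analysis upper-cases its inputs.
        return [str(value).upper() for value in self._inputs[job_id]]


def with_retries(call, attempts=3):
    """Run call(), trying up to `attempts` times if the connection fails."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts:
                raise


def run_job(service, inputs, delay_seconds=1.0, max_polls=1000):
    """Start a job, poll until it is no longer pending/running, fetch results."""
    # Caveat: retrying create_job may start a duplicate job if only the
    # response was lost; a real service may need a way to detect that.
    job_id = with_retries(lambda: service.create_job(*inputs))
    for _ in range(max_polls):
        status = with_retries(lambda: service.check_status(job_id))
        if status == "complete":
            return with_retries(lambda: service.get_results(job_id))
        if status == "failed":  # assumed status name, not from the original post
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(delay_seconds)  # each poll is a short call, so no long-lived connection
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    print(run_job(FakeJobService(), ("input1", "input2"), delay_seconds=0))
```

Each individual call returns quickly, so no single HTTP connection has to stay open for the duration of the job; only the polling loop as a whole is long-running.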
A similar pattern can be done for REST services if using the RESTful plugin for Taverna. You would then typically have:

    POST /analysis/
    Content-Type: application/xml

    <some><inputs/></some>

    -->

    HTTP/1.1 201 Created
    Location: http://host/analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab


    GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab

    HTTP/1.1 200 OK
    Content-Type: application/xml

    <analysis>
      <status>running</status>
    </analysis>

(For looping on this you could use regular-expression matching to avoid having to match the whole XML, or use nested workflows with splitters.)

    GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab

    HTTP/1.1 200 OK
    Content-Type: application/xml

    <analysis>
      <status>complete</status>
      <outputs>http://host/analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab/outputs</outputs>
    </analysis>


    GET /analysis/79cd0506-6810-4cd2-8f3a-8f30be6351ab/outputs

    HTTP/1.1 200 OK
    Content-Type: application/xml

    <output>
      <blah>..</blah>
      <fish>..</fish>
    </output>

As for services not running through web services, there are different options available. For instance, the UseCase activity allows you to execute remote SSH commands, but as this depends on a single SSH connection you could run into the same timeout issue here, and would need to split your server-side scripts into asynchronous jobs (started in the background with &).

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
