Re: [OMPI devel] How can I achieve node fail over

2010-01-12 Thread Ralph Castain
When the link fails, mpirun loses contact with the orted on that node. This causes the OOB to call back to the routed framework to see if this is a critical link. Since a link to a daemon -is- considered critical, a call is made to the errmgr framework indicating that a proc (in this case, a daem
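The escalation path described above (OOB detects the lost link, asks routed whether the link is critical, and reports a failed proc to errmgr when it is) can be modeled as a toy sketch. Every class and function name here is hypothetical and does not correspond to Open MPI's real framework APIs. Treating non-daemon links as non-critical is also an assumption, since the post only states that daemon links are critical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PeerType(Enum):
    # Hypothetical peer kinds; not Open MPI data structures.
    APP_PROC = auto()
    DAEMON = auto()


@dataclass
class Peer:
    name: str
    type: PeerType


class ErrMgr:
    """Toy stand-in for the errmgr framework: just records reported failures."""

    def __init__(self) -> None:
        self.failed: list[str] = []

    def report_proc_failed(self, peer: Peer) -> None:
        self.failed.append(peer.name)


def routed_link_is_critical(peer: Peer) -> bool:
    # Per the post, a link to a daemon is considered critical.
    # Returning False for other peers is an assumption of this sketch.
    return peer.type is PeerType.DAEMON


def oob_on_link_lost(peer: Peer, errmgr: ErrMgr) -> bool:
    """Toy OOB callback: escalate to errmgr only if routed says the link is critical.

    Returns True if the failure was escalated.
    """
    if routed_link_is_critical(peer):
        errmgr.report_proc_failed(peer)
        return True
    return False


if __name__ == "__main__":
    em = ErrMgr()
    oob_on_link_lost(Peer("orted@node3", PeerType.DAEMON), em)
    print(em.failed)
```

Keeping the "is it critical?" decision in a separate function mirrors the split the post describes, where the OOB detects the failure but routed decides whether it matters.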

[OMPI devel] 1.5 - re-branch and MTT

2010-01-12 Thread Jeff Squyres
We mentioned today on the call a potentially aggressive schedule to get v1.5 out the door: - re-branch from SVN trunk this Friday, 13 Jan, 2010 - target release for Tuesday, 16 Feb, 2010 Yes, this means releasing in about 5 weeks. It's an aggressive schedule, but given that the trunk is pretty

Re: [OMPI devel] RFC: silently allow component open() to fail

2010-01-12 Thread Jeff Squyres
I forgot to include the patch itself -- here's a mercurial commit showing the change: http://bitbucket.org/jsquyres/ummunot/changeset/d0dd138df4e5/ If no one objects (and I don't think that anyone will), I'll commit later today. On Jan 7, 2010, at 3:03 PM, Jeff Squyres wrote: > WHAT: Mak

Re: [OMPI devel] How can I achieve node fail over

2010-01-12 Thread Sai Sudheesh
Hi, I want to use OpenMPI in a context where link failures are highly probable. My intention is both... I also want to get an in-depth understanding of the code to know what happens behind the scenes. Does anybody have suggestions or methodologies to follow