When the link fails, mpirun loses contact with the orted on that node. This
causes the OOB to call back into the routed framework to check whether this is a critical
link. Since a link to a daemon -is- considered critical, a call is made to the
errmgr framework indicating that a proc (in this case, a daem
Hi,
I want to use OpenMPI in an environment where
link failures are highly likely.
My intention is both... I also want to get an
in-depth understanding of the code
to know what happens behind the scenes.
Does anybody have suggestions or methodologies to follow?
As Josh indicated, the current OMPI trunk won't do that at the moment. Josh and
I are working on a side branch to integrate the OpenRCM methods with mpirun to
provide an OMPI capability for those not running ORCM on their systems.
What wasn't clear is your motivation. Are you trying to develop t
Hi Josh,
First of all, thanks for your response.
There were some typos in my mail
that made parts of it vague.
Let me elaborate on the scenarios
mentioned in my previous mail.
What I tried is as follows.
On Jan 6, 2010, at 9:04 AM, Sai Sudheesh wrote:
Hi,
Just about two months ago I started experimenting with OpenMPI.
I found this piece of software very interesting.
How can I make this software fault tolerant?
Depends on what you mean by fault tolerant. :)
As of no
Hi,
Just about two months ago I started experimenting with OpenMPI.
I found this piece of software very interesting.
How can I make this software fault tolerant?
As of now I am running this software on two machines
with quad-core processors and Fedora 10.