I mean that I killed the orted daemon process while the MPI job was running, but the MPI job hung and did not notice that one of its ranks had failed.
> Date: Wed, 1 Apr 2009 19:09:34 +0800
> From: [email protected]
> To: [email protected]
> Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?
>
> Is there a firewall somewhere?
>
> Jerome
>
> Guanyinzhu wrote:
> > Hi!
> > I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on
> > Redhat Linux x86_64.
> >
> > I ran a test like this: I just killed the orted process, and the job hung
> > for a long time (it hung for 2~3 hours, then I killed the job).
> >
> > I have the following questions:
> >
> > When the network fails, a host fails, or the orted daemon is killed by
> > accident, how long does it take for the running MPI job to notice and exit?
> >
> > Does OpenMPI support a heartbeat mechanism, or how can I quickly
> > detect the failure to avoid the MPI job hanging?
> >
> > thanks a lot!
> >
> > _______________________________________________
> > users mailing list
> > [email protected]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
