Re: [OMPI users] detect hung node

2010-04-06 Thread Jeff Squyres
On Apr 6, 2010, at 1:03 PM, Sam Preston wrote:

> I have a problem with the cluster I'm currently using where nodes
> 'hang' silently from time to time during an MPI call. This causes the
> blocked MPI processes to block indefinitely -- the only way to detect
> an error is to notice that no more o

[OMPI users] detect hung node

2010-04-06 Thread Sam Preston
Hi all, I have a problem with the cluster I'm currently using where nodes 'hang' silently from time to time during an MPI call. This causes the blocked MPI processes to block indefinitely -- the only way to detect an error is to notice that no more output is being written to the log files. We're
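Sam's current detection method is to notice that the log files have stopped growing, and that check can be automated. The sketch below is only an illustration and was not proposed in the thread. It assumes each MPI process writes to its own log file, and it treats a log whose modification time is older than a chosen timeout as a possible hang. The names `is_stale` and `watch`, as well as the default timeout and poll interval, are hypothetical.

```python
import os
import time
from typing import List, Optional


def is_stale(log_path: str, timeout_s: float, now: Optional[float] = None) -> bool:
    """Return True if log_path has not been modified within the last timeout_s seconds.

    Raises FileNotFoundError if the log does not exist.
    """
    if now is None:
        now = time.time()
    last_write = os.path.getmtime(log_path)  # last modification time, in seconds since the epoch
    return (now - last_write) > timeout_s


def watch(log_paths: List[str], timeout_s: float = 600.0, poll_s: float = 30.0) -> List[str]:
    """Poll the logs until at least one goes stale, then return the stale paths.

    Hypothetical helper: the timeout and poll interval are illustrative defaults.
    """
    while True:
        stale = [p for p in log_paths if is_stale(p, timeout_s)]
        if stale:
            return stale
        time.sleep(poll_s)
```

A stale log only suggests a hang. A rank that legitimately computes for longer than `timeout_s` without writing anything would also be flagged, so the timeout has to exceed the longest expected quiet period.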