On Apr 6, 2010, at 1:03 PM, Sam Preston wrote:
Hi all,
I have a problem with the cluster I'm currently using where nodes
'hang' silently from time to time during an MPI call. This causes the
waiting MPI processes to block indefinitely -- the only way to detect
an error is to notice that no more output is being written to the log
files. We're