On Apr 6, 2010, at 1:03 PM, Sam Preston wrote:

> I have a problem with the cluster I'm currently using where nodes
> 'hang' silently from time to time during an MPI call. This causes the
> blocked MPI processes to block indefinitely -- the only way to detect
> an error is to notice that no more output is being written to the log
> files. We're trying to resolve the underlying cause of the nodes
> hanging, but in the mean time, is there a way to set a timeout or
> something similar to detect this situation? Sorry if this has been
> addressed before, I searched the FAQ and archives and didn't come up
> with anything.

Unfortunately, no. MPI doesn't actively check whether an application has deadlocked (although there are tools for doing this kind of thing -- google around for them).

Alternatively, if something has gone wrong on a node, Open MPI may not be detecting it properly. Hopefully, it's not an Open MPI bug!

I wish I had more helpful information for you -- let us know what you find about the underlying cause.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
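
Since MPI itself offers no timeout here, one stopgap is to automate the manual check described in the question: watch the log files and flag any that have gone quiet. The following is only a minimal sketch in Python, not anything provided by Open MPI. The function names, the log file names, and the 600-second threshold are all hypothetical, and a quiet log is only a hint of a hang (a long compute phase with no output looks the same), so the threshold needs to be chosen per application.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def find_stalled_logs(paths, timeout_s, now=None):
    """Return the log files not written to for more than `timeout_s` seconds."""
    return [p for p in paths if seconds_since_update(p, now) > timeout_s]


def watch(paths, timeout_s, poll_s=30.0):
    """Poll until at least one log goes quiet, then return the stalled files.

    What to do next (send mail, kill the job, ...) is left to the caller.
    """
    while True:
        stalled = find_stalled_logs(paths, timeout_s)
        if stalled:
            return stalled
        time.sleep(poll_s)
```

Something like `watch(["rank0.log", "rank1.log"], timeout_s=600)` could then run alongside the job, for example from the job script, and trigger whatever cleanup is appropriate once it returns.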