Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-19 Thread Ralph Castain
Hi Sylvain, I've spent several hours trying to replicate the behavior you described on clusters of up to a couple of hundred nodes (all running SLURM), without success. I'm becoming increasingly convinced that this is a configuration issue as opposed to a code issue. I have enclosed the platform f

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-19 Thread Ralph Castain
On Nov 19, 2009, at 7:52 AM, Sylvain Jeaugey wrote:
> Thank you Ralph for this precious help.
>
> I set up a quick-and-dirty patch basically postponing process_msg (hence
> daemon_collective) until the launch is done. In process_msg, I therefore
> requeue a process_msg handler and return.
That

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-19 Thread Sylvain Jeaugey
Thank you Ralph for this precious help. I set up a quick-and-dirty patch that basically postpones process_msg (hence daemon_collective) until the launch is done. In process_msg, I therefore requeue a process_msg handler and return. In this "all-must-be-non-blocking-and-done-through-opal_progress"
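
For readers following the thread, the requeue-and-return idea described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch or Open MPI code: `EventLoop`, `post`, `progress`, `finish_launch`, and `run_demo` are hypothetical names, and only the name `process_msg` comes from the message itself.

```python
from collections import deque


class EventLoop:
    """Hypothetical single-threaded event queue (stand-in for a progress loop)."""

    def __init__(self):
        self.queue = deque()
        self.launch_done = False
        self.log = []

    def post(self, handler, *args):
        # Append an event to the back of the queue.
        self.queue.append((handler, args))

    def progress(self):
        # Run one pending event; return False once the queue is empty.
        if not self.queue:
            return False
        handler, args = self.queue.popleft()
        handler(self, *args)
        return True


def finish_launch(loop):
    # Hypothetical event that marks the launch as complete.
    loop.launch_done = True
    loop.log.append("launch done")


def process_msg(loop, msg):
    # If the launch is not finished, requeue this handler and return
    # without blocking, so other events can run first.
    if not loop.launch_done:
        loop.post(process_msg, msg)
        return
    loop.log.append(f"processed {msg}")


def run_demo():
    loop = EventLoop()
    loop.post(process_msg, "collective-1")  # arrives before the launch completes
    loop.post(finish_launch)
    while loop.progress():
        pass
    return loop.log
```

In this sketch the message that arrives early is simply pushed to the back of the queue, so it is handled only after the launch-complete event has run. Note that if no such event were ever queued, the handler would requeue itself indefinitely.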

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-19 Thread Ralph Castain
Very strange. As I said, we routinely launch jobs spanning several hundred nodes without problem. You can see the platform files for that setup in contrib/platform/lanl/tlcc. That said, it is always possible you are hitting some kind of race condition we don't hit. In looking at the code, one po

Re: [OMPI devel] Finalize without Detach???

2009-11-19 Thread Terry Dontje
So is there any reason OMPI should not auto-detach buffers at Finalize? I understand that technically we don't have to, but there are false performance degradations incurred by us not detaching, thus making OMPI look significantly slower than other MPIs for no real reason. So unless there is

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-19 Thread Sylvain Jeaugey
I would say I use the default settings, i.e. I don't set anything "special" at configure. I'm launching my processes with SLURM (salloc + mpirun).
Sylvain
On Wed, 18 Nov 2009, Ralph Castain wrote:
How did you configure OMPI? What launch mechanism are you using - ssh?
On Nov 17, 2009, at 9: