Hi Sylvain

I've spent several hours trying to replicate the behavior you described on
clusters up to a couple of hundred nodes (all running slurm), without success.
I'm becoming increasingly convinced that this is a configuration issue as
opposed to a code issue.

I have enclosed the platform file [...]

On Nov 19, 2009, at 7:52 AM, Sylvain Jeaugey wrote:
> Thank you Ralph for this precious help.
>
> I set up a quick-and-dirty patch basically postponing process_msg (hence
> daemon_collective) until the launch is done. In process_msg, I therefore
> requeue a process_msg handler and return.
>
> In this "all-must-be-non-blocking-and-done-through-opal_progress" [...]

Very strange. As I said, we routinely launch jobs spanning several hundred
nodes without problem. You can see the platform files for that setup in
contrib/platform/lanl/tlcc

That said, it is always possible you are hitting some kind of race
condition we don't hit. In looking at the code, one possibility [...]
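
Just to make sure I am reading your change correctly, the pattern I have
in mind is roughly the following - the names below are made up for
illustration only, they are not the actual ORTE symbols:

/* Rough sketch of the "defer until launch completes" idea.  Everything
 * here is an illustrative stand-in, not the real ORTE code: if the
 * collective message arrives before the local launch has finished, the
 * handler requeues the work and returns instead of blocking. */
#include <stdbool.h>
#include <stdio.h>

static bool launch_complete = false;  /* set by the launch code path */
static int  deferred_msgs   = 0;      /* how many messages were put back */

static void process_msg(const char *msg)
{
    if (!launch_complete) {
        /* Launch still in progress: note the deferral and return so the
         * progress loop keeps running; the handler is re-run later. */
        deferred_msgs++;
        printf("deferring '%s' (launch not done)\n", msg);
        return;
    }
    /* Launch is done, so it is now safe to do the collective work. */
    printf("processing '%s'\n", msg);
}

int main(void)
{
    process_msg("daemon_collective");  /* arrives too early: deferred */
    launch_complete = true;            /* local launch completes */
    process_msg("daemon_collective");  /* simulates the requeued handler */
    printf("%d message(s) were deferred\n", deferred_msgs);
    return 0;
}

In the sketch the second call just stands in for the requeued handler
firing once the launch path flips the flag.
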
So is there any reason OMPI should not auto-detach buffers at Finalize?
I understand that technically we don't have to, but there are false
performance degradations incurred by our not detaching, which makes OMPI
look significantly slower than other MPIs for no real reason. So unless
there is [...]
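
For reference, this is the kind of explicit detach an application has to
do for itself today, right before MPI_Finalize - just an illustrative
snippet, not anything OMPI-specific:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Attach a buffer so MPI_Bsend has somewhere to copy the message. */
    int   size = (1 << 20) + MPI_BSEND_OVERHEAD;
    char *buf  = malloc(size);
    MPI_Buffer_attach(buf, size);

    /* A buffered self-send, received on the same rank. */
    int payload = 42, recvd;
    MPI_Bsend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_SELF);
    MPI_Recv(&recvd, 1, MPI_INT, 0, 0, MPI_COMM_SELF, MPI_STATUS_IGNORE);

    /* Explicit detach: blocks until all buffered messages have been
     * transmitted, then hands the buffer back to the application.  This
     * is the step the question above is about doing automatically in
     * MPI_Finalize. */
    char *detached;
    int   detached_size;
    MPI_Buffer_detach(&detached, &detached_size);
    free(detached);

    MPI_Finalize();
    return 0;
}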

I would say I use the default settings, i.e. I don't set anything
"special" at configure.

I'm launching my processes with SLURM (salloc + mpirun).

Sylvain

On Wed, 18 Nov 2009, Ralph Castain wrote:
> How did you configure OMPI?
> What launch mechanism are you using - ssh?
>
> On Nov 17, 2009, at 9:[...]