I'm sorry, Sylvain - I simply cannot replicate this problem (tried yet another
SLURM system):
./configure --prefix=blah --with-platform=contrib/platform/iu/odin/debug
[rhc@odin ~]$ salloc -N 16 tcsh
salloc: Granted job allocation 75294
[rhc@odin mpi]$ mpirun -pernode ./hello
Hello, World, I am 1
The attached patch should resolve the long-pending issue that we have in
our Trac: https://svn.open-mpi.org/trac/ompi/ticket/1912.
The issue: as part of OpenIB BTL creation we also create a set of SRQs,
and the corresponding receive fragments are allocated and posted on all
SRQs. It means that a pro
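(The quoted message is cut off above. For context, here is a minimal
libibverbs sketch of the mechanism it describes - creating one SRQ and
pre-posting a receive fragment on it. This is not the actual openib BTL
code; the queue depth, fragment size, and the choice of device 0 are
illustrative assumptions.)

/* Minimal sketch, NOT the openib BTL itself: create one SRQ and
 * pre-post a single receive fragment on it.  The BTL does this for a
 * whole set of SRQs at startup. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (NULL == devs || 0 == n) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Create the shared receive queue (one of the "set of SRQs"). */
    struct ibv_srq_init_attr srq_attr = {
        .attr = { .max_wr = 512, .max_sge = 1 }   /* illustrative depth */
    };
    struct ibv_srq *srq = ibv_create_srq(pd, &srq_attr);

    /* Allocate and register one receive fragment... */
    size_t frag_len = 4096;
    void *frag = malloc(frag_len);
    struct ibv_mr *mr = ibv_reg_mr(pd, frag, frag_len,
                                   IBV_ACCESS_LOCAL_WRITE);

    /* ...and post it on the SRQ so any peer can send eagerly. */
    struct ibv_sge sge = { .addr   = (uintptr_t) frag,
                           .length = (uint32_t) frag_len,
                           .lkey   = mr->lkey };
    struct ibv_recv_wr wr = { .sg_list = &sge, .num_sge = 1 };
    struct ibv_recv_wr *bad = NULL;
    if (0 != ibv_post_srq_recv(srq, &wr, &bad)) {
        fprintf(stderr, "ibv_post_srq_recv failed\n");
        return 1;
    }

    /* Teardown (ibv_destroy_srq, ibv_dereg_mr, etc.) omitted for brevity. */
    return 0;
}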
Oops. This was a mistake in how I initially set up the v1.4 nightly builds
yesterday (i.e., a local config error on eddie, the machine that makes the
nightly builds). I'll fix now...
On Dec 1, 2009, at 9:00 PM, MPI Team wrote:
>
> ERROR: Command returned a non-zero exist status (v1.4):
>
Reminder for people to fill out the Doodle if you want to be on the call to
discuss mpi-request issues next week. Please fill it out by tomorrow (Thursday)
COB - I'll pick a time and set up a call on Friday morning.
-jms
Sent from my PDA. No type good.
Ok, so I tried with RHEL5 and I get the same behavior (even at 6 nodes): when
setting ORTE_RELAY_DELAY to 1, I get the deadlock systematically with the
typical stack.
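(For reference, a run with the reproducer would look something like this,
assuming the reproducer patch picks up the ORTE_RELAY_DELAY environment
variable from mpirun's environment; the binary name is illustrative:)

$ export ORTE_RELAY_DELAY=1
$ mpirun -pernode ./hello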
Without my "reproducer patch", 80 nodes was the lower bound to reproduce
the bug (and you needed a couple of runs to get it). But since