Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-13 Thread Ralph Castain
Okay, this exposed the problem. The issue is that "ib0" on the two machines is defined on two completely different IP subnets: linuxbmc0008 is 134.61.202.7, linuxscc004 is 192.168.222.4. The OOB doesn't think those two are directly reachable by each other as the IP/subnet-mask don't match - we
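
To illustrate the kind of check described here, the following is a minimal sketch (not the actual OOB code) of an IP/subnet-mask comparison using Python's standard `ipaddress` module. The function name `same_subnet` and the /24 prefix length are assumptions for illustration only; the thread does not state the netmask configured on either host.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same network for the given prefix.

    Hypothetical helper: this only mirrors the idea of comparing
    IP/subnet-mask pairs, it is not taken from Open MPI.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{prefix_len}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Addresses of ib0 as quoted in the message; the /24 prefix is an assumption.
    print(same_subnet("134.61.202.7", "192.168.222.4", 24))
```

With these two addresses the networks differ for any realistic prefix length, since even the first octets (134 and 192) do not match, which is consistent with the two hosts being treated as not directly reachable over ib0.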

Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-13 Thread Paul Kapinos
Attached is the output from openmpi/1.7.5a1r30708: $ $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -mca oob_base_verbose 100 -H linuxscc004 -np 1 hostname 2>&1 | tee oob_base_verbose-linuxbmc0008-175a1r29587.txt Well, some 5 lines were added. (The ib0 on linuxscc004 is not reachable from linuxbmc00

Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-12 Thread Ralph Castain
Could you please give the nightly 1.7.5 tarball a try using the same cmd line options and send me the output? I see the problem, but am trying to understand how it happens. I've added a bunch of diagnostic statements that should help me track it down. Thanks Ralph On Feb 12, 2014, at 1:26 AM,

Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-12 Thread Paul Kapinos
As said, the change in behaviour is new in 1.7.4 - all previous versions worked. Moreover, setting "-mca oob_tcp_if_include ib0" is a workaround for older versions of Open MPI for a timeout of some 60 seconds when starting the same command (which is still successful); or for infinite waiting

Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-11 Thread Ralph Castain
I've added better error messages in the trunk, scheduled to move over to 1.7.5. I don't see anything in the code that would explain why we don't pick up and use ib0 if it is present and specified in if_include - we should be doing it. For now, can you run this with "-mca oob_base_verbose 100" on

[OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-11 Thread Paul Kapinos
Dear Open MPI developer, I. we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a change from previous versions: - when calling "mpiexec", it returns "1" and exits silently. The behaviour is reproducible; well, not that easily reproducible. We have multiple InfiniBand islands i