[OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi,

I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-7.1.0. Depending on the machine that I use to start my processes, I have a problem with "--host" for versions "v3.x" and "master", while everything works as expected w

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar,

what happens if you run

    mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi

Are loki and exin different (OS, sockets, cores)?

Cheers,
Gilles

----- Original Message -----
> Hi,
>
> I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux
> Enterprise Server 12.2 (x86_64)" with Sun C

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles,

> what if you ?
> mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi

I need as many slots as processes, so I use "-np 2". "mpiexec --host loki,exin -np 2 hello_1_mpi" works as well. The command breaks if I use at least "-np 3" and distribute the processes across at least two machines.

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar,

my bad, there was a typo in my reply. I really meant

> > what if you ?
> > mpiexec --host loki:2,exin:1 -np 3 hello_1_mpi

but you also tried that and it did not help.

I could not find anything in your logs that suggests mpiexec tries to start 5 MPI tasks. Did I miss something? i w

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
This behavior is as expected. When you specify "-host foo,bar", you have told us to assign one slot to each of those nodes. Thus, running 3 procs exceeds the number of slots you assigned.

You can tell it to set the #slots to the #cores it discovers on the node by using "-host foo:*,bar:*" I ca
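The slot rule described above can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not Open MPI's actual implementation: the function names `parse_host_slots` and `fits`, and the per-node core counts passed in for the `*` form, are hypothetical.

```python
def parse_host_slots(host_spec, cores_per_node=None):
    """Model the slot count implied by a --host list such as "loki:2,exin".

    Assumptions of this sketch (not taken from Open MPI source code):
    - a bare host name means one slot,
    - "host:N" means N slots,
    - "host:*" means as many slots as cores; the core counts are supplied
      by the caller because this sketch cannot discover them.
    Repeated host names are not handled specially here.
    """
    cores_per_node = cores_per_node or {}
    slots = {}
    for entry in host_spec.split(","):
        name, _, count = entry.strip().partition(":")
        if count == "":
            slots[name] = 1                      # bare name: one slot
        elif count == "*":
            if name not in cores_per_node:
                raise ValueError(f"core count for {name!r} not provided")
            slots[name] = cores_per_node[name]   # hypothetical core count
        else:
            slots[name] = int(count)             # explicit slot count
    return slots


def fits(host_spec, nprocs, cores_per_node=None):
    """Return True if nprocs does not exceed the total number of slots."""
    return nprocs <= sum(parse_host_slots(host_spec, cores_per_node).values())


if __name__ == "__main__":
    print(parse_host_slots("foo,bar"))   # one slot per node
    print(fits("foo,bar", 3))            # three procs do not fit in two slots
    print(fits("loki:2,exin", 3))        # three slots in total
```

Under this model "-host foo,bar" with 3 processes exceeds the two assigned slots, while a list such as "loki:2,exin" provides three slots, so a failure with that list would have to come from something other than the slot count.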

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Ralph,

the issue Siegmar initially reported was

    loki hello_1 111 mpiexec -np 3 --host loki:2,exin hello_1_mpi

Per what you wrote, this should be equivalent to

    loki hello_1 111 mpiexec -np 3 --host loki:2,exin:1 hello_1_mpi

and this is what I initially wanted to double check (but i made a ty

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
Until the fixes pending in the big ORTE update PR are committed, I suggest not wasting time chasing this down. I tested the "patched" version of the 3.x branch, and it works just fine.

> On May 30, 2017, at 7:43 PM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> the issue Siegmar initially rep

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles,

I configured Open MPI with the following command.

../openmpi-v3.x-201705250239-d5200ea/configure \
  --prefix=/usr/local/openmpi-3.0.0_64_cc \
  --libdir=/usr/local/openmpi-3.0.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Siegmar,

the "big ORTE update" is a bunch of backports from master to v3.x.

btw, does the same error occur with master?

I noted mpirun simply does

    ssh exin orted ...

Can you double check that the right orted (e.g. /usr/local/openmpi-3.0.0_64_cc/bin/orted) is used? Or you can try to

    mpirun --mca orte

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Siegmar Gross
Hi Gilles,

On 31.05.2017 at 08:38, Gilles Gouaillardet wrote:
> Siegmar,
>
> the "big ORTE update" is a bunch of backports from master to v3.x
>
> btw, does the same error occurs with master ?

Yes, it does, but the error occurs only if I use a real machine together with my virtual machine "exin". I get the e

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Gilles Gouaillardet
Thanks Siegmar,

I was finally able to reproduce it. The error is triggered by the VM topology, and I was able to reproduce it by manually removing the "NUMA" objects from the topology.

As a workaround, you can

    mpirun --map-by socket ...

I will follow up on the devel ML with Ralph.

Bes