Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
Well, this is kind of interesting. I can strip the configure line back and get mpirun to work on one node, but then neither srun nor mpirun within a SLURM job will run. I can add back configure options to get to

./configure \
  --prefix=${PREFIX} \
  --mandir=${PREFIX}/share/man \
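For context only: when building Open MPI against SLURM and an external PMIx, the configure options typically involved look something like the sketch below. The flags and paths here are illustrative assumptions, not the poster's actual script, and a real build against an external PMIx may need additional options (e.g. a matching external libevent).

  # hypothetical example; adjust paths to the local installation
  PREFIX=/opt/openmpi/3.1.0
  ./configure \
      --prefix=${PREFIX} \
      --mandir=${PREFIX}/share/man \
      --with-slurm \
      --with-pmix=/opt/pmix/2.0.2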

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
Ryan,

With srun it's fine. Only with mpirun is there a problem, and that is both on a single node and on multiple nodes. SLURM was built against pmix 2.0.2, and I am pretty sure that SLURM's default is pmix. We are running a recent patch of SLURM, I think. SLURM and OMPI are both being built
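One way to confirm what SLURM itself offers and defaults to (commands shown for illustration; output will vary by site):

  srun --mpi=list
  scontrol show config | grep -i MpiDefault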

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread r...@open-mpi.org
I’m not entirely sure I understand what you are trying to do. The PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx server (i.e., the OMPI daemon on that node). This is always done over the loopback device since it is a purely local connection that is never used for
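If it helps to see this in practice, the rendezvous information can be inspected from a launched process, assuming the PMIx envars are exported into the application environment as described above (illustrative only; exact variable names vary by PMIx version):

  mpirun -np 1 sh -c 'env | grep PMIX_SERVER_URI'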

[OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread Maksym Planeta
Hello, I want to force OpenMPI to use TCP and, in particular, to use a specific subnet. Unfortunately, I can't manage to do that. Here is what I try:

$BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H
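One thing worth noting: btl_tcp_if_include only steers MPI point-to-point traffic, while the runtime's own wire-up is controlled by separate parameters. A hedged sketch of restricting both to the same subnet (the subnet is the one from the post; the rest of the command line is illustrative):

  mpirun --mca pml ob1 --mca btl tcp,self \
         --mca btl_tcp_if_include 10.233.0.0/19 \
         --mca oob_tcp_if_include 10.233.0.0/19 \
         -np 4 ./a.out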

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Ryan Novosielski
What MPI is SLURM set to use/how was that compiled? Out of the box, the SLURM MPI is set to “none”, or it was the last time I checked, and so isn’t necessarily doing MPI. Now, I did try this with OpenMPI 2.1.1 and it looked right either way (OpenMPI built with “--with-pmi”), but for MVAPICH2 this
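To check how a given Open MPI build was configured with respect to PMIx and SLURM support, something like the following can help (illustrative; exact component names vary by version):

  ompi_info | grep -i -e pmix -e slurm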

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
This is on an ARM processor? I suspect that is the root of the problems as we aren’t seeing anything like this elsewhere.
> On Jun 18, 2018, at 1:27 PM, Bennet Fauber wrote:
>
> If it's of any use, 3.0.0 seems to hang at
>
> Making check in class
> make[2]: Entering directory

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
If it's of any use, 3.0.0 seems to hang at

Making check in class
make[2]: Entering directory `/tmp/build/openmpi-3.0.0/test/class'
make ompi_rb_tree opal_bitmap opal_hash_table opal_proc_table opal_tree opal_list opal_value_array opal_pointer_array opal_lifo opal_fifo
make[3]: Entering directory
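To narrow down which unit test is hanging, the individual test binaries built in that directory can be run by hand under a timeout (a debugging sketch, not something from the thread; assumes the binaries were already built by make check):

  cd /tmp/build/openmpi-3.0.0/test/class
  for t in opal_lifo opal_fifo opal_list opal_hash_table; do
      echo "== $t"; timeout 60 ./$t; echo "exit: $?"
  done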

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
No such luck. If it matters, mpirun does seem to work with processes on the local node that have no internal MPI code. That is,

[bennet@cavium-hpc ~]$ mpirun -np 4 hello
Hello, ARM
Hello, ARM
Hello, ARM
Hello, ARM

but it fails with a similar error if run while a SLURM job is active; i.e.,
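The same contrast can be reproduced with a plain system utility, which keeps launcher behaviour separate from the application itself (a reproduction sketch, not from the thread):

  # outside any SLURM allocation
  mpirun -np 4 hostname
  # inside an allocation (salloc typically starts a shell where it was run)
  salloc -n 4
  mpirun -np 4 hostname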

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
I doubt Slurm is the issue. For grins, let’s try adding “--mca plm rsh” to your mpirun cmd line and see if that works.
> On Jun 18, 2018, at 12:57 PM, Bennet Fauber wrote:
>
> To eliminate possibilities, I removed all other versions of OpenMPI
> from the system, and rebuilt using the same
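Spelled out against the earlier example, that suggestion would look something like the line below; the extra verbosity flag is optional but often useful when the launcher itself is suspect (illustrative command line, not from the thread):

  mpirun --mca plm rsh --mca plm_base_verbose 10 -np 4 ./hello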

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
To eliminate possibilities, I removed all other versions of OpenMPI from the system, and rebuilt using the same build script as was used to generate the prior report.

[bennet@cavium-hpc bennet]$ ./ompi-3.1.0bd.sh
Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_7_1_0
SRC_ROOT is
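When swapping installations like this, it can be worth double-checking that the intended build is the one actually found at run time (generic checks, not part of the posted script):

  which mpirun
  mpirun --version
  ompi_info | grep "Open MPI:"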

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
Hmmm...well, the error has changed from your initial report. Turning off the firewall was the solution to that problem. This problem is different - it isn’t the orted that failed in the log you sent, but the application proc that couldn’t initialize. It looks like that app was compiled against
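One quick way to see which MPI libraries a given application binary was actually linked against (illustrative commands; "hello" is the program name used earlier in the thread):

  ldd ./hello | grep -i -e mpi -e pmix
  ompi_info --version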