[OMPI users] Fwd: srun works, mpirun does not

2018-06-17 Thread Bennet Fauber
I have a compiled binary that will run with srun but not with mpirun. The attempts to run with mpirun all result in failures to initialize. I have tried this on one node, and on two nodes, with firewall turned on and with it off. Am I missing some command line option for mpirun? OMPI built from t

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-17 Thread r...@open-mpi.org
Add --enable-debug to your OMPI configure cmd line, and then add --mca plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote daemon isn’t starting - this will give you some info as to why.
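
For reference, the two steps suggested above might look roughly like the sketch below. Only --enable-debug and --mca plm_base_verbose 10 come from the message; the install prefix, process count, and binary name (PREFIX, NP, APP) are hypothetical placeholders, and any other configure options a real build needs are omitted. The script prints the command lines instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical placeholders, not values taken from the thread.
PREFIX="$HOME/openmpi-debug"   # assumed install prefix
NP=2                           # assumed number of processes
APP="./my_mpi_app"             # assumed name of the MPI binary

# Step 1: configure Open MPI with debugging enabled
# (other site-specific configure options omitted).
CONFIGURE_CMD="./configure --prefix=${PREFIX} --enable-debug"

# Step 2: launch with verbose output from the process launch
# (plm) framework to see why the remote daemon does not start.
MPIRUN_CMD="mpirun --mca plm_base_verbose 10 -np ${NP} ${APP}"

# Print the command lines rather than executing them.
echo "${CONFIGURE_CMD}"
echo "${MPIRUN_CMD}"
```

After rebuilding and reinstalling, the second line would be run inside the job allocation in place of the plain mpirun invocation.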

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-17 Thread Bennet Fauber
I rebuilt with --enable-debug, then ran with
[bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
salloc: Pending job allocation 158
salloc: job 158 queued and waiting for resources
salloc: job 158 has been allocated resources
salloc: Granted job allocation 158
[bennet@cavium-hpc ~]$ srun ./te

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Ryan Novosielski
What MPI is SLURM set to use/how was that compiled? Out of the box, the SLURM MPI is set to “none”, or was last I checked, and so isn’t necessarily doing MPI. Now, I did try this with OpenMPI 2.1.1 and it looked right either way (OpenMPI built with “--with-pmi”), but for MVAPICH2 this definitely

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
Ryan, With srun it's fine. Only with mpirun is there a problem, and that is both on a single node and on multiple nodes. SLURM was built against pmix 2.0.2, and I am pretty sure that SLURM's default is pmix. We are running a recent patch of SLURM, I think. SLURM and OMPI are both being built u
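
One way to confirm which MPI plugin SLURM uses by default, rather than relying on memory, is sketched below. It is not from the thread: the function name check_slurm_mpi is hypothetical, and the sketch assumes the standard scontrol and srun commands are on the PATH. It exits cleanly on a machine without SLURM.

```shell
#!/bin/sh
# check_slurm_mpi is a hypothetical helper name, not part of SLURM.
check_slurm_mpi() {
    # Guard: do nothing harmful on a machine without SLURM installed.
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found; run this on a SLURM login or compute node"
        return 0
    fi
    # MpiDefault is the plugin srun uses when --mpi is not given.
    scontrol show config | grep -i 'MpiDefault'
    # List the MPI plugin types this srun supports.
    srun --mpi=list
}

check_slurm_mpi
```

If the reported default is not a pmix variant, the plugin can be requested per job with the --mpi option of srun.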

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-18 Thread Bennet Fauber
Well, this is kind of interesting. I can strip the configure line back and get mpirun to work on one node, but then neither srun nor mpirun within a SLURM job will run. I can add back configure options to get to
./configure \
    --prefix=${PREFIX} \
    --mandir=${PREFIX}/share/man \
    --with