Thanks a lot, Ralph, it was exactly that!
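
For the archives: the fix boils down to rebuilding Open MPI with PMI support and then launching through srun exactly as before. A rough sketch (the install prefix is only an example; point --with-pmi at your SLURM PMI installation if it is not in a default location):

$ ./configure --with-pmi --prefix=/opt/openmpi-1.6.5
$ make all install
$ srun -n64 -c4 wrapper

With a PMI-based launch the tasks no longer depend on the reserved-port mechanism configured via MpiParams, so the "unable to claim reserved port" errors disappear.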
2013/9/13 Ralph Castain <r...@open-mpi.org>

> Configure OMPI --with-pmi, as the port reservation method won't work in
> this scenario.
>
> On Sep 13, 2013, at 8:29 AM, Yann Sagon <ysa...@gmail.com> wrote:
>
> I have that set in slurm.conf:
>
> MpiDefault=openmpi
> MpiParams=ports=12000-12999
>
> The cluster has 56 nodes with 16 cores each. Do I need to increase something?
>
> If I issue this on my nodes, nothing appears:
>
> cexec "netstat -laputen | grep ':12[0-9]\{3\}'"
>
>
> 2013/9/13 Moe Jette <je...@schedmd.com>
>
>> I suspect the problem is related to the reserved ports described here:
>> http://slurm.schedmd.com/mpi_guide.html#open_mpi
>>
>> Quoting Yann Sagon <ysa...@gmail.com>:
>>
>>> (sorry for the previous post, it was sent by mistake)
>>>
>>> Hello,
>>>
>>> I'm facing the following problem: one of our users wrote a simple C
>>> wrapper that launches a multithreaded program. It was working before an
>>> update of the cluster (OS and OFED).
>>>
>>> The wrapper is invoked like this:
>>>
>>> $srun -n64 -c4 wrapper
>>>
>>> The result is something like this:
>>>
>>> [...]
>>> srun: error: node04: task 12: Killed
>>> srun: error: node04: tasks 13-15 unable to claim reserved port, retrying.
>>> srun: Terminating job step 47498.0
>>> slurmd[node04]: *** STEP 47498.0 KILLED AT 2013-09-13T17:13:33 WITH SIGNAL 9 ***
>>> [...]
>>>
>>> If we call the wrapper like this:
>>>
>>> $srun -n64 wrapper
>>>
>>> it works, but then each task gets only one core for all of its threads.
>>>
>>> We were using Slurm 2.5.4; now I have tried 2.6.2.
>>> Tested with Open MPI 1.6.4 and 1.6.5.
>>>
>>> Here is the code of the wrapper:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     int rank, size;
>>>     char buf[512];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     /* each rank launches the multithreaded binary, passing its rank and the job size */
>>>     sprintf(buf, "the_multithreaded_binary %d %d", rank, size);
>>>     system(buf);
>>>     MPI_Finalize();
>>>
>>>     return 0;
>>> }