Hi,

I'm not sure whether this problem lies with SLURM or with OpenMPI, but the stack 
trace (below) points to an issue within OpenMPI.

Whenever I try to launch an MPI job within SLURM, mpirun segfaults immediately -- 
but only if the machine that SLURM allocated to the job is different from the one 
on which I launched the MPI job.

However, if I force SLURM to allocate only the local node (i.e., the one on which 
salloc was called), everything works fine.

Failing case:
michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi
 ========================   JOB MAP   ========================

 Data for node: Name: ipc4      Num procs: 8
        Process OMPI jobid: [21326,1] Process rank: 0
        Process OMPI jobid: [21326,1] Process rank: 1
        Process OMPI jobid: [21326,1] Process rank: 2
        Process OMPI jobid: [21326,1] Process rank: 3
        Process OMPI jobid: [21326,1] Process rank: 4
        Process OMPI jobid: [21326,1] Process rank: 5
        Process OMPI jobid: [21326,1] Process rank: 6
        Process OMPI jobid: [21326,1] Process rank: 7

 =============================================================
[ipc:16986] *** Process received signal ***
[ipc:16986] Signal: Segmentation fault (11)
[ipc:16986] Signal code: Address not mapped (1)
[ipc:16986] Failing at address: 0x801328268
[ipc:16986] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7ff85c7638f0]
[ipc:16986] [ 1] /usr/lib/libopen-rte.so.0(+0x3459a) [0x7ff85d4a059a]
[ipc:16986] [ 2] /usr/lib/libopen-pal.so.0(+0x1eeb8) [0x7ff85d233eb8]
[ipc:16986] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7ff85d228439]
[ipc:16986] [ 4] /usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x9d) [0x7ff85d4a002d]
[ipc:16986] [ 5] /usr/lib/openmpi/lib/openmpi/mca_plm_slurm.so(+0x211a) [0x7ff85bbc311a]
[ipc:16986] [ 6] mpirun() [0x403c1f]
[ipc:16986] [ 7] mpirun() [0x403014]
[ipc:16986] [ 8] /lib/libc.so.6(__libc_start_main+0xfd) [0x7ff85c3efc4d]
[ipc:16986] [ 9] mpirun() [0x402f39]
[ipc:16986] *** End of error message ***

Non-failing case:
michael@eng-ipc4 ~ $ salloc -n8 -w ipc4 mpirun --display-map ./mpi
 ========================   JOB MAP   ========================

 Data for node: Name: eng-ipc4.FQDN Num procs: 8
        Process OMPI jobid: [12467,1] Process rank: 0
        Process OMPI jobid: [12467,1] Process rank: 1
        Process OMPI jobid: [12467,1] Process rank: 2
        Process OMPI jobid: [12467,1] Process rank: 3
        Process OMPI jobid: [12467,1] Process rank: 4
        Process OMPI jobid: [12467,1] Process rank: 5
        Process OMPI jobid: [12467,1] Process rank: 6
        Process OMPI jobid: [12467,1] Process rank: 7

 =============================================================
Process 1 on eng-ipc4.FQDN out of 8
Process 3 on eng-ipc4.FQDN out of 8
Process 4 on eng-ipc4.FQDN out of 8
Process 6 on eng-ipc4.FQDN out of 8
Process 7 on eng-ipc4.FQDN out of 8
Process 0 on eng-ipc4.FQDN out of 8
Process 2 on eng-ipc4.FQDN out of 8
Process 5 on eng-ipc4.FQDN out of 8

Using OpenMPI directly (without going through SLURM) works as expected, e.g.:
mpirun -H 'ipc3,ipc4' -np 8 ./mpi
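
For reference, ./mpi is nothing exotic -- just a minimal MPI hello-world along 
these lines (a sketch from memory; the exact source may differ slightly):

/* Minimal MPI test program -- a sketch of what ./mpi does, based on the
 * "Process N on <host> out of M" output above.  Not the exact source. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(hostname, &namelen);

    printf("Process %d on %s out of %d\n", rank, hostname, size);

    MPI_Finalize();
    return 0;
}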

This is a small, homogeneous cluster: all Xeon-class machines with plenty of 
RAM and a shared filesystem over NFS, running 64-bit Ubuntu Server.  I was 
originally running the stock OpenMPI (1.4.1) and SLURM (2.1.1) packages; I have 
since upgraded to the latest stable OpenMPI (1.4.3) and SLURM (2.2.0), with no 
effect. (The newer binaries were built from the respective upstream Debian 
packages.)

strace output (not shown) shows that the job is launched via srun and that a 
connection is received back from the child process over TCP/IP; soon after 
this, mpirun crashes. The nodes communicate over a semi-dedicated GigE TCP/IP 
connection.
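
For anyone who wants to reproduce that observation, an invocation along these 
lines should do (indicative only; the exact flags I used may have differed):

# Trace mpirun, follow the srun children it forks, and log
# network-related syscalls to a file.
salloc -n8 strace -f -e trace=network -o /tmp/mpirun.strace mpirun --display-map ./mpi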

Is this a known bug? What is going wrong?

Regards,
Michael Curtis


