Paul,

How are you invoking srun with the application in question?

It seems strange that the messages only manifest when the job runs
on more than one node.  Have you tried passing the flags "-N" and
"--ntasks-per-node" for testing?  What about using "-w hostfile"?
Those are the options I'd try first to troubleshoot the issue; a
sketch follows below.
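
Something like this (a sketch only, assuming a hypothetical two-node
job with 16 tasks per node and an executable named ./mpi_app; adjust
the counts and names for your site):

    # Pin the geometry explicitly: two nodes, 16 tasks on each
    srun -N 2 --ntasks-per-node=16 ./mpi_app

    # Or name the nodes directly; node01/node02 are hypothetical
    srun -w node01,node02 --ntasks=32 ./mpi_app

    # -w also accepts a file of hostnames; Slurm treats the argument
    # as a filename when it contains a "/" (e.g. "./hosts.txt")
    srun -w ./hosts.txt --ntasks=32 ./mpi_app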

John DeSantis

2015-06-02 14:19 GMT-04:00 Paul van der Mark <pvanderm...@fsu.edu>:
>
> All,
>
> We are preparing for a switch from our current job scheduler to slurm
> and I am running into a strange issue. I compiled openmpi with slurm
> support and when I start a job with sbatch and use mpirun everything
> works fine. However, when I use srun instead of mpirun and the job does
> not fit on a single node, I either receive the following openmpi warning
> a number of times:
> --------------------------------------------------------------------------
> WARNING: Missing locality information required for sm initialization.
> Continuing without shared memory support.
> --------------------------------------------------------------------------
> or a segmentation fault in an openmpi library (address not mapped) or
> both.
>
> I only observe this with MPI programs compiled with openmpi and run by
> srun when the job does not fit on a single node. The same program
> started by openmpi's mpirun runs fine. The same source compiled with
> mvapich2 works fine with srun.
>
> Some version info:
> slurm 14.11.7
> openmpi 1.8.5
> hwloc 1.10.1 (used for both slurm and openmpi)
> os: RHEL 7.1
>
> Has anyone seen that warning before and what would be a good place to
> start troubleshooting?
>
>
> Thank you,
> Paul
