But unfortunately, the parallel job started by srun is still killed immediately when the higher-priority job enters the queue; that is, it is killed without any grace time (for other, serial applications it seems to work as I expect).

Regards,

/jon

On 10/07/2017 10:01 AM, Jon Tegner wrote:

Thanks!

It does indeed seem to work after I used "--with-pmi" when building Open MPI and added the flag --mpi=pmi2 to the srun command.
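For reference, the two changes amount to something like the following. This is a minimal sketch, not the exact commands used in this thread: the function names build_openmpi and run_with_pmi2, the install prefix and the SLURM_PMI_DIR variable are hypothetical placeholders, and the PMI path depends on where Slurm is installed on your system.

```shell
#!/bin/sh
# Sketch: build Open MPI with Slurm PMI support, then launch through srun.
# ASSUMPTION: SLURM_PMI_DIR (hypothetical) holds the directory of the Slurm
# PMI headers/libraries, i.e. the <path-to-slurm-pmi-modules> from the reply.

build_openmpi() {
    pmi_dir="$1"
    prefix="$2"
    # Slurm PMI support is not picked up by default, so point configure at it,
    # then build and install; each step runs only if the previous one succeeded.
    ./configure --prefix="$prefix" --with-pmi="$pmi_dir" &&
        make -j"$(nproc)" &&
        make install
}

run_with_pmi2() {
    # Ask srun to wire up the ranks through PMI2; any remaining arguments
    # (task count, executable, ...) are passed through unchanged.
    srun --mpi=pmi2 "$@"
}

# Example usage:
#   build_openmpi "$SLURM_PMI_DIR" "$HOME/opt/openmpi"
#   run_with_pmi2 -n 4 ./my_mpi_app
```

Run build_openmpi from the top of an Open MPI source tree, and run_with_pmi2 from inside a Slurm allocation or batch script.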

Much appreciated!

/jon

On 10/06/2017 09:18 PM, r...@open-mpi.org wrote:
Not stupid at all. I suspect the problem is that OMPI was not configured with --with-pmi=<path-to-slurm-pmi-modules>. As a result, when you srun the application, each process thinks it is a singleton and nothing works correctly.

OMPI does not pick up the Slurm PMI support by default due to license issues, so you have to specify it manually.
