But unfortunately, the parallel job started by srun is still killed at
the first notice when the higher-priority job enters the queue. That is,
the parallel job is killed without any grace time (for other, serial
applications, it seems to work as I expect).
Regards,
/jon
On 10/07/2017 10:01 AM, Jon Tegner wrote:
Thanks!
It indeed seems to work now that I used "--with-pmi" when building Open MPI
and added the flag --mpi=pmi2 to the srun command.
Much appreciated!
/jon
On 10/06/2017 09:18 PM, r...@open-mpi.org wrote:
Not stupid at all. I suspect the problem is that OMPI was not
configured with --with-pmi=<path-to-slurm-pmi-modules>. As a result, when
you srun the application, each process thinks it is a singleton and
nothing works correctly.
OMPI does not pick up the Slurm PMI support by default due to licensing
issues, so you have to specify it manually.
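The singleton symptom can be checked from inside the job: srun exports the number of tasks it launched in the SLURM_NTASKS environment variable, so an MPI world size of 1 alongside SLURM_NTASKS greater than 1 suggests that Open MPI did not find Slurm's PMI support. This is a minimal sketch with hypothetical helpers (`looks_like_singleton` and `mpi_world_size` are not part of Open MPI or Slurm); it assumes the world size comes from mpi4py's `MPI.COMM_WORLD.Get_size()` and only queries it when mpi4py is installed.

```python
import os
from typing import Mapping, Optional


def looks_like_singleton(env: Mapping[str, str], world_size: int) -> bool:
    """True when srun started several tasks but MPI reports a world of size 1."""
    # SLURM_NTASKS is set by srun/sbatch; treat a missing value as a single task.
    ntasks = int(env.get("SLURM_NTASKS", "1"))
    return ntasks > 1 and world_size == 1


def mpi_world_size() -> Optional[int]:
    """World size from mpi4py if it is available, otherwise None."""
    try:
        from mpi4py import MPI  # optional dependency; importing it initializes MPI
    except ImportError:
        return None
    return MPI.COMM_WORLD.Get_size()


if __name__ == "__main__":
    size = mpi_world_size()
    if size is None:
        print("mpi4py not available; cannot query the MPI world size")
    elif looks_like_singleton(os.environ, size):
        print(f"SLURM_NTASKS={os.environ['SLURM_NTASKS']} but MPI world size is 1: "
              "Open MPI is probably running as a singleton (no PMI support)")
    else:
        print(f"MPI world size {size} looks consistent with srun")
```

Run under srun, e.g. `srun --mpi=pmi2 -n 4 python check.py` (file name hypothetical); with working PMI support every task should report a world size of 4 rather than the singleton warning.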