[slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-05 Andrés Marín Díaz
Hello, since we updated to the new Slurm version (19.05), every job step launched with mpirun ends with the following error message:

    An ORTE daemon has unexpectedly failed after launch and before
    communicating back to mpirun. This could be caused by a number
    of factor

Re: [slurm-users] How to preempt jobs with the priority_multifactor parameter?

2019-06-05 Chris Samuel
On Tuesday, 4 June 2019 6:57:39 AM PDT Jean-mathieu CHANTREIN wrote:
> Is there a way to preempt jobs by using the priority of a job calculated with
> priority_multifactor and not with a priority related to partition or QOS?

You would need to write your own preemption plugin to do that, the exist
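The reply suggests a custom preemption plugin. Purely to illustrate the ordering such a plugin would have to compute, here is a minimal Python sketch, not Slurm's C plugin API. It assumes the multifactor priority is a weighted sum of factors normalized to the range 0.0 to 1.0, and it treats running jobs whose computed priority is strictly lower than a pending job's as preemption candidates, lowest first. The weight values, the `Job` class, and the functions `multifactor_priority` and `pick_victims` are hypothetical. The weight keys only mirror the slurm.conf `PriorityWeight*` parameters, and other terms of the real formula (such as TRES weights and the nice adjustment) are left out.

```python
from dataclasses import dataclass, field

# Hypothetical weights, loosely mirroring slurm.conf PriorityWeightAge,
# PriorityWeightFairshare, PriorityWeightJobSize, PriorityWeightPartition
# and PriorityWeightQOS. The values are illustrative only.
WEIGHTS = {
    "age": 1000,
    "fairshare": 10000,
    "job_size": 1000,
    "partition": 1000,
    "qos": 1000,
}


@dataclass
class Job:
    job_id: int
    # Each factor is assumed to be normalized to [0.0, 1.0]; missing ones count as 0.
    factors: dict = field(default_factory=dict)


def multifactor_priority(job, weights=WEIGHTS):
    """Weighted sum of normalized factors (TRES and nice terms omitted)."""
    return sum(w * job.factors.get(name, 0.0) for name, w in weights.items())


def pick_victims(pending, running, weights=WEIGHTS):
    """Return running jobs with strictly lower priority than `pending`,
    ordered lowest priority first (a hypothetical preemption order)."""
    threshold = multifactor_priority(pending, weights)
    candidates = [j for j in running if multifactor_priority(j, weights) < threshold]
    return sorted(candidates, key=lambda j: multifactor_priority(j, weights))


if __name__ == "__main__":
    running = [
        Job(1, {"age": 0.9, "fairshare": 0.1}),  # about 1900
        Job(2, {"age": 0.2, "fairshare": 0.8}),  # about 8200
        Job(3, {"fairshare": 0.05}),             # about 500
    ]
    pending = Job(4, {"fairshare": 0.5})         # about 5000
    print([j.job_id for j in pick_victims(pending, running)])
```

With these made-up numbers, jobs 3 and 1 fall below the pending job's priority and would be considered in that order, while job 2 is left alone. A real plugin would also have to respect whatever preemption modes and limits the site configures.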

Re: [slurm-users] Submit job using srun fails but sbatch works

2019-06-05 Chris Samuel
On Monday, 3 June 2019 7:53:39 AM PDT Alexander Åhman wrote:
> That was my first thought too, but... no. Both /etc/hosts (not used) and
> slurm.conf are identical on all nodes, both working and non-working nodes.

I think Slurm caches things like that, so it might be worth restarting slurmctld to

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-05 Chris Samuel
On Wednesday, 5 June 2019 10:04:11 AM PDT Andrés Marín Díaz wrote:
> Can it be a bug in the new version?

If it's working with srun but not with mpirun, it sounds like there's some incompatibility between how mpirun is calling srun to launch orted and what Slurm is doing now. You'd need to find