The sbatch options are propagated via environment variables to the spawned shell and picked up by srun (unless an srun command-line option overrides them). My guess is that your sbatch options conflict with your srun options, causing the problem. I'd suggest taking a look at the environment in the spawned shell for variables starting with "SLURM_".
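For example, a minimal sketch of how to inspect (and, if necessary, clear) those propagated variables from inside the batch script; the specific variable named in the unset example is illustrative, not a diagnosis:

```shell
# Inside the batch script spawned by sbatch, list every
# Slurm-propagated variable so you can spot conflicts with
# the options you pass to srun explicitly:
env | grep '^SLURM_' | sort

# If one of them conflicts with an explicit srun option, you can
# clear it before the srun call (variable name is just an example):
# unset SLURM_NTASKS_PER_NODE
```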

Quoting Bob Moench <[email protected]>:
Hi,

Has anyone seen these errors and know what they are?

srun: error: Unable to create job step: Requested node configuration is not available
srun: error: Unable to create job step: Job/step already completing or completed

I run this script from an sbatch with the same allocation as the srun in the script:

  for j in $(seq 1 250) ; do
    delay=$((j * 20))
    time srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
              --exclusive test2.exe $delay
  done

Run as above, every srun fails with the first message. If I
add a "sleep 1" to the loop, I can do about 140 sruns before the
failure (causing the second message for every failed run). Anything
with a larger sleep gets pretty much the same results as the
"sleep 2".

The exact number of successful runs varies by 10 or 20. Am I
using up some resource with each run?

For completeness, I am running on a Cray system with Slurm 14.11.8.

Thanks,
Bob

--
Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227


--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support
===============================================================
Slurm User Group Meeting, 15-16 September 2015, Washington D.C.
http://slurm.schedmd.com/slurm_ug_agenda.html
