The sbatch options are propagated via environment variables to the
spawned shell and are picked up by srun (unless an srun command-line
option overrides them). I'd guess your sbatch options conflict with the
srun options, causing the problem. I'd suggest that you look at the
environment of the spawned shell for variables starting with
"SLURM_".
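A minimal way to do that check, assuming a bash batch script: print every inherited SLURM_* variable just before the srun loop, so values set by sbatch can be compared with the options passed to srun. The function name show_slurm_env is hypothetical, purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list all SLURM_* variables visible in the batch
# shell, sorted by name, so anything inherited from sbatch that srun
# would also read can be spotted next to the srun command-line options.
show_slurm_env() {
    env | grep '^SLURM_' | sort
}

show_slurm_env
```

Any variable that srun reads and that disagrees with the srun options is a candidate for the conflict; passing the option explicitly on the srun command line takes precedence over the environment.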
Quoting Bob Moench <[email protected]>:
Hi,
Has anyone seen these errors and know what they are?
srun: error: Unable to create job step: Requested node
configuration is not available
srun: error: Unable to create job step: Job/step already
completing or completed
I run this script from an sbatch job with the same allocation as the
srun in the script:
for j in $(seq 1 250); do
    delay=$((j * 20))
    time srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
        --exclusive test2.exe "$delay"
done
When the loop is run as above, every srun fails with the first
message. If I add a "sleep 1" to the loop, I can do about 140 sruns
before the failure (causing the second message for every failed run).
Anything with a larger sleep gets pretty much the same results
as the "sleep 2".
The exact number of successful runs varies by 10 or 20. Am I
using up some resource with each run?
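To pin down how many steps succeed before the failures start, the loop could tally srun exit statuses with a configurable pause. This is a sketch assuming bash; run_steps is a hypothetical name, and since srun exits non-zero both when the step cannot be created and when the task itself fails, the "failed" count lumps those cases together.

```shell
#!/usr/bin/env bash
# Hypothetical helper: launch COUNT job steps inside the current
# allocation, sleeping PAUSE seconds between them, and report how many
# srun invocations exited zero (succeeded) or non-zero (failed).
run_steps() {
    local count=$1 pause=$2
    local ok=0 failed=0 j
    for ((j = 1; j <= count; j++)); do
        # Same step options as the original loop; delay = j * 20.
        if srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
                --exclusive test2.exe "$((j * 20))"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        sleep "$pause"
    done
    echo "succeeded=$ok failed=$failed"
}
```

For example, `run_steps 250 1` inside the batch script reproduces the "sleep 1" experiment and prints a single summary line, which makes the run-to-run variation in the success count easy to record.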
For completeness, I am running on a Cray system with SLURM 14.11.8.
Thanks,
Bob
--
Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227
--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support
===============================================================
Slurm User Group Meeting, 15-16 September 2015, Washington D.C.
http://slurm.schedmd.com/slurm_ug_agenda.html