Hi,
Has anyone seen these errors, and does anyone know what they mean?
srun: error: Unable to create job step: Requested node configuration is not available
srun: error: Unable to create job step: Job/step already completing or completed
I run the following script from an sbatch job whose allocation matches
the srun request in the script:
for j in `seq 1 250` ; do
    delay=`echo $j | awk '{print $1*20}'`
    time srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
        --exclusive test2.exe $delay
done
Run as above, every srun fails with the first message. If I
add a "sleep 1" to the loop, I get about 140 successful sruns before
the failures start, and every srun after that fails with the second
message. Anything with a larger sleep gives pretty much the same
results.
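To pin down exactly where the failures start, the loop could be
instrumented to stop at the first failing srun and report its exit
status. This is only a sketch: run_steps, LAUNCHER, and RUNS_OK are
hypothetical names introduced here, the launcher defaults to the same
srun options as the script above, and the "time" prefix is left out
for brevity.

```shell
#!/bin/bash
# Sketch only: run_steps, LAUNCHER and RUNS_OK are hypothetical names,
# not part of SLURM. The default launcher mirrors the srun line above.
LAUNCHER=${LAUNCHER:-"srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 --exclusive test2.exe"}

# Launch up to $1 steps; stop at the first one that fails.
# The number of successful launches is left in the global RUNS_OK.
run_steps() {
    local max=$1 j delay rc
    RUNS_OK=0
    for j in $(seq 1 "$max"); do
        delay=$((j * 20))
        # LAUNCHER is deliberately unquoted so its options split into words.
        if $LAUNCHER "$delay"; then
            RUNS_OK=$((RUNS_OK + 1))
        else
            rc=$?
            echo "step $j failed with exit status $rc" >&2
            break
        fi
    done
    echo "launched $RUNS_OK of $max steps successfully"
}
```

Calling "run_steps 250" from the sbatch script would then print how
many steps launched before the first error, which makes it easier to
compare runs with different sleep values.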
The exact number of successful runs varies by 10 or 20. Am I
using up some resource with each run?
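One way to check whether something accumulates with each run might be
to list the job steps SLURM still tracks for the allocation, for
example right after a failure. The sketch below assumes squeue is in
the PATH on the node running the batch script; show_steps is a
hypothetical helper name, and whether lingering steps are actually the
cause here is an open question.

```shell
#!/bin/bash
# Sketch only: show_steps is a hypothetical helper, not a SLURM command.
# SLURM_JOB_ID is set by SLURM inside an sbatch job.
show_steps() {
    local jobid=${1:-${SLURM_JOB_ID:-}}
    if [ -z "$jobid" ]; then
        echo "no job id given and SLURM_JOB_ID is not set" >&2
        return 1
    fi
    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found in PATH" >&2
        return 1
    fi
    # --steps lists job steps instead of jobs; --jobs limits it to this job.
    squeue --steps --jobs="$jobid"
}
```

Inside the sbatch script, "show_steps" with no argument would use the
current job; a long list of steps that never go away would suggest
that each srun leaves something behind.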
For completeness, I am running on a Cray system with SLURM 14.11.8.
Thanks,
Bob
--
Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227