Hi,

Has anyone seen these errors, and does anyone know what they mean?

  srun: error: Unable to create job step: Requested node configuration is not available
  srun: error: Unable to create job step: Job/step already completing or completed

I run this script under sbatch, with the same allocation that the srun in the script requests:

  for j in `seq 1 250` ; do
    delay=`echo $j | awk '{print $1*20}'`
    time srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
              --exclusive test2.exe $delay
  done

Run as above, every srun fails with the first message. If I
add a "sleep 1" to the loop, I can do about 140 sruns before
they start failing, and every failed run then gives the second
message. Anything with a larger sleep gets pretty much the same
results as the "sleep 2".

The exact number of successful runs varies by 10 or 20. Am I
using up some resource with each run?
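
To pin down exactly which iterations fail, the loop could be instrumented
along the following lines. This is only a sketch: run_steps is a
hypothetical helper name, and it assumes the launch command returns a
non-zero exit status when the job step cannot be created. The delay is
computed as j*20, as in the loop above.

```shell
# Hypothetical helper: run a launch command COUNT times and report failures.
# Usage: run_steps COUNT command [args...]
# The delay (j * 20) is appended as the last argument on each iteration.
run_steps() {
    local count=$1
    shift
    local failed=0 j delay rc
    for j in $(seq 1 "$count"); do
        delay=$((j * 20))
        "$@" "$delay"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            failed=$((failed + 1))
            # Log to stderr so the failing iterations are easy to pick out.
            echo "iteration $j (delay=$delay) failed, exit status $rc" >&2
        fi
    done
    # Print the total number of failed iterations on stdout.
    echo "$failed"
}

# Example call, using the srun line from the loop above:
# run_steps 250 srun --ntasks=5 --cpus-per-task=1 --ntasks-per-node=1 \
#           --exclusive test2.exe
```

Passing the launch command in as arguments means the helper can be tried
out with a stand-in command before using it inside a real allocation.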

For completeness, I am running on a Cray system with SLURM 14.11.8.

Thanks,
Bob

--
Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227
