Hello,

I have an interactive Perl script that prompts the user with a few
questions and then, based on the responses, runs a binary thousands of
times, each time with different arguments.  This is what I've tried so
far.

salloc -n total_cores_across_nodes_1-20 -w "node[1-20]"

Inside the script:
loop over $i
    cd output_dir_$i
    srun -J job$i -o %J.out executable ../input_file other_arguments
end loop
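
Spelled out as a shell script, the loop looks roughly like this (a
dry-run sketch that only prints the srun commands instead of running
them; "output_dir_$i", "executable", and the arguments are
placeholders):

```shell
#!/bin/sh
# Dry-run sketch of the loop: create each output directory, cd into
# it, and print the srun command that iteration would run.
set -e
for i in 1 2 3; do
    mkdir -p "output_dir_$i"
    cd "output_dir_$i"
    echo "srun -J job$i -o %J.out executable ../input_file other_arguments"
    cd ..
done
```

The real script does the same thing from Perl, launching one job step
per directory.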

Is this workflow reasonable/recommended?

Instead of supplying -n total_cores_across_nodes_1-20, is there a way
to specify that each job step should run on exactly one core?  Each
node has a different number of cores.
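
To make the question concrete, this is the sort of thing I am hoping
is possible (untested; the commands are only built and printed here,
and the names and arguments are placeholders):

```shell
#!/bin/sh
# Untested sketch: each step asks for exactly one task on one node,
# and --exclusive is my guess at how to let Slurm pack steps onto
# whatever cores are free, without counting cores per node myself.
set -e
for i in 1 2 3; do
    cmd="srun -n1 -N1 --exclusive -J job$i -o %J.out executable ../input_file other_arguments"
    echo "$cmd &"
done
echo "wait"
```

The idea is that the steps run in the background and Slurm queues any
that cannot start until a core frees up.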

squeue -s seems to get stuck: it shows roughly the first 15 job steps
and after that only updates a single entry in the list as new steps
start.  So, for example, I see something like:

         STEPID     NAME PARTITION     USER      TIME NODELIST
       1180.185    js_66       all      jrm      0:06 awarnach[1-15]
         1180.2   js_1-2       all      jrm     41:09 awarnach[1-15]
         1180.3   js_1-3       all      jrm     41:09 awarnach[1-15]
...
        1180.19  js_1-19       all      jrm     41:05 awarnach[1-15]
         1180.1   js_1-1       all      jrm     41:09 awarnach[1-15]

or it simply shows a single entry even though many job steps are
running.  Am I misunderstanding something?

Thanks,

Joseph
