Dear Loris,

Yes, it is indeed a bit odd. At least now I know that this is how SLURM behaves and not something to do with our configuration.

Regards,

Thekla

On 9/12/21 1:04 p.m., Loris Bennett wrote:
Dear Thekla,

Yes, I think you are right.  I have found a similar job on my system and
this does seem to be the normal, slightly confusing behaviour.  It looks
as if the pending elements of the array get assigned a single node,
but then start on other nodes:

   $ squeue -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state | head
   JOBID     JOBID              REASON     SCHEDNODES  NODELIST  STATE
   8536946   8536946_[401-899]  Resources  g002                  PENDING
   8658719   8536946_400        None       (null)      g006      RUNNING
   8658685   8536946_399        None       (null)      g012      RUNNING
   8658625   8536946_398        None       (null)      g001      RUNNING
   8658491   8536946_397        None       (null)      g006      RUNNING
   8658428   8536946_396        None       (null)      g003      RUNNING
   8658427   8536946_395        None       (null)      g003      RUNNING
   8658426   8536946_394        None       (null)      g007      RUNNING
   8658425   8536946_393        None       (null)      g002      RUNNING

This strikes me as a bit odd.
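
The same thing is visible in the individual job record; assuming your
Slurm version exposes the SchedNodeList field in 'scontrol show job'
(recent versions do), something like

   $ scontrol show job 8536946 | grep -o 'SchedNodeList=[^ ]*'

shows the provisionally assigned node while the array elements are
pending, whereas NodeList is only filled in once an element actually
starts.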

Cheers,

Loris

Thekla Loizou <t.loi...@cyi.ac.cy> writes:

Dear Loris,

Thank you for your reply. To be honest, I don't believe there is anything wrong
with the job configuration or the node configuration.

I have just submitted a simple sleep script:

#!/bin/bash

sleep 10

as below:

sbatch --array=1-10 --ntasks-per-node=40 --time=09:00:00 test.sh

and squeue shows:

            131799_1   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_2   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_3   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_4   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_5   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_6   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_7   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_8   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
            131799_9   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
           131799_10   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)

All of the jobs seem to be scheduled on node cn04.

When they start running, they run on separate nodes:

           131799_1       cpu  test.sh   thekla  R       0:02 1 cn01
           131799_2       cpu  test.sh   thekla  R       0:02 1 cn02
           131799_3       cpu  test.sh   thekla  R       0:02 1 cn03
           131799_4       cpu  test.sh   thekla  R       0:02 1 cn04
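
For completeness, the scheduled node and the actual node can be shown
side by side with squeue's --Format option, e.g.:

   squeue -j 131799 -O jobid,jobarrayid,reason,schednodes,nodelist,state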

Regards,

Thekla

On 7/12/21 5:17 p.m., Loris Bennett wrote:
Dear Thekla,

Thekla Loizou <t.loi...@cyi.ac.cy> writes:

Dear Loris,

There is no specific node required for this array. I can verify that from
"scontrol show job 124841" since the requested node list is empty:
ReqNodeList=(null)

Also, all 17 nodes of the cluster are identical, so all nodes fulfill the job
requirements, not only node cn06.
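
(One way to confirm that the nodes really are interchangeable, as a
sketch using standard sinfo format specifiers, is:

   sinfo -N -o '%N %c %m %f'

which lists every node with its CPU count, memory and features.)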

By "saving" the other nodes I mean that the scheduler estimates that the array
jobs will start on 2021-12-11T03:58:00. No other jobs are scheduled to run
during that time on the other nodes. So it seems that somehow the scheduler
schedules the array jobs on more than one nodes but this is not showing in the
squeue or scontrol output.
My guess is that there is something wrong with either the job
configuration or the node configuration, if Slurm thinks 9 jobs which
each require a whole node can all be started simultaneously on the same node.
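
What I would check first, as a sketch (node and partition names taken
from your output; in recent Slurm versions the partition sharing field
is called OverSubscribe):

   # Does the node have the resources Slurm thinks it has?
   scontrol show node cn06 | grep -E 'CPUTot|RealMemory|State'

   # Is oversubscription enabled on the partition?
   scontrol show partition cpu | grep -o 'OverSubscribe=[^ ]*'

If OverSubscribe were enabled, several whole-node jobs could
legitimately be planned onto the same node.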

Cheers,

Loris

Regards,

Thekla


On 7/12/21 12:16 p.m., Loris Bennett wrote:
Hi Thekla,

Thekla Loizou <t.loi...@cyi.ac.cy> writes:

Dear all,

I have noticed that SLURM schedules several jobs from a job array on the same
node with the same start time and end time.

Each of these jobs requires the full node. You can see the squeue output below:

             JOBID     PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
             124841_1  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_2  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_3  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_4  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_5  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_6  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_7  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_8  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
             124841_9  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)

Is this a bug, or am I missing something? Is it because the jobs have the same
JOBID and are still in the pending state? I am aware that the jobs will not
actually all run on the same node at the same time, and that the scheduler
somehow takes into account that this job array has 9 jobs that will need 9
nodes. I am creating a timeline with the start times of all jobs, and when the
array jobs are due to start, no other jobs are set to run on the remaining
nodes (so it "saves" the other nodes for the jobs of the array, even though
according to squeue and scontrol they are all scheduled to run on the same node).
In general, jobs from an array will be scheduled on whatever nodes
fulfil their requirements.  The fact that all the jobs have

     cn06

as SCHEDNODES, however, seems to suggest that you have either specified
cn06 as the node the jobs should run on, or cn06 is the only node which
fulfils the job requirements.
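
For comparison, explicitly pinning the array to one node would look
something like this (hypothetical script name):

   sbatch --array=1-9 --nodelist=cn06 job.sh

in which case 'scontrol show job' would report cn06 under ReqNodeList
rather than (null).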

I'm not sure what you mean by "saving" the other nodes.

Cheers,

Loris

