[slurm-dev] OverSubscribe=YES default behavior.

Shawn Bobbin Thu, 06 Apr 2017 15:36:55 -0700

Hi,

I had a quick question about the behavior of the OverSubscribe setting on partitions.

In my setup (17.02.1-2) I have a node that belongs to two partitions, and I am using select/cons_res with CR_Core_Memory.
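
For readers without the attachment, the setup described above corresponds roughly to the following slurm.conf lines. This is a sketch reconstructed from the description and the command output in this message, not a copy of the attached file. The CPUs and RealMemory values on the NodeName line are hypothetical placeholders.

```
# Sketch only; not copied from the attached slurm.conf.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# CPUs and RealMemory are hypothetical values for illustration.
NodeName=shado00 CPUs=8 RealMemory=16000

# One node shared by two partitions; set OverSubscribe=NO for the first case.
PartitionName=test  Nodes=shado00 OverSubscribe=YES:4
PartitionName=test2 Nodes=shado00 OverSubscribe=YES:4
```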

With OverSubscribe=NO I can submit a job to the same node from both partitions, and both start executing immediately. With OverSubscribe=YES, however, one of the jobs goes into the PD state until the other finishes. If I specify the -s flag, both jobs run concurrently. According to [0], OverSubscribe=YES should behave the same as OverSubscribe=NO by default unless the flag is passed, but I think I’m seeing different behavior.
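
The two cases can be reproduced with something like the script below. It is a sketch under assumptions: the message does not show the actual submit commands, so the use of sbatch with --partition and --nodelist is a guess. The partition names, node name and test.sh come from the squeue output. The DRY_RUN switch and the submit helper are hypothetical and only there so the commands can be inspected without a cluster.

```shell
#!/bin/sh
# Hypothetical reproduction; the real submit commands are not shown in the post.
# DRY_RUN=1 (the default) prints the sbatch commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

submit() {
    # $1 = partition name; any remaining arguments are passed to sbatch as-is
    part="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo sbatch --partition="$part" --nodelist=shado00 "$@" test.sh
    else
        sbatch --partition="$part" --nodelist=shado00 "$@" test.sh
    fi
}

# Case 1: no flag. With OverSubscribe=YES:4 the second job stays pending.
submit test
submit test2

# Case 2: -s (--oversubscribe) on both jobs. They run concurrently.
# Let the jobs from case 1 finish before submitting these.
submit test -s
submit test2 -s
```

Running it with DRY_RUN=0 on a submit host performs the actual submissions; squeue then shows the states.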

Here are some outputs illustrating the issue:

[shadosub|04:33 PM]$ scontrol show partition | grep -e '^PartitionName' -e 'OverSubscribe'
PartitionName=test
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
PartitionName=test2
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO

[shadosub|04:35 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               385      test  test.sh sabobbin  R       0:01      1 shado00
               386     test2  test.sh sabobbin  R       0:01      1 shado00

[shadosub|04:35 PM]$ scontrol show partition | grep -e '^PartitionName' -e 'OverSubscribe'
PartitionName=test
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
PartitionName=test2
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4

[shadosub|04:45 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               389     test2  test.sh sabobbin PD       0:00      1 (Resources)
               388      test  test.sh sabobbin  R       0:03      1 shado00
[shadosub|04:45 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               390      test  test.sh sabobbin  R       0:02      1 shado00
               391      test  test.sh sabobbin  R       0:02      1 shado00

As the last squeue output shows, the jobs also execute concurrently if both are submitted to the same partition.

To me this seems like a bug, but I wanted to ping the group to see if I’m missing an option or misunderstanding the expected behavior.

Thanks,

—Shawn

[0] https://slurm.schedmd.com/cons_res_share.html

Attachments: slurm.conf, slurm.info
