Hi,

I had a quick question about the behavior of the OverSubscribe setting on partitions.   

In my setup (Slurm 17.02.1-2) I have a node that belongs to two partitions, and I am using select/cons_res with CR_Core_Memory.

With OverSubscribe=NO I can submit a job to the same node from each partition, and both jobs start executing immediately.  With OverSubscribe=YES, however, one of the jobs stays in the PD state until the other finishes.  If I pass the -s flag, both jobs run concurrently.  According to [0], OverSubscribe=YES should behave the same as OverSubscribe=NO by default unless a flag is passed, but I think I’m seeing different behavior.

Here are some outputs illustrating the issue:


[shadosub|04:33 PM]$ scontrol show partition | grep -e '^PartitionName' -e 'OverSubscribe'
PartitionName=test
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
PartitionName=test2
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO

[shadosub|04:35 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               385      test  test.sh sabobbin  R       0:01      1 shado00
               386     test2  test.sh sabobbin  R       0:01      1 shado00




[shadosub|04:35 PM]$ scontrol show partition | grep -e '^PartitionName' -e 'OverSubscribe'
PartitionName=test
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
PartitionName=test2
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4

[shadosub|04:45 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               389     test2  test.sh sabobbin PD       0:00      1 (Resources)
               388      test  test.sh sabobbin  R       0:03      1 shado00
[shadosub|04:45 PM]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               390      test  test.sh sabobbin  R       0:02      1 shado00
               391      test  test.sh sabobbin  R       0:02      1 shado00


As the last squeue output shows, with OverSubscribe=YES:4 the jobs also run concurrently if both are submitted to the same partition.
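
For anyone who wants to reproduce this, the submissions can be sketched roughly as follows. The node name (shado00) and script name (test.sh) are taken from the squeue output; the exact sbatch options and the run/submit helper functions are illustrative assumptions, not the literal commands used.

```shell
#!/bin/sh
# Reproduction sketch. NODE and JOB come from the squeue output;
# the helpers and the option choices below are assumptions.
NODE=shado00
JOB=test.sh

# Run a command, or only print it when DRY_RUN is set or the
# command is not installed on this machine.
run() {
    if [ -n "${DRY_RUN:-}" ] || ! command -v "$1" >/dev/null 2>&1; then
        echo "$*"
    else
        "$@"
    fi
}

# Submit $JOB to partition $1 on $NODE; any further arguments
# (for example -s) are passed through to sbatch.
submit() {
    part=$1; shift
    run sbatch --partition="$part" --nodelist="$NODE" "$@" "$JOB"
}

submit test     # first job, starts right away
submit test2    # second job: runs with OverSubscribe=NO, pends with YES:4
run squeue --nodelist="$NODE"
```

Changing the second call to `submit test2 -s` corresponds to the case where both jobs run concurrently.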

To me this seems like a bug, but I wanted to ping the group to see if I’m missing an option or misunderstanding the expected behavior.  


Thanks,
—Shawn




Attachments: slurm.conf, slurm.info

