Am 02.11.2016 um 21:47 schrieb Joshua Baker-LePain:
> On Wed, 2 Nov 2016 at 11:13am, Reuti wrote
>
>>> Am 02.11.2016 um 18:36 schrieb Joshua Baker-LePain <[email protected]>:
>>>
>>> On our cluster, we have three queues per host, each with as many slots as
>>> the host has physical cores. The queues are configured as follows:
>>>
>>> o lab.q (high priority queue for cluster "owners")
>>> - load_thresholds np_load_avg=1.5
>>> o short.q (for jobs <30 minutes)
>>> - load_thresholds np_load_avg=1.25
>>> o long.q (low priority queue available to all users)
>>> - load_thresholds np_load_avg=0.9
>>>
>>> The theory is that we want long.q to stop accepting jobs when a node is
>>> fully loaded (read: load = physical core count) and short.q to stop
>>> accepting jobs when a node is 50% overloaded. This has worked well
>>> for a long while.
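For reference, per-queue thresholds like these can be shown and changed with qconf; a minimal sketch, queue names as above and the exact values only illustrative:

  # show the currently configured thresholds of one queue
  $ qconf -sq long.q | grep load_thresholds

  # change a single threshold without opening an editor
  $ qconf -mattr queue load_thresholds np_load_avg=0.9 long.q
  $ qconf -mattr queue load_thresholds np_load_avg=1.25 short.q
  $ qconf -mattr queue load_thresholds np_load_avg=1.5 lab.q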
>>
>> As the load is just the number of runnable processes in the run queue*, it
>> should certainly reach at least the number of available cores. Did you
>> increase the number of slots for these machines too (also in the PEs)? What is
>> `uptime` showing? What happens to the reported load when you run some
>> jobs in the background outside of SGE on these nodes?
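One quick way to try that; the busy loops and the 60 s timeout are only illustrative:

  # create some CPU load outside of SGE on the node in question
  $ for i in 1 2 3 4; do timeout 60 yes > /dev/null & done

  # then compare what the OS and SGE report for the same host
  $ uptime
  $ qhost -h msg-id1
  $ qconf -se msg-id1 | grep np_load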
Just for the record: to investigate this, I defined an additional load threshold
that always puts the queue into alarm state, besides the one under test. I used
our tmpfree complex for it and entered a value beyond the installed disk capacity.
This way `qstat -explain a` always gives output, and even the values of the other
complexes whose thresholds aren't exceeded are displayed. I got:
$ qstat -explain a -q serial@node29 -s r
queuename            qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
serial@node29        B     0/0/16         15.75    lx24-em64t a
    alarm hl:tmpfree=1842222120k load-threshold=2T
    alarm hl:np_load_avg=0.492188 load-threshold=0.5
$ qstat -explain a -q serial@node29 -s r
queuename            qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
serial@node29        B     0/0/16         15.75    lx24-em64t a
    alarm hl:tmpfree=1842222120k load-threshold=2T
    alarm hl:np_load_avg=9.844 load-threshold=0.5
$ qstat -explain a -q serial@node29 -s r
queuename            qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
serial@node29        B     0/0/16         15.76    lx24-em64t a
    alarm hl:tmpfree=1842221988k load-threshold=2T
    alarm hl:np_load_avg=0.246 load-threshold=0.5
for load_scaling settings of NONE, 20, and 0.5 for np_load_avg on the exechost,
respectively. Looks fine. Hence your np_load_avg=2 scaling should have worked.
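For completeness, the two knobs used in this test can be set like this (host and queue names as in the output above; the tmpfree value just has to exceed the installed disk):

  # edit the exec host and set:  load_scaling  np_load_avg=0.500000
  $ qconf -me node29

  # a tmpfree threshold beyond the installed disk keeps the queue permanently
  # in alarm, so `qstat -explain a` always prints all configured thresholds
  $ qconf -mattr queue load_thresholds np_load_avg=0.5,tmpfree=2T serial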
-- Reuti
> I don't think I was entirely clear above. We still consider a fully loaded
> node to be one using as many slots as there are *physical* cores. So each
> queue is defined to have as many slots as there are physical cores. Our
> goals for the queues are these:
>
> 1) If a node is running full load of lab.q jobs, long.q should go into
> alarm and not accept any jobs.
>
> 2) That same fully loaded node should accept jobs in short.q until it is
> 50% overloaded, at which time short.q should also go into alarm.
>
> 3) Conversely, if a node is running a full load of long.q jobs, it should
> still accept a full load of lab.q jobs.
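Since np_load_avg is simply load_avg divided by num_proc, the arithmetic behind goals 1 and 2 on a plain 8-core node looks like this (a quick check, numbers illustrative):

  # a full load of 8 lab.q jobs on an 8-core box
  $ awk 'BEGIN { printf "%.2f\n", 8/8 }'
  1.00    # above long.q's 0.9 (alarm), below short.q's 1.25 (still open)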
>
> As an example, here's a non-hyperthreaded node:
>
> $ qhost -q -h iq116
> iq116                   linux-x64       8  9.93   15.6G    4.0G    4.0G  196.3M
>    lab.q                BP    0/8/8
>    short.q              BP    0/2/8
>    long.q               BP    0/0/8     a
>
> lab.q is full and short.q is still accepting jobs, but long.q is in alarm, as
> intended. Here's a hyperthreaded node:
>
> $ qhost -q -h msg-id1
> HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> ----------------------------------------------------------------------------------------------
> global                  -               -    -    -    -     -       -       -       -       -
> msg-id1                 lx-amd64       48    2   24   48 24.52  251.6G    2.2G    4.0G     0.0
>    lab.q                BP    0/24/24
>    short.q              BP    0/0/24
>    long.q               BP    0/0/24
>
> So even though lab.q is full, long.q isn't in alarm. Here's how that node
> shows up in qconf:
>
> $ qconf -se msg-id1
> hostname msg-id1.ic.ucsf.edu
> load_scaling np_load_avg=2.000000
> complex_values mem_free=256000M
> load_values    arch=lx-amd64,num_proc=48,mem_total=257673.273438M, \
>                swap_total=4095.996094M,virtual_total=261769.269531M, \
>                m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                m_socket=2,m_core=24,m_thread=48,load_avg=24.520000, \
>                load_short=24.490000,load_medium=24.520000, \
>                load_long=24.500000,mem_free=255421.792969M, \
>                swap_free=4095.996094M,virtual_free=259517.789062M, \
>                mem_used=2251.480469M,swap_used=0.000000M, \
>                virtual_used=2251.480469M,cpu=50.000000, \
>                m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                np_load_avg=0.510833,np_load_short=0.510208, \
>                np_load_medium=0.510833,np_load_long=0.510417
> processors 48
>
> Given I have both hyperthreaded and non-hyperthreaded nodes, I can't just
> change the value of the queue's np_load_avg load_threshold. I thought
> load_scaling was the answer, but it's not having any effect that I can see.
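For what it's worth, plugging the numbers from the qconf output above into the same formula shows what the scaling was expected to do (a quick check with the reported values):

  # raw np_load_avg on the 48-thread node, and the value expected with
  # the load_scaling factor of 2 applied
  $ awk 'BEGIN { printf "raw:    %.3f\n", 24.52/48;
                 printf "scaled: %.3f\n", 24.52/48*2 }'
  raw:    0.511
  scaled: 1.022   # above long.q's 0.9 threshold, so an alarm was expected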
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF