Hi Reuti,
This is qhost of one of our compute nodes:
pwbcad@gamma01:~$ qhost -F -h omega-0-9
HOSTNAME        ARCH        NCPU  LOAD   MEMTOT  MEMUSE  SWAPTO  SWAPUS
-----------------------------------------------------------------------
global          -              -     -        -       -       -       -
omega-0-9       lx26-amd64    64 12.34   504.9G  273.6G  256.0G   14.6G
hl:arch=lx26-amd64
hl:num_proc=64.000000
hl:mem_total=504.890G
hl:swap_total=256.000G
hl:virtual_total=760.890G
hl:load_avg=12.340000
hl:load_short=9.720000
hl:load_medium=12.340000
hl:load_long=18.900000
hl:mem_free=231.308G
hl:swap_free=241.356G
hl:virtual_free=472.663G
hl:mem_used=273.582G
hl:swap_used=14.644G
hl:virtual_used=288.226G
hl:cpu=15.400000
hl:m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
hl:m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
hl:m_socket=4.000000
hl:m_core=32.000000
hl:np_load_avg=0.192812
hl:np_load_short=0.151875
hl:np_load_medium=0.192812
hl:np_load_long=0.295312
hc:mem_requested=502.890G
We do not set h_vmem at the queue instance level; that's intentional, because
we only need h_vmem in a per-user quota like:
{
   name         default_per_user
   enabled      true
   description  "Each user is entitled to resources equivalent to two nodes"
   limit        users {*} queues {all.q} to slots=16,h_vmem=16G
}
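For reference, this is roughly how we verify and exercise the quota from the command line. This is illustrative only: the PE name `smp` and the job script `job.sh` are placeholders, and the 4*2G arithmetic assumes h_vmem is a per-slot consumable (as in our `qconf -sc` below), so requests are multiplied by the slot count.

```
qconf -srqs default_per_user        # show the resource quota set above
qsub -l h_vmem=2G -pe smp 4 job.sh  # counts 4 * 2G = 8G against the user's 16G
qquota -u $USER                     # show current usage against the quota
```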
At the queue instance level, we use mem_requested as a "per host quota"
instead. It's a custom complex attribute we set up for our specific
applications.
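To make the mechanics concrete, here is a minimal sketch of how a host-level consumable such as mem_requested behaves: the scheduler starts from the configured capacity (the `complex_values` entry) and decrements the remaining pool for every running job that requests the resource, which is why `hc:mem_requested` above shows less than the configured total. The class and method names are illustrative, not SGE internals.

```python
class ConsumableHost:
    """Toy model of one host's consumable resource pool."""

    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb  # configured complex_values amount
        self.used_gb = 0.0              # sum of requests of running jobs

    def available_gb(self):
        # This is what qstat/qhost report as the hc: value.
        return self.capacity_gb - self.used_gb

    def try_schedule(self, request_gb):
        """Admit the job only if the remaining pool covers the request."""
        if request_gb <= self.available_gb():
            self.used_gb += request_gb
            return True
        return False


# Capacity matches the hc:mem_requested value of omega-0-9 when idle.
host = ConsumableHost("omega-0-9", 502.890)
print(host.try_schedule(400.0))          # fits in the pool
print(host.try_schedule(200.0))          # exceeds the remaining ~102.9G
print(round(host.available_gb(), 3))
```

The point is that the consumable is purely bookkeeping against requests, not a measurement of actual memory use.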
Cheers,
D
On Tue, Jul 29, 2014 at 1:02 AM, Reuti <[email protected]> wrote:
> Hi,
>
> Am 04.07.2014 um 06:04 schrieb Derrick Lin:
>
> > Interestingly, I have a small test cluster with basically the same SGE
> > setup that does *not* have this problem. h_vmem in the complex is exactly
> > the same, and the test queue instance looks almost the same (except for
> > the CPU layout etc.)
> >
> > qstat -F -q all.q@eva00
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > [email protected] BP 0/0/8 0.00 lx26-amd64
> > ...
> > hc:mem_requested=7.814G
> > qf:qname=all.q
> > qf:hostname=eva00.local
> > qc:slots=8
> > qf:tmpdir=/tmp
> > qf:seq_no=0
> > qf:rerun=0.000000
> > qf:calendar=NONE
> > qf:s_rt=infinity
> > qf:h_rt=infinity
> > qf:s_cpu=infinity
> > qf:h_cpu=infinity
> > qf:s_fsize=infinity
> > qf:h_fsize=infinity
> > qf:s_data=infinity
> > qf:h_data=infinity
> > qf:s_stack=infinity
> > qf:h_stack=infinity
> > qf:s_core=infinity
> > qf:h_core=infinity
> > qf:s_rss=infinity
> > qf:h_rss=infinity
> > qf:s_vmem=infinity
> > qf:h_vmem=infinity
> > qf:min_cpu_interval=00:05:00
> >
> > Both clusters don't have h_vmem defined at the exechost level.
>
> What is the output of:
>
> `qhost -F`
>
> Below you write that it's also defined on a queue instance level, hence in
> both places (as "complex_values")?
>
> -- Reuti
>
>
> > Derrick
> >
> >
> > On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin <[email protected]> wrote:
> > Hi all,
> >
> > We have started using h_vmem to control jobs by their memory usage.
> > However, jobs cannot start when -l h_vmem is requested. The reason given
> > is:
> >
> > (-l h_vmem=1G) cannot run in queue "[email protected]" because
> > job requests unknown resource (h_vmem)
> >
> > However, h_vmem is definitely on the queue instance:
> >
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > [email protected] BIP 0/0/64 6.27 lx26-amd64
> > ....
> > hl:np_load_long=0.091563
> > hc:mem_requested=504.903G
> > qf:qname=intel.q
> > qf:hostname=delta-5-1.local
> > qc:slots=64
> > qf:tmpdir=/tmp
> > qf:seq_no=0
> > qf:rerun=0.000000
> > qf:calendar=NONE
> > qf:s_rt=infinity
> > qf:h_rt=infinity
> > qf:s_cpu=infinity
> > qf:h_cpu=infinity
> > qf:s_fsize=infinity
> > qf:h_fsize=infinity
> > qf:s_data=infinity
> > qf:h_data=infinity
> > qf:s_stack=infinity
> > qf:h_stack=infinity
> > qf:s_core=infinity
> > qf:h_core=infinity
> > qf:s_rss=infinity
> > qf:h_rss=infinity
> > qf:s_vmem=infinity
> > qf:h_vmem=infinity
> > qf:min_cpu_interval=00:05:00
> >
> > I tried specifying other attributes such as h_rt instead, and those jobs
> > started and finished successfully.
> >
> > qconf -sc
> >
> > #name     shortcut  type    relop requestable consumable default urgency
> > #-------------------------------------------------------------------------
> > h_vmem    h_vmem    MEMORY  <=    YES         YES        0       0
> > #
> > Can anyone shed light on this?
> >
> > Cheers,
> > Derrick
> >
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>