On 01.08.2014 at 01:39, Derrick Lin wrote:

> Do you have
>
> params MONITOR=1 ?
No:

$ qconf -ssconf
...
params                            none

> This is what gave me the same error.

IMO it's not an error. If there is nothing to consume from, then the job
shouldn't be scheduled.

Is the job also running for you when you:

- remove the RQS,
- keep h_vmem set to consumable,
- set no initial value for h_vmem anywhere (exechost, queue or global)?

-- Reuti

> I am running GE 6.2u5 as well
>
> D
>
> On Thu, Jul 31, 2014 at 8:53 PM, Reuti <[email protected]> wrote:
> On 31.07.2014 at 03:06, Derrick Lin wrote:
>
> > Hi Reuti,
> >
> > That's interesting, but it works without any hack:
> >
> > {
> >    name         default_per_user
> >    enabled      true
> >    description  "Each user is entitled to resources equivalent to three nodes"
> >    limit        users {*} queues {all.q} to slots=192,h_vmem=1536G
>
> Not for me in 6.2u5: it shows "because job requests unknown resource
> (h_vmem)", as expected, until I add a decent value to "complex_values"
> anywhere.
>
> -- Reuti
>
> > }
> >
> > Then it consumes from the user's quota:
> >
> > $ qquota -u "*"
> > resource quota rule  limit                 filter
> > --------------------------------------------------------------------------------
> > default_per_user/1   slots=166/192         users b****** queues all.q
> > default_per_user/1   h_vmem=400.000G/1536  users b****** queues all.q
> >
> > Is it illegal to set h_vmem in a per-user quota in the first place?
> >
> > Cheers,
> > D
> >
> > On Wed, Jul 30, 2014 at 4:37 PM, Reuti <[email protected]> wrote:
> > Hi,
> >
> > On 30.07.2014 at 03:29, Derrick Lin wrote:
> >
> > > **No** initial value per queue instance; I force the users to specify
> > > both h_vmem and mem_requested by defining default values inside the
> > > sge_default file.
> > >
> > > No h_vmem on the exechost level either, because we want to use
> > > mem_requested instead, since it's already set up across all exechosts.
> > >
> > > My original issue was: when I set params MONITOR=1, jobs failed to
> > > start.
> > >
> > > Now I have MONITOR=1 removed, and all jobs start and run fine. Any idea?
> >
> > They still shouldn't start. As you defined "h_vmem" as being consumable,
> > the question is: consume from what?
> >
> > Nevertheless, you can set an arbitrarily high value in the global
> > exechost (`qconf -me global`), there under "complex_values".
> >
> > -- Reuti
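As a minimal sketch of that suggestion (the 9999G figure is an arbitrary
placeholder; pick anything comfortably above the cluster's real total):

$ qconf -me global

hostname        global
load_scaling    NONE
complex_values  h_vmem=9999G
...

A consumable is decremented from whatever capacity is published under
"complex_values"; if neither an exechost, nor a queue, nor the global host
publishes one, there is nothing to decrement, and the scheduler rejects the
request as an unknown resource.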
> > > D
>
> > > On Tue, Jul 29, 2014 at 7:43 PM, Reuti <[email protected]> wrote:
> > > Hi,
> > >
> > > On 29.07.2014 at 06:07, Derrick Lin wrote:
> > >
> > > > This is the qhost output of one of our compute nodes:
> > > >
> > > > pwbcad@gamma01:~$ qhost -F -h omega-0-9
> > > > HOSTNAME     ARCH         NCPU   LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > > -------------------------------------------------------------------------------
> > > > global       -               -      -       -       -       -       -
> > > > omega-0-9    lx26-amd64     64  12.34  504.9G  273.6G  256.0G   14.6G
> > > >    hl:arch=lx26-amd64
> > > >    hl:num_proc=64.000000
> > > >    hl:mem_total=504.890G
> > > >    hl:swap_total=256.000G
> > > >    hl:virtual_total=760.890G
> > > >    hl:load_avg=12.340000
> > > >    hl:load_short=9.720000
> > > >    hl:load_medium=12.340000
> > > >    hl:load_long=18.900000
> > > >    hl:mem_free=231.308G
> > > >    hl:swap_free=241.356G
> > > >    hl:virtual_free=472.663G
> > > >    hl:mem_used=273.582G
> > > >    hl:swap_used=14.644G
> > > >    hl:virtual_used=288.226G
> > > >    hl:cpu=15.400000
> > > >    hl:m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
> > > >    hl:m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
> > > >    hl:m_socket=4.000000
> > > >    hl:m_core=32.000000
> > > >    hl:np_load_avg=0.192812
> > > >    hl:np_load_short=0.151875
> > > >    hl:np_load_medium=0.192812
> > > >    hl:np_load_long=0.295312
> > > >    hc:mem_requested=502.890G
> > >
> > > So, there is no h_vmem on the exechost level here.
> > >
> > > > We do not set h_vmem at the queue instance level; that's intended,
> > > > because we just need h_vmem in a per-user quota like:
> > >
> > > A typo, and you mean the exechost level?
> > >
> > > > {
> > > >    name         default_per_user
> > > >    enabled      true
> > > >    description  "Each user is entitled to resources equivalent to two nodes"
> > > >    limit        users {*} queues {all.q} to slots=16,h_vmem=16G
> > > > }
> > >
> > > RQS limits are not enforced. The user then has to specify it by hand
> > > with the -l option to `qsub`.
> > >
> > > Is "h_vmem" then in "complex_values" in the queue definition with an
> > > initial value per queue instance?
> > >
> > > -- Reuti
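In other words, with only the RQS in place each job has to carry its own
request for the limit to be counted against the quota; a minimal example
(job.sh is a placeholder for whatever gets submitted):

$ qsub -l h_vmem=4G job.sh

To spare users the typing, such a request can be made the default by putting
"-l h_vmem=4G" into $SGE_ROOT/$SGE_CELL/common/sge_request -- presumably the
mechanism Derrick describes above as defaults in the "sge_default file".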
> > > > At the queue instance level, we use mem_requested as a "per host
> > > > quota" instead. It's a custom complex attribute we set up for our
> > > > specific applications.
> > > >
> > > > Cheers,
> > > > D
> > > >
> > > > On Tue, Jul 29, 2014 at 1:02 AM, Reuti <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > On 04.07.2014 at 06:04, Derrick Lin wrote:
> > > >
> > > > > Interestingly, I have a small test cluster with basically the same
> > > > > SGE setup that does *not* have this problem. h_vmem in the complex
> > > > > is exactly the same, and the test queue instance looks almost the
> > > > > same (except the CPU layout etc.):
> > > > >
> > > > > qstat -F -q all.q@eva00
> > > > > queuename                      qtype resv/used/tot. load_avg arch       states
> > > > > ---------------------------------------------------------------------------------
> > > > > [email protected]                BP    0/0/8          0.00     lx26-amd64
> > > > > ...
> > > > > hc:mem_requested=7.814G
> > > > > qf:qname=all.q
> > > > > qf:hostname=eva00.local
> > > > > qc:slots=8
> > > > > qf:tmpdir=/tmp
> > > > > qf:seq_no=0
> > > > > qf:rerun=0.000000
> > > > > qf:calendar=NONE
> > > > > qf:s_rt=infinity
> > > > > qf:h_rt=infinity
> > > > > qf:s_cpu=infinity
> > > > > qf:h_cpu=infinity
> > > > > qf:s_fsize=infinity
> > > > > qf:h_fsize=infinity
> > > > > qf:s_data=infinity
> > > > > qf:h_data=infinity
> > > > > qf:s_stack=infinity
> > > > > qf:h_stack=infinity
> > > > > qf:s_core=infinity
> > > > > qf:h_core=infinity
> > > > > qf:s_rss=infinity
> > > > > qf:h_rss=infinity
> > > > > qf:s_vmem=infinity
> > > > > qf:h_vmem=infinity
> > > > > qf:min_cpu_interval=00:05:00
> > > > >
> > > > > Both clusters don't have h_vmem defined at the exechost level.
> > > >
> > > > What is the output of:
> > > >
> > > > qhost -F
> > > >
> > > > Below you write that it's also defined on a queue instance level,
> > > > hence in both places (as "complex_values")?
> > > >
> > > > -- Reuti
> > > >
> > > > > Derrick
> > > > >
> > > > > On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin <[email protected]> wrote:
> > > > > Hi all,
> > > > >
> > > > > We started using h_vmem to control jobs by their memory usage.
> > > > > However, jobs couldn't start whenever -l h_vmem was requested. The
> > > > > reason given is:
> > > > >
> > > > > (-l h_vmem=1G) cannot run in queue "[email protected]" because
> > > > > job requests unknown resource (h_vmem)
> > > > >
> > > > > However, h_vmem is definitely on the queue instance:
> > > > >
> > > > > queuename                      qtype resv/used/tot. load_avg arch       states
> > > > > ---------------------------------------------------------------------------------
> > > > > [email protected]          BIP   0/0/64         6.27     lx26-amd64
> > > > > ....
> > > > > hl:np_load_long=0.091563
> > > > > hc:mem_requested=504.903G
> > > > > qf:qname=intel.q
> > > > > qf:hostname=delta-5-1.local
> > > > > qc:slots=64
> > > > > qf:tmpdir=/tmp
> > > > > qf:seq_no=0
> > > > > qf:rerun=0.000000
> > > > > qf:calendar=NONE
> > > > > qf:s_rt=infinity
> > > > > qf:h_rt=infinity
> > > > > qf:s_cpu=infinity
> > > > > qf:h_cpu=infinity
> > > > > qf:s_fsize=infinity
> > > > > qf:h_fsize=infinity
> > > > > qf:s_data=infinity
> > > > > qf:h_data=infinity
> > > > > qf:s_stack=infinity
> > > > > qf:h_stack=infinity
> > > > > qf:s_core=infinity
> > > > > qf:h_core=infinity
> > > > > qf:s_rss=infinity
> > > > > qf:h_rss=infinity
> > > > > qf:s_vmem=infinity
> > > > > qf:h_vmem=infinity
> > > > > qf:min_cpu_interval=00:05:00
> > > > >
> > > > > I tried specifying other attributes such as h_rt, and those jobs
> > > > > started and finished successfully.
> > > > >
> > > > > qconf -sc
> > > > >
> > > > > #name    shortcut  type    relop  requestable  consumable  default  urgency
> > > > > #--------------------------------------------------------------------------
> > > > > h_vmem   h_vmem    MEMORY  <=     YES          YES         0        0
> > > > > #
> > > > >
> > > > > Can anyone shed light on this?
> > > > >
> > > > > Cheers,
> > > > > Derrick
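Reading that very first output again with the thread's conclusion in mind:
the qf:h_vmem=infinity line is only the queue's fixed h_vmem limit, not a
consumable capacity. Once a capacity is published somewhere, for example per
host (the 504G figure is a placeholder roughly matching this node's memory,
not a value from the thread):

$ qconf -me delta-5-1

complex_values  h_vmem=504G
...

qstat -F then shows an additional hc:h_vmem line (a host consumable counter,
like hc:mem_requested above), jobs submitted with -l h_vmem are scheduled,
and their requests are accounted against both the host capacity and the RQS.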
