I finally spotted the cause. In the scheduler configuration, I had:

params MONITOR=1
max_reservation 5

I think I set it for monitoring resource reservation a long time ago. I reverted the settings to:

params none
max_reservation 0

Jobs now start and run fine. Can anyone explain why these settings are related to a job's resource request?

Cheers,
Derrick

On Fri, Jul 4, 2014 at 2:04 PM, Derrick Lin wrote:

> Interestingly, I have a small test cluster that has basically the same
> SGE setup but does *not* have this problem. h_vmem in the complex is
> exactly the same. The test queue instance looks almost the same (except
> for the CPU layout etc.)
>
> qstat -F -q all.q@eva00
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@eva00.local              BP    0/0/8          0.00     lx26-amd64
> ...
>     hc:mem_requested=7.814G
>     qf:qname=all.q
>     qf:hostname=eva00.local
>     qc:slots=8
>     qf:tmpdir=/tmp
>     qf:seq_no=0
>     qf:rerun=0.000000
>     qf:calendar=NONE
>     qf:s_rt=infinity
>     qf:h_rt=infinity
>     qf:s_cpu=infinity
>     qf:h_cpu=infinity
>     qf:s_fsize=infinity
>     qf:h_fsize=infinity
>     qf:s_data=infinity
>     qf:h_data=infinity
>     qf:s_stack=infinity
>     qf:h_stack=infinity
>     qf:s_core=infinity
>     qf:h_core=infinity
>     qf:s_rss=infinity
>     qf:h_rss=infinity
>     qf:s_vmem=infinity
>     qf:h_vmem=infinity
>     qf:min_cpu_interval=00:05:00
>
> Neither cluster has h_vmem defined at the exechost level.
>
> Derrick
>
>
> On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin wrote:
>
>> Hi all,
>>
>> We started using h_vmem to control jobs by their memory usage. However,
>> jobs cannot start when -l h_vmem is requested. The reason given is:
>>
>> (-l h_vmem=1G) cannot run in queue "intel.q@delta-5-1.local" because job
>> requests unknown resource (h_vmem)
>>
>> However, h_vmem is definitely on the queue instance:
>>
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> intel.q@delta-5-1.local        BIP   0/0/64         6.27     lx26-amd64
>> ....
>>     hl:np_load_long=0.091563
>>     hc:mem_requested=504.903G
>>     qf:qname=intel.q
>>     qf:hostname=delta-5-1.local
>>     qc:slots=64
>>     qf:tmpdir=/tmp
>>     qf:seq_no=0
>>     qf:rerun=0.000000
>>     qf:calendar=NONE
>>     qf:s_rt=infinity
>>     qf:h_rt=infinity
>>     qf:s_cpu=infinity
>>     qf:h_cpu=infinity
>>     qf:s_fsize=infinity
>>     qf:h_fsize=infinity
>>     qf:s_data=infinity
>>     qf:h_data=infinity
>>     qf:s_stack=infinity
>>     qf:h_stack=infinity
>>     qf:s_core=infinity
>>     qf:h_core=infinity
>>     qf:s_rss=infinity
>>     qf:h_rss=infinity
>>     qf:s_vmem=infinity
>>     qf:h_vmem=infinity
>>     qf:min_cpu_interval=00:05:00
>>
>> I tried specifying other attributes such as h_rt, and those jobs started
>> and finished successfully.
>>
>> qconf -sc
>> #name    shortcut  type    relop requestable consumable default  urgency
>> #----------------------------------------------------------------------------------------
>> h_vmem   h_vmem    MEMORY  <=    YES         YES        0        0
>> #
>>
>> Can anyone shed light on this?
>>
>> Cheers,
>> Derrick
>>
>
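For anyone wanting to check the same two scheduler entries named at the top of this thread (params and max_reservation) on their own cluster, here is a minimal sketch. The helper name sched_settings is hypothetical, and the sample input is made up for illustration rather than copied from real qconf output. It only displays the entries; it does not explain why they affected the h_vmem request.

```shell
# Print only the "params" and "max_reservation" entries from scheduler
# configuration text read on stdin (one "name value" pair per line,
# the layout printed by `qconf -ssconf`).
sched_settings() {
    awk '$1 == "params" || $1 == "max_reservation" { print $1, $2 }'
}

# On a live cluster (assumes qconf is on PATH):
#   qconf -ssconf | sched_settings
# To change the entries, open the scheduler configuration in $EDITOR:
#   qconf -msconf

# Illustrative sample input, using the values reported in this thread:
printf 'schedule_interval 0:0:15\nparams           MONITOR=1\nmax_reservation  5\n' | sched_settings
```

Running the same filter before and after the change makes it easy to confirm that the edit made with qconf -msconf actually took effect.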
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users
