I finally spotted the cause. In the scheduler conf, I have

params                            MONITOR=1
max_reservation                   5

I think I set it for monitoring resource reservation a long time ago. I
reverted the settings back to:

params                            none
max_reservation                   0
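
For anyone who wants to check their own setup, the scheduler configuration
can be printed and edited with qconf (just a sketch; the values below are the
ones I reverted to above):

# print the current scheduler configuration
qconf -ssconf

# open the scheduler configuration in $EDITOR, then set e.g.
#   params             none
#   max_reservation    0
qconf -msconf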

Jobs now start and run fine. Can anyone explain why these settings are
related to job resource requests?

Cheers,
Derrick


On Fri, Jul 4, 2014 at 2:04 PM, Derrick Lin <[email protected]> wrote:

> Interestingly, I have a small test cluster with basically the same SGE
> setup that does *not* have this problem. h_vmem in the complex is exactly
> the same, and the test queue instance looks almost identical (except for
> the CPU layout, etc.):
>
>  qstat -F -q all.q@eva00
> queuename                      qtype resv/used/tot. load_avg arch          states
>
> ---------------------------------------------------------------------------------
> all.q@eva00.local              BP    0/0/8          0.00     lx26-amd64
>        ...
>         hc:mem_requested=7.814G
>         qf:qname=all.q
>         qf:hostname=eva00.local
>         qc:slots=8
>         qf:tmpdir=/tmp
>         qf:seq_no=0
>         qf:rerun=0.000000
>         qf:calendar=NONE
>         qf:s_rt=infinity
>         qf:h_rt=infinity
>         qf:s_cpu=infinity
>         qf:h_cpu=infinity
>         qf:s_fsize=infinity
>         qf:h_fsize=infinity
>         qf:s_data=infinity
>         qf:h_data=infinity
>         qf:s_stack=infinity
>         qf:h_stack=infinity
>         qf:s_core=infinity
>         qf:h_core=infinity
>         qf:s_rss=infinity
>         qf:h_rss=infinity
>         qf:s_vmem=infinity
>         qf:h_vmem=infinity
>         qf:min_cpu_interval=00:05:00
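>
> To check just that one attribute rather than reading through the whole -F
> dump, the resource list form of -F should also work:
>
> qstat -F h_vmem -q all.q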
>
> Neither cluster has h_vmem defined at the exec host level.
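>
> A quick way to confirm that, using eva00.local from the output above as an
> example host, is something along the lines of:
>
> qconf -se eva00.local | grep complex_values
>
> and likewise for the hosts in the other cluster.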
>
> Derrick
>
>
> On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin <[email protected]> wrote:
>
>> Hi all,
>>
>> We have started using h_vmem to control jobs by their memory usage. However,
>> jobs cannot start whenever -l h_vmem is requested. The reason given is:
>>
>> (-l h_vmem=1G) cannot run in queue "intel.q@delta-5-1.local" because job
>> requests unknown resource (h_vmem)
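>>
>> For reference, the failing jobs are submitted along these lines (the script
>> name is only a placeholder):
>>
>> qsub -l h_vmem=1G myjob.sh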
>>
>> However, h_vmem is definitely present on the queue instance:
>>
>> queuename                      qtype resv/used/tot. load_avg arch          states
>>
>> ---------------------------------------------------------------------------------
>> intel.q@delta-5-1.local        BIP   0/0/64         6.27     lx26-amd64
>>         ....
>>         hl:np_load_long=0.091563
>>         hc:mem_requested=504.903G
>>         qf:qname=intel.q
>>         qf:hostname=delta-5-1.local
>>         qc:slots=64
>>         qf:tmpdir=/tmp
>>         qf:seq_no=0
>>         qf:rerun=0.000000
>>         qf:calendar=NONE
>>         qf:s_rt=infinity
>>         qf:h_rt=infinity
>>         qf:s_cpu=infinity
>>         qf:h_cpu=infinity
>>         qf:s_fsize=infinity
>>         qf:h_fsize=infinity
>>         qf:s_data=infinity
>>         qf:h_data=infinity
>>         qf:s_stack=infinity
>>         qf:h_stack=infinity
>>         qf:s_core=infinity
>>         qf:h_core=infinity
>>         qf:s_rss=infinity
>>         qf:h_rss=infinity
>>         qf:s_vmem=infinity
>>         qf:h_vmem=infinity
>>         qf:min_cpu_interval=00:05:00
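>>
>> The qf:h_vmem=infinity line above is the queue-level limit; it can also be
>> seen directly in the queue definition with something like:
>>
>> qconf -sq intel.q | grep h_vmem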
>>
>> I tried specifying other attributes such as h_rt, and those jobs started and
>> finished successfully.
>>
>> qconf -sc
>> #name               shortcut   type        relop requestable consumable default  urgency
>> #----------------------------------------------------------------------------------------
>> h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
>> #
>>
>> Can anyone shed light on this?
>>
>> Cheers,
>> Derrick
>>
>
>
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
