Thanks for replying!

Am I reading that correctly: if the resource is allocated per job then it
doesn't actually need to be available on the slave task queues?

If that's the case, what is the correct way to set up a job-level resource
which we can use for scheduling?  I suppose I could change the resource to
be slot-level rather than job-level and then use our JSV to divide the
request by the number of cores, but that seems kind of awkward.
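
For what it's worth, if I understand the consumable column correctly, "slot
level" would just mean changing our qconf -sc definition (quoted further
down) from JOB to YES, i.e.

h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0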

Is there a better way I'm missing?
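
In case it makes the workaround clearer, the JSV side would be roughly the
script below.  This is only a sketch: the helper functions are the ones I
remember from the stock shell jsv_include.sh, and the unit handling assumes
simple whole-gigabyte requests like "16G".

#!/bin/sh
# Sketch only: turn a per-job h_vmem request into a per-slot one by dividing
# it by the requested slot count, so it can be debited against a per-slot
# (consumable=YES) h_vmem complex.  Unit handling is illustrative.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   pe_name=`jsv_get_param pe_name`
   if [ "$pe_name" != "" ]; then
      slots=`jsv_get_param pe_max`              # upper end of the -pe range
      vmem=`jsv_sub_get_param l_hard h_vmem`
      if [ "$vmem" != "" -a "$slots" != "" ]; then
         value=`echo "$vmem" | sed 's/[Gg]$//'` # strip a trailing G
         per_slot=`expr $value / $slots`        # integer division only
         jsv_sub_add_param l_hard h_vmem "${per_slot}G"
         jsv_correct "h_vmem divided across $slots slots"
         return
      fi
   fi
   jsv_accept "Job is accepted"
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main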

Thanks

Simon.


On 08/06/2015 16:38, "Feng Zhang" <[email protected]> wrote:

>Hi Simon,
>
>Since you defined h_vmem as "JOB", here is what the manual says:
>"
>     A consumable defined by 'y' is a per slot consumables  which
>     means  the  limit is multiplied by the number of slots being
>     used by the job before being applied.  In case  of  'j'  the
>     consumable is a per job consumable. This resource is debited
>     as requested (without  multiplication)  from  the  allocated
>     master  queue.  The  resource needs not be available for the
>     slave task queues."
>
>I am wondering if you can set it to "YES" rather than "JOB" and see
>whether that works for parallel jobs?
>
>
>On Mon, Jun 8, 2015 at 11:10 AM, Simon Andrews
><[email protected]> wrote:
>> Having done a bit of investigation it seems that the problem we're
>>hitting is that our h_vmem limits aren't being respected if the jobs are
>>being submitted as parallel jobs.
>>
>> If I put two jobs in:
>>
>> $ qsub -o test.log -l h_vmem=1000G hostname
>> Your job 343719 ("hostname") has been submitted
>>
>> $ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
>> Your job 343720 ("hostname") has been submitted
>>
>> The first job won't be scheduled:
>> scheduling info:            cannot run in queue instance "[email protected]" because it is not of type batch
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
>>                             cannot run in queue instance "[email protected]" because it is not of type batch
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
>>                             (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G
>>
>>
>> But the second is immediately scheduled and overcommits the node it's
>>on (and the overcommit is reflected by qstat -F h_vmem).
>>
>> The memory usage is recorded and will prevent other jobs from running
>>on that node, but I need to figure out how to make the scheduler respect
>>the resource limit when the job is first submitted.
>>
>> Any suggestions would be very welcome
>>
>> Thanks.
>>
>> Simon.
>>
>> -----Original Message-----
>> From: [email protected]
>>[mailto:[email protected]] On Behalf Of Simon Andrews
>> Sent: 08 June 2015 13:53
>> To: [email protected]
>> Subject: [gridengine users] Negative complex values
>>
>> Our cluster seems to have ended up in a strange state, and I don't
>>understand why.
>>
>> We have set up h_vmem as a consumable resource so that users can't
>>exhaust the memory on any compute node.  This has been working OK, and in
>>our tests it all seemed correct, but we've now found that some nodes have
>>somehow ended up with negative amounts of memory remaining.
>>
>> We only have one queue on the system, all.q.
>>
>> $ qstat -F h_vmem -q all.q@compute-0-3
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> [email protected]        BP    0/44/64        13.13    lx26-amd64
>>         hc:h_vmem=-172.000G
>>
>> ...so the node is somehow at -172G of memory.
>>
>> The setup for the resource is as follows:
>>
>> $ qconf -sc | grep h_vmem
>> h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0
>>
>> We use a JSV to add a default memory request to all jobs, and the jobs
>>listed all provide an h_vmem request (see later).
>>
>> ...and the initialisation of the complex value for the node looks OK:
>>
>> $ qconf -se compute-0-3 | grep complex
>> complex_values        h_vmem=128G
>>
>> The problem seems to stem from an individual job which has managed to
>>claim 200G on a node with only 128G.  These are the jobs currently running
>>on that node:
>>
>> qstat -j 341706 | grep "hard resource_list"
>> hard resource_list:         h_vmem=21474836480
>> qstat -j 342549 | grep "hard resource_list"
>> hard resource_list:         h_vmem=21474836480
>> qstat -j 342569 | grep "hard resource_list"
>> hard resource_list:         h_vmem=21474836480
>> qstat -j 343337 | grep "hard resource_list"
>> hard resource_list:         h_vmem=21474836480
>> qstat -j 343367 | grep "hard resource_list"
>> hard resource_list:         h_vmem=21474836480
>> qstat -j 343400 | grep "hard resource_list"
>> hard resource_list:         h_vmem=200G
>>
>> We still have jobs which are queued because there is insufficient
>>memory, so the limit isn't being completely ignored, but I don't
>>understand how the jobs which are currently running were able to be
>>scheduled.
>>
>> (-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=-172.000G
>>
>> Does anyone have any suggestions for how the cluster could have got
>>itself into this situation?
>>
>> Thanks
>>
>> Simon.
>
>
>
>--
>Best,
>
>Feng
