Hi Simon,

Since you defined h_vmem as a "JOB" consumable, note what the manual says:
"
     A consumable defined by 'y' is a per slot consumables  which
     means  the  limit is multiplied by the number of slots being
     used by the job before being applied.  In case  of  'j'  the
     consumable is a per job consumable. This resource is debited
     as requested (without  multiplication)  from  the  allocated
     master  queue.  The  resource needs not be available for the
     slave task queues."

I wonder if you could set it to "YES" instead of "JOB" and see whether the
limit is then respected for parallel jobs?
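If you want to try that, a minimal sketch of the changed line (edited via
`qconf -mc`, using the same columns your `qconf -sc` output shows; only the
consumable column differs):

```
#name     shortcut   type    relop requestable consumable default urgency
h_vmem    h_vmem     MEMORY  <=    YES         YES        0       0
```

With "YES" the request is per slot, so users of parallel jobs would then be
requesting h_vmem per slot rather than per job.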


On Mon, Jun 8, 2015 at 11:10 AM, Simon Andrews
<[email protected]> wrote:
> Having done a bit of investigation it seems that the problem we're hitting is 
> that our h_vmem limits aren't being respected if the jobs are being submitted 
> as parallel jobs.
>
> If I put two jobs in:
>
> $ qsub -o test.log -l h_vmem=1000G hostname
> Your job 343719 ("hostname") has been submitted
>
> $ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
> Your job 343720 ("hostname") has been submitted
>
> The first job won't be scheduled:
> scheduling info:            cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-2.local" because it offers only hc:h_vmem=4.000G
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-4.local" because it offers only hc:h_vmem=16.000G
>                             cannot run in queue instance 
> "[email protected]" because it is not of type batch
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-3.local" because it offers only hc:h_vmem=25.000G
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-5.local" because it offers only hc:h_vmem=32.000G
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-0.local" because it offers only hc:h_vmem=32.000G
>                             (-l h_vmem=1000G) cannot run at host 
> "compute-0-1.local" because it offers only hc:h_vmem=12.000G
>
>
> But the second is immediately scheduled and overcommits the node it's on (and 
> the overcommit is reflected by qstat -F h_vmem).
>
> The memory usage is recorded and will prevent other jobs from running on that 
> node, but I need to figure out how to make the scheduler respect the resource 
> limit when the job is first submitted.
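For what it's worth, the difference between the two consumable modes can be
sketched like this. This is a simplified model of the debiting rule from the
complex(5) man page, not SGE internals, and the function names are made up:

```python
# Simplified model of how SGE debits a consumable resource such as h_vmem.
# Function names are illustrative only.

G = 1024 ** 3  # SGE's "G" suffix means 2**30 bytes


def debit_per_slot(request_bytes: int, slots: int) -> int:
    """Consumable 'YES': the request is multiplied by the slot count
    before being checked against, and debited from, each host."""
    return request_bytes * slots


def debit_per_job(request_bytes: int, slots: int) -> int:
    """Consumable 'JOB': debited once, as requested, from the allocated
    master queue only; slave task hosts need not have it available."""
    return request_bytes


# Your two test jobs, both requesting -l h_vmem=1000G:
serial = debit_per_job(1000 * G, 1)    # 1000G, checked against the host
parallel = debit_per_job(1000 * G, 2)  # still 1000G, master queue only
```

Under "JOB" semantics the 2-slot job is only checked against the master
queue, which may be part of why the scheduler behaves differently for the
parallel submission.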
>
> Any suggestions would be very welcome
>
> Thanks.
>
> Simon.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Simon Andrews
> Sent: 08 June 2015 13:53
> To: [email protected]
> Subject: [gridengine users] Negative complex values
>
> Our cluster seems to have ended up in a strange state, and I don't understand 
> why.
>
> We have set up h_vmem to be a consumable resource so that users can't exhaust 
> the memory on any compute node.  This has been working OK and in our tests it 
> all seemed to be right, but we've now found that somehow we've ended up with 
> nodes with negative amounts of memory remaining.
>
> We only have one queue on the system, all.q.
>
> $ qstat -F h_vmem -q all.q@compute-0-3
> queuename                      qtype resv/used/tot. load_avg arch        states
> ---------------------------------------------------------------------------------
> [email protected]        BP    0/44/64        13.13    lx26-amd64
>         hc:h_vmem=-172.000G
>
> ...so the node is somehow at -172G memory.
>
> The setup for the resource is as follows:
>
> $ qconf -sc | grep h_vmem
> h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0
>
> We use a JSV to add a default memory request to all jobs, and the jobs 
> listed below all carry an h_vmem request (see later).
>
> ...the initialisation of the complex value for the node looks OK:
>
> $ qconf -se compute-0-3 | grep complex
> complex_values        h_vmem=128G
>
> The problem seems to stem from an individual job: a 200G request was somehow 
> scheduled on a node with only 128G. These are the jobs currently running on 
> that node:
>
> qstat -j 341706 | grep "hard resource_list"
> hard resource_list:         h_vmem=21474836480
> qstat -j 342549 | grep "hard resource_list"
> hard resource_list:         h_vmem=21474836480
> qstat -j 342569 | grep "hard resource_list"
> hard resource_list:         h_vmem=21474836480
> qstat -j 343337 | grep "hard resource_list"
> hard resource_list:         h_vmem=21474836480
> qstat -j 343367 | grep "hard resource_list"
> hard resource_list:         h_vmem=21474836480
> qstat -j 343400 | grep "hard resource_list"
> hard resource_list:         h_vmem=200G
>
> We still have jobs which are queued because there is insufficient memory, so 
> the limit isn't being completely ignored, but I don't understand how the jobs 
> which are currently running were able to be scheduled.
>
> (-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only 
> hc:h_vmem=-172.000G
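As a sanity check, -172G is exactly what you get if the five 20G jobs plus
the 200G job were all debited against the 128G node (assuming 1G = 2**30
bytes, as SGE uses):

```python
# Arithmetic check for the -172G value reported by qstat -F h_vmem.
G = 1024 ** 3
five_small = 5 * 21474836480   # five jobs at h_vmem=21474836480 (= 20G each)
big = 200 * G                  # the 200G job
node = 128 * G                 # complex_values h_vmem=128G
remaining = node - five_small - big
print(remaining // G)          # -172, matching hc:h_vmem=-172.000G
```

So the accounting itself is consistent; the question is why the 200G job was
allowed to start in the first place.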
>
> Does anyone have any suggestions for how the cluster could have got itself 
> into this situation?
>
> Thanks
>
> Simon.
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT 
> Registered Charity No. 1053902.
> The information transmitted in this email is directed only to the addressee. 
> If you received this in error, please contact the sender and delete this 
> email from your system. The contents of this e-mail are the views of the 
> sender and do not necessarily represent the views of the Babraham Institute. 
> Full conditions at: www.babraham.ac.uk<http://www.babraham.ac.uk/terms>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users



-- 
Best,

Feng
