Hi Simon,
Since you defined h_vmem as "JOB", the manual says:
"
A consumable defined by 'y' is a per slot consumables which
means the limit is multiplied by the number of slots being
used by the job before being applied. In case of 'j' the
consumable is a per job consumable. This resource is debited
as requested (without multiplication) from the allocated
master queue. The resource needs not be available for the
slave task queues."
I am wondering whether you could set it to "YES" instead of "JOB" and see if
that works for your parallel jobs?
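
In case it helps, the change would just be the consumable column of the
complex definition, edited with "qconf -mc". A sketch (column spacing
approximate), based on the "qconf -sc" line quoted below:

    h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0

With "YES" the limit is multiplied by the number of slots the job uses before
being applied, so the scheduler should account for the full memory of a
parallel job rather than debiting it only from the master queue.
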
On Mon, Jun 8, 2015 at 11:10 AM, Simon Andrews
<[email protected]> wrote:
> Having done a bit of investigation, it seems that the problem we're hitting
> is that our h_vmem limits aren't being respected when jobs are submitted as
> parallel jobs.
>
> If I put two jobs in:
>
> $ qsub -o test.log -l h_vmem=1000G hostname
> Your job 343719 ("hostname") has been submitted
>
> $ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
> Your job 343720 ("hostname") has been submitted
>
> The first job won't be scheduled:
> scheduling info: cannot run in queue instance "[email protected]" because it is not of type batch
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
>                  cannot run in queue instance "[email protected]" because it is not of type batch
>                  (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
>                  (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
>                  (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
>                  (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
>                  (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G
>
>
> But the second is immediately scheduled and overcommits the node it's on (and
> the overcommit is reflected by qstat -F h_vmem).
>
> The memory usage is recorded and will prevent other jobs from running on that
> node, but I need to figure out how to make the scheduler respect the resource
> limit when the job is first submitted.
>
> Any suggestions would be very welcome
>
> Thanks.
>
> Simon.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Simon Andrews
> Sent: 08 June 2015 13:53
> To: [email protected]
> Subject: [gridengine users] Negative complex values
>
> Our cluster seems to have ended up in a strange state, and I don't understand
> why.
>
> We have set up h_vmem as a consumable resource so that users can't exhaust
> the memory on any compute node. This has been working OK, and in our tests it
> all seemed right, but we've now found that some nodes have somehow ended up
> with negative amounts of memory remaining.
>
> We only have one queue on the system, all.q.
>
> $ qstat -F h_vmem -q all.q@compute-0-3
> queuename                          qtype resv/used/tot. load_avg arch       states
> ---------------------------------------------------------------------------------
> [email protected]             BP    0/44/64        13.13    lx26-amd64
>     hc:h_vmem=-172.000G
>
> ..so the node is somehow at -172G memory.
>
> The setup for the resource is as follows:
>
> $ qconf -sc | grep h_vmem
> h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0
>
> We use a JSV to add a default memory request to all jobs, and the jobs listed
> later in this message all carry an h_vmem request.
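>
> Roughly speaking (this is a sketch rather than the actual script, and the 4G
> default is illustrative) it does something like the following, using the
> stock shell JSV helpers shipped in $SGE_ROOT/util/resources/jsv/jsv_include.sh:
>
>     #!/bin/sh
>     # Add a default h_vmem request to jobs that do not specify one.
>     jsv_on_start()
>     {
>        return
>     }
>
>     jsv_on_verify()
>     {
>        # l_hard holds the job's hard resource list (-l requests)
>        if [ "$(jsv_sub_get_param l_hard h_vmem)" = "" ]; then
>           jsv_sub_add_param l_hard h_vmem 4G
>           jsv_correct "added default h_vmem request"
>        else
>           jsv_accept ""
>        fi
>        return
>     }
>
>     . ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
>     jsv_main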
>
> ..the initialisation of the complex value for the node looks OK:
>
> $ qconf -se compute-0-3 | grep complex
> complex_values h_vmem=128G
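>
> (For reference, that per-host value is the one set in the exec host
> configuration, e.g. via "qconf -me compute-0-3" or
> "qconf -mattr exechost complex_values h_vmem=128G compute-0-3".)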
>
> The problem seems to stem from an individual job which has managed to claim
> 200G on a node with only 128G. These are the jobs which are running on that
> node:
>
> qstat -j 341706 | grep "hard resource_list"
> hard resource_list: h_vmem=21474836480
> qstat -j 342549 | grep "hard resource_list"
> hard resource_list: h_vmem=21474836480
> qstat -j 342569 | grep "hard resource_list"
> hard resource_list: h_vmem=21474836480
> qstat -j 343337 | grep "hard resource_list"
> hard resource_list: h_vmem=21474836480
> qstat -j 343367 | grep "hard resource_list"
> hard resource_list: h_vmem=21474836480
> qstat -j 343400 | grep "hard resource_list"
> hard resource_list: h_vmem=200G
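>
> (For what it's worth, those requests account exactly for the negative value:
> 21474836480 bytes is 20G, so the jobs above come to 5 x 20G + 200G = 300G
> committed against the 128G configured, and 128G - 300G = -172G, which matches
> the hc:h_vmem=-172.000G reported above.)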
>
> We still have jobs which are queued because there is insufficient memory, so
> the limit isn't being completely ignored, but I don't understand how the jobs
> which are currently running were able to be scheduled.
>
> (-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=-172.000G
>
> Does anyone have any suggestions for how the cluster could have got itself
> into this situation?
>
> Thanks
>
> Simon.
--
Best,
Feng
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users