On 11.04.2015 at 03:05, Marlies Hankel wrote:

> Dear all,
> 
> Yes, I set a default value of 1G for h_vmem in the global complex.

You mean `qconf -me global`? That defines the amount of memory available once 
for the complete cluster. The default value lives in the complex configuration, 
i.e. `qconf -mc`.
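
Just to illustrate where the two settings live (the values below are only a 
made-up sketch, not your configuration): the per-request default sits in the 
second-to-last column of the complex definition, while a complex_values entry 
on the pseudo host "global" would define one pool that all jobs in the whole 
cluster consume from:

    # default applied when a job does not request h_vmem explicitly
    $ qconf -sc | grep h_vmem
    h_vmem    h_vmem    MEMORY    <=    YES    YES    1G    0

    # hypothetical cluster-wide pool on the global host (not per node)
    $ qconf -se global | grep complex_values
    complex_values        h_vmem=500G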


> The queue has INFINITY and each node has an h_vmem of 120G. There is nothing 
> set in sge_request.
> 
> Here is the error I get
> 
> 04/09/2015 12:41:54|  main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of 
> queue "[email protected]" (7002386432.00000 > limit:1073741824.00000) - 
> sending SIGKILL
> 04/09/2015 12:41:55|  main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of 
> queue "[email protected]" (6889209856.00000 > limit:1073741824.00000) - 
> sending SIGKILL
> 
> 
> For this I asked for
> #$ -pe openmpi 10
> #$ -l h_vmem=1G
> 
> Checking ulimits via the script gives the expected 10G.

So the job started on one node only? What is the complete definition of this PE?
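
For reference, the output of `qconf -sp openmpi` should look something like the 
sketch below (made-up values); allocation_rule and control_slaves are the 
interesting entries, as they decide how the granted slots, and with them the 
per-slot h_vmem, are spread across the nodes:

    $ qconf -sp openmpi
    pe_name            openmpi
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE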

> 
> So for some reason there is a limit of 1G there. Checking qacct shows that the 
> total maxvmem used is just over 3G, so asking for 10G should be plenty.
> 
> [root@queue ~]# qconf -sc
> #name      shortcut   type     relop  requestable  consumable  default  urgency
> #-------------------------------------------------------------------------------
> h_vmem     h_vmem     MEMORY   <=     YES          YES         1G       0

Yep, this 1G should work.

If it persists, maybe it's a problem in OGS, as I haven't noticed it in the 
other forks.
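
Should it show up again, it might help to compare what the scheduler recorded 
for the job with what the node still has available (commands only as a sketch, 
<jobid> being the id of the failing job):

    # the hard resource list of the job should show the per-slot h_vmem=1G
    $ qstat -j <jobid> | grep resource_list

    # remaining h_vmem consumable on the node in question
    $ qhost -F h_vmem -h cpu-1-4.local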

-- Reuti


> [root@queue ~]# qconf -sq all.q
> h_vmem                INFINITY
> 
> [root@queue ~]# qconf -se cpu-1-4.local
> complex_values        h_vmem=120G
> 
> Marlies
> 
> On 04/10/2015 07:43 PM, Reuti wrote:
>>> On 10.04.2015 at 05:59, Marlies Hankel <[email protected]> wrote:
>>> 
>>> Dear all,
>>> 
>>> I ran into some trouble with the default value of h_vmem. I set it to be 
>>> consumable=yes and also set a default value of 1G. When I submitted a job 
>>> asking, for example, for 10 slots with 1G per slot, the job crashed with an 
>>> error in the queue logs saying that the h_vmem needed by the job (around 
>>> 3G) was over the hard limit of the queue (local host instance) of 1G. I 
>>> would have thought that the request of 1G per slot, so 10G in total, would 
>>> override this and give enough memory for the job.
>>> 
>>> Setting the default value to 6G resolved the problem,
>> You refer to the setting on a queue level? This is the limit per process. 
>> There is also a column for the default value in the complex definition for 
>> each consumable complex. This can be set to 1G and users can override it, as 
>> long as they stay below the limit on a queue (or exechost) level.
>> 
>> -- Reuti
>> 
>> 
>>> but as we might be dealing with larger memory jobs in future I would like 
>>> to find a proper fix for this. I am running SGE as installed by ROCKS 6.1.1 
>>> (OGS/Grid Engine 2011.11) and the only thing I changed was to set h_vmem to 
>>> consumable=yes and set the relevant h_vmem values for each host.
>>> 
>>> I want users to request memory and jobs to be killed if they exceed the 
>>> requested amount, so h_vmem seemed to be the way to go. But how do I set a 
>>> small default value that users can change if they need more? Or should I 
>>> set it to forced without a default and force users to request it?
>>> 
>>> Thanks in advance
>>> 
>>> Marlies
>>> 
>>> -- 
>>> 
>>> ------------------
>>> 
>>> Dr. Marlies Hankel
>>> Research Fellow, Theory and Computation Group
>>> Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
>>> eResearch Analyst, Research Computing Centre and Queensland Cyber 
>>> Infrastructure Foundation
>>> The University of Queensland
>>> Qld 4072, Brisbane, Australia
>>> Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | mobile:0404262445
>>> Email: [email protected] | www.theory-computation.uq.edu.au
> 
> -- 
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 
> Please note change of work hours: Monday, Wednesday and Friday
> 
> Dr. Marlies Hankel
> Research Fellow
> High Performance Computing, Quantum Dynamics & Nanotechnology
> Theory and Computational Molecular Sciences Group
> Room 229 Australian Institute for Bioengineering and Nanotechnology  (75)
> The University of Queensland
> Qld 4072, Brisbane
> Australia
> Tel: +61 (0)7-33463996
> Fax: +61 (0)7-334 63992
> mobile:+61 (0)404262445
> Email: [email protected]
> http://web.aibn.uq.edu.au/cbn/
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 
> Notice: If you receive this e-mail by mistake, please notify me, and do
> not make any use of its contents. I do not waive any privilege,
> confidentiality or copyright associated with it. Unless stated
> otherwise, this e-mail represents only the views of the Sender and not
> the views of The University of Queensland.
> 
> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
