On 11.04.2015 at 03:05, Marlies Hankel wrote:

> Dear all,
>
> Yes, I set a default of 1G value for h_vmem in the global complex.
You mean `qconf -me global`? That would be memory available once for the complete cluster. The default value is set in `qconf -mc`.

> The queue has infinity and each node has an h_vmem of 120G. There is
> nothing set in sge_request.
>
> Here is the error I get:
>
> 04/09/2015 12:41:54| main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem"
> of queue "[email protected]" (7002386432.00000 > limit:1073741824.00000) -
> sending SIGKILL
> 04/09/2015 12:41:55| main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem"
> of queue "[email protected]" (6889209856.00000 > limit:1073741824.00000) -
> sending SIGKILL
>
> For this I asked for:
>
> #$ -pe openmpi 10
> #$ -l h_vmem=1G
>
> Checking ulimits via the script gives the expected 10G.

So the job started on one node only? What is the complete definition of this PE?

> So for some reason there is a limit of 1G there. Checking qacct shows that
> the total maxvmem used is just over 3G, so asking for 10G should be plenty.
>
> [root@queue ~]# qconf -sc
> #name    shortcut  type    relop  requestable  consumable  default  urgency
> #--------------------------------------------------------------------------
> h_vmem   h_vmem    MEMORY  <=     YES          YES         1G       0

Yep, this 1G should work. If it persists, maybe it's a problem in OGS, as I didn't notice it in other forks.

-- Reuti

> [root@queue ~]# qconf -sq all.q
> h_vmem                INFINITY
>
> [root@queue ~]# qconf -se cpu-1-4.local
> complex_values        h_vmem=120G
>
> Marlies
>
> On 04/10/2015 07:43 PM, Reuti wrote:
>>> On 10.04.2015 at 05:59, Marlies Hankel <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> I ran into some trouble with the default value of h_vmem. I set it to be
>>> consumable=yes and also set a default value of 1G. When I submitted a
>>> job asking, for example, for 10 slots with 1G per slot, the job crashed
>>> with an error in the queue logs saying that the h_vmem needed by the job
>>> (around 3G) was over the hard limit of the queue (local host instance)
>>> of 1G. I would have thought that the request of 1G per slot, so 10G in
>>> total, would override this and give enough memory for the job.
>>>
>>> Setting the default value to 6G resolved the problem,
>>
>> You refer to the setting on a queue level? This is the limit per process.
>> There is also a column for the default value in the complex definition of
>> each consumable complex. This can be set to 1G, and users can override it
>> as long as they stay below the limit on a queue (or exechost) level.
>>
>> -- Reuti
>>
>>> but as we might be dealing with larger memory jobs in future I would
>>> like to find a proper fix for this. I am running SGE as installed by
>>> ROCKS 6.1.1 (OGS/Grid Engine 2011.11), and the only thing I changed was
>>> to set h_vmem to consumable=yes and to set the relevant h_vmem values
>>> for each host.
>>>
>>> I want users to request memory, and jobs to be killed if they exceed the
>>> requested amount, so h_vmem seemed to be the way to go. But how do I set
>>> a small default value that users can change if they need more? Or should
>>> I set it to forced, without a default, and force users to request it?
>>>
>>> Thanks in advance
>>>
>>> Marlies
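For reference, the PE definition Reuti asks about can be printed with
`qconf -sp openmpi`. A sketch of a typical openmpi PE (the values below are
only illustrative, not the actual configuration on this cluster):

    [root@queue ~]# qconf -sp openmpi
    pe_name            openmpi
    slots              9999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE

The `allocation_rule` is why the question matters: a per-slot consumable
request of 1G is multiplied by the number of slots granted on each host, so
with `$pe_slots` (all 10 slots on one node) a single host must have 10G of
h_vmem free, while `$round_robin` or `$fill_up` can spread the charge across
several hosts.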
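To make the override behaviour concrete: with the 1G default in the complex
definition, a job that needs more memory simply requests it per slot, and the
scheduler debits request x slots against each host's `complex_values` (120G
per node here). A hypothetical submission (`job.sh` is a placeholder):

    # No -l h_vmem given: the 1G default from qconf -mc applies per slot.
    qsub -pe openmpi 10 job.sh

    # Explicit override: 4G per slot, i.e. 40G debited across the granted slots.
    qsub -pe openmpi 10 -l h_vmem=4G job.sh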

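The alternative raised in the original question, making the request mandatory
rather than defaulted, would be a requestable=FORCED entry in `qconf -mc`,
along these lines (a sketch, not tested on this setup):

    #name    shortcut  type    relop  requestable  consumable  default  urgency
    h_vmem   h_vmem    MEMORY  <=     FORCED       YES         0        0

With FORCED, a job that does not request h_vmem explicitly will not be
dispatched to hosts where the complex is defined, so users must state their
memory needs up front.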