Dear all,
Yes, I set a default value of 1G for h_vmem in the global complex. The
queue has h_vmem set to INFINITY and each node has an h_vmem of 120G.
Nothing is set in sge_request.
Here is the error I get:
04/09/2015 12:41:54| main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of queue "[email protected]" (7002386432.00000 > limit:1073741824.00000) - sending SIGKILL
04/09/2015 12:41:55| main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of queue "[email protected]" (6889209856.00000 > limit:1073741824.00000) - sending SIGKILL
For this job I requested:
#$ -pe openmpi 10
#$ -l h_vmem=1G
Checking the ulimits from within the job script gives the expected 10G.
So for some reason a limit of 1G is being applied there. qacct shows that
the total maxvmem used is just over 3G, so asking for 10G should be
plenty.
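As a quick sanity check on the byte counts in the log lines above, here is a minimal sketch that converts them to gigabytes. It assumes the uppercase "G" suffix means 1024**3 bytes, which matches the limit of 1073741824 shown in the log; the helper name to_gib is made up for illustration.

```python
# Byte counts copied from the execd log lines quoted above.
GIB = 1024 ** 3  # assumption: uppercase "G" in h_vmem means 1024**3 bytes


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to gigabytes (binary, 1024**3)."""
    return n_bytes / GIB


limit = 1073741824.0                      # "limit:" value in the log
observed = [7002386432.0, 6889209856.0]   # usage values in the log

print(f"enforced limit: {to_gib(limit):.2f}G")
for usage in observed:
    print(f"reported usage: {to_gib(usage):.2f}G")
```

So the limit being enforced is exactly 1G, the same as the per-slot request, rather than the 10G total.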
[root@queue ~]# qconf -sc
#name    shortcut  type    relop  requestable  consumable  default  urgency
#---------------------------------------------------------------------------
h_vmem   h_vmem    MEMORY  <=     YES          YES         1G       0
[root@queue ~]# qconf -sq all.q
h_vmem INFINITY
[root@queue ~]# qconf -se cpu-1-4.local
complex_values h_vmem=120G
Marlies
On 04/10/2015 07:43 PM, Reuti wrote:
On 10.04.2015 at 05:59, Marlies Hankel<[email protected]> wrote:
Dear all,
I ran into some trouble with the default value of h_vmem. I set it to be
consumable=yes and also set a default value of 1G. When I submitted a job
asking, for example, for 10 slots with 1G per slot, the job crashed with an error
in the queue logs saying that the h_vmem needed by the job (around 3G) was over
the hard limit of the queue (local host instance) of 1G. I would have thought
that the request of 1G per slot, so 10G in total, would override this and give
enough memory for the job.
Setting the default value to 6G resolved the problem,
You refer to the setting on a queue level? This is the limit per process. There
is also a column for the default value in the complex definition for each
consumable complex. This can be set to 1G and users can override it, as long as
they stay below the limit on a queue (or exechost) level.
-- Reuti
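One way to picture the default-and-override behaviour described above is the following simplified model. It is only a sketch of the bookkeeping for a per-slot consumable, not Grid Engine's actual code: the function names are made up for illustration, and it assumes all 10 slots land on a single host with h_vmem=120G.

```python
from typing import Optional

GIB = 1024 ** 3  # assumption: "G" means 1024**3 bytes


def per_slot_request(requested: Optional[int], complex_default: int) -> int:
    """Use the job's own -l h_vmem value if given, else the complex default."""
    return complex_default if requested is None else requested


def host_debit(slots_on_host: int, requested: Optional[int],
               complex_default: int) -> int:
    """A per-slot consumable is charged once for each slot granted on the host."""
    return slots_on_host * per_slot_request(requested, complex_default)


def fits(debit: int, host_capacity: int) -> bool:
    """The job can be placed only if the debit stays within the host's capacity."""
    return debit <= host_capacity


default = 1 * GIB      # default column of the h_vmem complex
capacity = 120 * GIB   # complex_values h_vmem=120G on the exec host

no_request = host_debit(10, None, default)       # falls back to the 1G default
override = host_debit(10, 6 * GIB, default)      # user asks for 6G per slot

print(no_request // GIB, fits(no_request, capacity))
print(override // GIB, fits(override, capacity))
```

In this model a user can raise the per-slot value freely as long as slots times request stays within what the host offers.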
but as we might be dealing with larger memory jobs in future I would like to
find a proper fix for this. I am running SGE as installed by ROCKS 6.1.1
(OGS/Grid Engine 2011.11) and the only thing I changed was to set h_vmem to
consumable=yes and set the relevant h_vmem values for each host.
I want users to request memory and jobs to be killed if they exceed the
requested amount, so h_vmem seemed to be the way to go. But how do I set a
small default value that users can change if they need more? Or should I set it
to forced without a default and force users to request it?
Thanks in advance
Marlies
--
------------------
Dr. Marlies Hankel
Research Fellow, Theory and Computation Group
Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
eResearch Analyst, Research Computing Centre and Queensland Cyber
Infrastructure Foundation
The University of Queensland
Qld 4072, Brisbane, Australia
Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | mobile:0404262445
Email: [email protected] | www.theory-computation.uq.edu.au
Notice: If you receive this e-mail by mistake, please notify me,
and do not make any use of its contents. I do not waive any
privilege, confidentiality or copyright associated with it. Unless
stated otherwise, this e-mail represents only the views of the
Sender and not the views of The University of Queensland.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
--
ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
Please note change of work hours: Monday, Wednesday and Friday
Dr. Marlies Hankel
Research Fellow
High Performance Computing, Quantum Dynamics & Nanotechnology
Theory and Computational Molecular Sciences Group
Room 229 Australian Institute for Bioengineering and Nanotechnology (75)
The University of Queensland
Qld 4072, Brisbane
Australia
Tel: +61 (0)7-33463996
Fax: +61 (0)7-334 63992
mobile:+61 (0)404262445
Email: [email protected]
http://web.aibn.uq.edu.au/cbn/
ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms