Having done a bit of investigation, it seems that the problem we're hitting is
that our h_vmem limits aren't being respected when jobs are submitted as
parallel jobs.
If I put two jobs in:
$ qsub -o test.log -l h_vmem=1000G hostname
Your job 343719 ("hostname") has been submitted
$ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
Your job 343720 ("hostname") has been submitted
The first job won't be scheduled:
scheduling info: cannot run in queue instance "[email protected]" because it is not of type batch
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
                 cannot run in queue instance "[email protected]" because it is not of type batch
                 (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
                 (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
                 (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
                 (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
                 (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G
But the second is immediately scheduled and overcommits the node it's on (and
the overcommit is reflected by qstat -F h_vmem).
The memory usage is recorded and will prevent other jobs from running on that
node, but I need to figure out how to make the scheduler respect the resource
limit when the job is first submitted.
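In case it helps anyone reproduce this, the same two submissions can also be
pushed through qsub's validation switch (if I'm reading the man page correctly,
-w v prints a verification report without actually submitting, and -w e makes
qsub reject jobs that can never be scheduled). Since the scheduler clearly
believes the parallel job is runnable this may just confirm the bad decision,
but it might show where the -pe request changes the resource check:

$ qsub -w v -o test.log -l h_vmem=1000G hostname
$ qsub -w v -o test.log -l h_vmem=1000G -pe cores 2 hostname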
Any suggestions would be very welcome.
Thanks.
Simon.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Simon Andrews
Sent: 08 June 2015 13:53
To: [email protected]
Subject: [gridengine users] Negative complex values
Our cluster seems to have ended up in a strange state, and I don't understand
why.
We have set up h_vmem as a consumable resource so that users can't exhaust
the memory on any compute node. This has been working OK, and in our tests it
all behaved correctly, but we've now found that somehow some nodes have ended
up with negative amounts of memory remaining.
We only have one queue on the system, all.q.
$ qstat -F h_vmem -q all.q@compute-0-3
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
[email protected]      BP    0/44/64        13.13    lx26-amd64
        hc:h_vmem=-172.000G
...so the node is somehow at -172G of memory.
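A quick way to see whether other nodes have gone negative as well is just to
grep the qstat output shown above (this assumes the hc: line sits directly
beneath the queue instance line, as it does for us):

$ qstat -F h_vmem | grep -B2 'hc:h_vmem=-'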
The setup for the resource is as follows:
$ qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0
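For reference, the column layout from the top of the qconf -sc output (the
stock header, assuming ours hasn't been customised), so the JOB above is the
consumable column and the two trailing 0s are the default and urgency values:

$ qconf -sc | head -2
#name               shortcut   type        relop requestable consumable default  urgency
#----------------------------------------------------------------------------------------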
We use a JSV to add a default memory allocation to all jobs, and the jobs
listed below all carry an explicit h_vmem request (see later). The
initialisation of the complex value for the node also looks OK:
$ qconf -se compute-0-3 | grep complex
complex_values h_vmem=128G
The problem seems to stem from an individual job which has managed to book
200G on a node with only 128G. These are the jobs which are currently running
on that node:
$ qstat -j 341706 | grep "hard resource_list"
hard resource_list: h_vmem=21474836480
$ qstat -j 342549 | grep "hard resource_list"
hard resource_list: h_vmem=21474836480
$ qstat -j 342569 | grep "hard resource_list"
hard resource_list: h_vmem=21474836480
$ qstat -j 343337 | grep "hard resource_list"
hard resource_list: h_vmem=21474836480
$ qstat -j 343367 | grep "hard resource_list"
hard resource_list: h_vmem=21474836480
$ qstat -j 343400 | grep "hard resource_list"
hard resource_list: h_vmem=200G
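The listing above was collected by hand; the same check can be scripted along
these lines, assuming qstat -s r -q gives the jobs running in that queue
instance (with two header lines before the job rows) and that h_vmem appears
on each job's hard resource_list line:

for j in $(qstat -s r -q all.q@compute-0-3 | awk 'NR>2 {print $1}' | sort -u)
do
    printf '%s  ' "$j"
    qstat -j "$j" | grep "hard resource_list"
done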
We still have jobs which are queued because there is insufficient memory, so
the limit isn't being completely ignored, but I don't understand how the
currently running jobs were able to be scheduled. (The five h_vmem=21474836480
requests are 20G each, so together with the 200G job that's 300G booked against
the node's 128G, which accounts for the -172G above.) The queued jobs report:

(-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=-172.000G
Does anyone have any suggestions for how the cluster could have got itself into
this situation?
Thanks
Simon.