IIRC the JOB vs YES h_vmem behaviour for parallel vs batch jobs is a "known issue".

So we modified our environment back to "YES" instead of "JOB" and adjusted the rest of the environment accordingly, e.g. the jsv you mention.

So our h_vmem requests are all "per slot".
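
In case it's useful, here is a rough sketch of the kind of per-slot correction a bash JSV can make. It assumes the standard helper functions shipped in $SGE_ROOT/util/resources/jsv/jsv_include.sh; mem_to_bytes is just an illustrative helper of our own, and it only divides by pe_min:

#!/bin/bash
# Sketch: divide a per-job h_vmem request by the slot count, so that a
# per-slot (YES) consumable ends up debiting roughly the intended total.

mem_to_bytes()
{
   # Convert values like "16G", "512M" or "21474836480" into plain bytes.
   # Only uppercase K/M/G suffixes are handled in this sketch.
   case "$1" in
      *K) echo $(( ${1%K} * 1024 )) ;;
      *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
      *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
      *)  echo "$1" ;;
   esac
}

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   pe=$(jsv_get_param pe_name)
   vmem=$(jsv_sub_get_param l_hard h_vmem)

   if [ -n "$pe" ] && [ -n "$vmem" ]; then
      # Use the lower end of the slot range; a real JSV may want to be
      # smarter about ranges like "-pe cores 2-8".
      slots=$(jsv_get_param pe_min)
      bytes=$(mem_to_bytes "$vmem")
      jsv_sub_add_param l_hard h_vmem $(( bytes / slots ))
      jsv_correct "h_vmem divided by $slots slots"
      return
   fi

   jsv_accept "Job is accepted"
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main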


On 6/8/15 9:00 AM, Simon Andrews wrote:
Thanks for replying!

Am I reading that right, that if the resource is allocated per job then it
doesn't actually need to be available?

If that's the case, what is the correct way to set up a job level resource
which we can use for scheduling?  I suppose I could change the resource to
be slot level, not job level, then use our jsv to divide the request by
the number of cores, but that seems kind of awkward.

Is there a better way I'm missing?

Thanks

Simon.


On 08/06/2015 16:38, "Feng Zhang" <[email protected]> wrote:

Hi Simon,

Since you defined h_vmem as "JOB", note what the manual says:
"
     A consumable defined by 'y' is a per slot consumables  which
     means  the  limit is multiplied by the number of slots being
     used by the job before being applied.  In case  of  'j'  the
     consumable is a per job consumable. This resource is debited
     as requested (without  multiplication)  from  the  allocated
     master  queue.  The  resource needs not be available for the
     slave task queues."

I am wondering if you can set it to "YES" rather than "JOB", and see
whether that works for parallel jobs?
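
I.e. something along these lines under qconf -mc (just a sketch; the column layout matches the qconf -sc output quoted further down):

#name     shortcut    type      relop   requestable  consumable  default  urgency
h_vmem    h_vmem      MEMORY    <=      YES          YES         0        0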


On Mon, Jun 8, 2015 at 11:10 AM, Simon Andrews
<[email protected]> wrote:
Having done a bit of investigation, it seems the problem we're hitting is
that our h_vmem limits aren't being respected when jobs are submitted as
parallel jobs.

If I put two jobs in:

$ qsub -o test.log -l h_vmem=1000G hostname
Your job 343719 ("hostname") has been submitted

$ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
Your job 343720 ("hostname") has been submitted

The first job won't be scheduled:
scheduling info:            cannot run in queue instance "[email protected]" because it is not of type batch
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
                            cannot run in queue instance "[email protected]" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G


But the second is immediately scheduled and overcommits the node it's
on (and the overcommit is reflected by qstat -F h_vmem).

The memory usage is recorded and will prevent other jobs from running
on that node, but I need to figure out how to make the scheduler respect
the resource limit when the job is first submitted.

Any suggestions would be very welcome

Thanks.

Simon.

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Simon Andrews
Sent: 08 June 2015 13:53
To: [email protected]
Subject: [gridengine users] Negative complex values

Our cluster seems to have ended up in a strange state, and I don't
understand why.

We have set up h_vmem to be a consumable resource so that users can't
exhaust the memory on any compute node.  This has been working OK and in
our tests it all seemed to be right, but we've now found that somehow
we've ended up with nodes with negative amounts of memory remaining.

We only have one queue on the system, all.q.

$ qstat -F h_vmem -q all.q@compute-0-3
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
[email protected]        BP    0/44/64        13.13    lx26-amd64
         hc:h_vmem=-172.000G

...so the node is somehow at -172G of memory.

The setup for the resource is as follows:

$ qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0

We use a jsv to add a default memory allocation to all jobs, and the jobs
listed later in this message all carry an h_vmem request.

...the initialisation of the complex value for the node looks OK:

$ qconf -se compute-0-3 | grep complex
complex_values        h_vmem=128G

The problem seems to stem from an individual job which has managed to
claim 200G on a node with only 128G. These are the jobs currently running
on that node:

qstat -j 341706 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
qstat -j 342549 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
qstat -j 342569 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
qstat -j 343337 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
qstat -j 343367 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
qstat -j 343400 | grep "hard resource_list"
hard resource_list:         h_vmem=200G
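
For what it's worth, 21474836480 bytes is 20G, so if these six jobs are the
only ones debiting the node, their requests add up to 5 x 20G + 200G = 300G,
and 128G - 300G is exactly the -172G reported above.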

We still have jobs which are queued because there is insufficient
memory, so the limit isn't being completely ignored, but I don't
understand how the jobs which are currently running were able to be
scheduled.

(-l h_vmem=40G) cannot run at host "compute-0-3.local" because it
offers only hc:h_vmem=-172.000G

Does anyone have any suggestions for how the cluster could have got
itself into this situation?

Thanks

Simon.



--
Best,

Feng



--
Alex Chekholko [email protected]
