Dear all,
I'm facing a new problem on my cluster with SGE. I haven't seen this
before, or maybe I just never noticed it.
I have some nodes with two queues: one (named "all.q") for jobs that run
no more than 24 h, and another (named "lenta.q") for jobs that need
more than 24 h.
I defined a resource quota set, following something I read on this
mailing list some time ago, as follows:
{
name slots_equals_cores
description Prevent core over-subscription across queues
enabled TRUE
limit hosts {*} to slots=$num_proc
}
For now, I have a node with 64 cores: 40 slots for the normal queue
(all.q) and 24 for the long queue (lenta.q).
[email protected] BP 0/16/40 15.93 lx-amd64
[email protected] BP 0/0/24 15.93 lx-amd64
Some 2-core jobs are not scheduled on this node in the long queue,
although there is no shortage of memory or cores. qstat reports
this:
"////compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "////compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "////compute-0-4/" in rule "slots_equals_cores/1"
cannot run in PE "thread" because it only offers 0 slots
I really don't understand why the job is not running on this node; in
my opinion there are enough free slots for it.
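
To make my reasoning explicit, here is a minimal Python sketch of the arithmetic as I understand the rule. It is only an illustration, not how SGE evaluates quotas internally: the names QueueInstance and fits_host_limit are made up for this example, the three numbers per queue are the resv/used/tot column from the qstat output above, and I am assuming that $num_proc resolves to 64 on this host.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    """One queue instance on a host, using qstat's resv/used/tot column."""
    name: str
    reserved: int
    used: int
    total: int


def fits_host_limit(instances, requested, num_proc):
    """Return True if `requested` more slots stay within the per-host limit.

    Assumption: the rule "limit hosts {*} to slots=$num_proc" counts the
    slots in use across all queues on the host against num_proc.
    """
    in_use = sum(q.reserved + q.used for q in instances)
    return in_use + requested <= num_proc


# Values taken from the two qstat lines above (0/16/40 and 0/0/24).
node = [
    QueueInstance("all.q", reserved=0, used=16, total=40),
    QueueInstance("lenta.q", reserved=0, used=0, total=24),
]

# Assumption: num_proc is 64 on this host (the node has 64 cores).
print(fits_host_limit(node, requested=2, num_proc=64))
```

With these numbers the check passes (16 + 2 is well below 64), which is why I expected the job to start. If the host reported a smaller num_proc than 64, the same check could fail, so that value is worth verifying.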
Can someone help me with this?
Regards.
--
-- Jérôme
A kiss is the surest way of keeping silent while saying everything.
(Guy de Maupassant)