I removed the quota limits, to no avail: the same problem persists.
-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load
value of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya M <[email protected]>
Date: 1/23/15, 2:33 AM
Can you remove them temporarily? I have seen cases where the "unknown
resource" error suddenly popped up and just as suddenly vanished again; my
conclusion was that it was somehow connected to RQS.
-- Reuti
On 23.01.2015 at 00:16, Ilya M <[email protected]> wrote:
There are two RQS; one is disabled:
{
   name         limit_for_interns
   description  "limit to max 5 GPU jobs per intern."
   enabled      TRUE
   limit        users {int1,int2} hosts @gpu to slots=5
}
{
   name         limit_slots
   description  NONE
   enabled      FALSE
   limit        hosts {@gpu} to slots=2
}
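With several rule sets defined, a quick way to confirm which ones are active is to filter the `qconf -srqs` listing for its `name` and `enabled` fields. The following is a minimal shell sketch, not part of the original exchange; the function name `rqs_status` is made up for illustration, and the sample input is the listing quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print "<rule set name> <enabled flag>" for each
# rule set found in `qconf -srqs` style text read from stdin.
rqs_status() {
    awk '$1 == "name"    { name = $2 }
         $1 == "enabled" { print name, $2 }'
}

# Sample input copied from the listing above; on a live cluster one would
# run `qconf -srqs | rqs_status` instead.
rqs_status <<'EOF'
{
name limit_for_interns
description "limit to max 5 GPU jobs per intern."
enabled TRUE
limit users {int1,int2} hosts @gpu to slots=5
}
{
name limit_slots
description NONE
enabled FALSE
limit hosts {@gpu} to slots=2
}
EOF
```

Any rule set reported as TRUE is a candidate for temporary disabling when testing whether RQS is involved.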
-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load value
of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya <[email protected]>
Date: 1/21/15, 16:12
Hi,
On 22.01.2015 at 00:52, Ilya wrote:
Something happened to our SGE (6.2u5) installation, which had been running
fine for many months: users can no longer submit resource requests for load
values of memory type. For example,
qsub -l mem_free=5G -w v ....
produces the following output:
cannot run in queue "gpu.q@gpu038" because job requests unknown resource
(mem_free)
The resource is available, though, when querying for it:
qhost -F mem_free -h gpu038
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
    Host Resource(s):      hl:mem_free=110.416G
This was first reported by a user when he tried to request a custom "hl"
resource. However, it now appears that all "hl" resources of type "memory"
show this behavior. Integer "hl" resources are OK.
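One thing that can be checked at this point is how the affected complexes are defined, since a resource can only be requested if its `requestable` column in the complex configuration is `YES` or `FORCED`. The sketch below filters `qconf -sc` style output for attributes of type `MEMORY`. The function name `memory_complexes` is hypothetical, and the sample lines are illustrative of a typical default configuration, not taken from the cluster discussed here.

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sc` style text on stdin and print
# "<name> <requestable>" for every complex whose type column is MEMORY.
memory_complexes() {
    awk '$1 !~ /^#/ && $3 == "MEMORY" { print $1, $5 }'
}

# Illustrative sample only; on a live cluster one would run
# `qconf -sc | memory_complexes` instead.
memory_complexes <<'EOF'
#name       shortcut  type    relop requestable consumable default urgency
mem_free    mf        MEMORY  <=    YES         NO         0       0
num_proc    p         INT     ==    YES         NO         0       0
EOF
```

If a memory-type complex shows `NO` here, that alone could explain a rejected request; if all show `YES`, the definition is probably not the cause.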
Do you have any RQS in place?
-- Reuti
I bounced qmaster between master and shadow-master a couple of times, but it
did not resolve the problem.
Additionally, after I added MONITOR=1 to the scheduler's configuration, the file
$SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::
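For reading this file: as far as the Grid Engine documentation describes it, each record in `schedule` has nine colon-separated fields (job id, task id, state, start time, duration, level, object name, resource name, utilization), and a line of eight bare colons marks the start of a new scheduling interval. Under that assumption, a file consisting only of `::::::::` lines would mean the scheduler ran its intervals but logged no job records. A small shell sketch to count both kinds of line follows; the function name `schedule_summary` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count interval separators and job records in
# schedule-file text read from stdin (format as assumed in the lead-in).
schedule_summary() {
    awk '$0 == "::::::::"       { intervals++ }
         $0 != "::::::::" && NF { records++ }
         END { printf "intervals=%d records=%d\n", intervals, records }'
}

# Sample input is the file content quoted above; on a live cluster one would
# run `schedule_summary < "$SGE_ROOT/$SGE_CELL/common/schedule"` instead.
schedule_summary <<'EOF'
::::::::
::::::::
::::::::
EOF
```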
Any ideas?
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users