Because I am testing with qsub -w v, the job is not accepted for scheduling, a job ID is not generated, and qstat -j will not work. The output of qsub is as I showed in the original email:

Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because job requests unknown resource (mem_free)
...
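
A small shell sketch for condensing verification output like the above into one line per rejected resource. The function name summarize_unknown_resources and the script name job.sh are hypothetical, and the sketch assumes only the message wording shown in this thread.

```shell
# Count how many queue instances rejected the job for each "unknown resource".
# Reads `qsub -w v` output on stdin and prints "<resource>: <count>".
# Hypothetical usage (job.sh is a placeholder script name):
#   qsub -w v -l mem_free=100G job.sh 2>&1 | summarize_unknown_resources
summarize_unknown_resources() {
    grep -o 'requests unknown resource ([^)]*)' \
        | sed 's/.*(\(.*\))/\1/' \
        | sort | uniq -c \
        | awk '{ print $2 ": " $1 }'
}
```

If every queue instance reports the same resource, the summary collapses to a single line, which makes it easier to see whether one resource or several are affected.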

Ilya.


-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
From: Feng Zhang <[email protected]>
To: Ilya M <[email protected]>
Date: 1/23/15, 9:27 AM
Ilya,

Can you please run:

qstat -j <jobid>

and paste the output here? It may be useful for checking the problem.

On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <[email protected]> wrote:
Removed the quota limits. To no avail: same problems.


-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load
value of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya M <[email protected]>
Date: 1/23/15, 2:33 AM
Can you remove them temporarily? I saw cases where the "unknown
resource" suddenly popped up and just as suddenly vanished again; my
conclusion was that it was somehow connected to RQS.

-- Reuti


Am 23.01.2015 um 00:16 schrieb Ilya M <[email protected]>:

There are two RQS, one is disabled:

{
    name         limit_for_interns
    description  "limit to max 5 GPU jobs per intern."
    enabled      TRUE
    limit        users {int1,int2} hosts @gpu to slots=5
}
{
    name         limit_slots
    description  NONE
    enabled      FALSE
    limit        hosts {@gpu} to slots=2
}
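
To see at a glance which rule sets are active before disabling them, a listing in the format above can be filtered with a short shell sketch. The function name enabled_rqs is hypothetical, and the sketch assumes the "name" and "enabled" fields appear exactly as shown here.

```shell
# Print the names of resource quota sets whose "enabled" field is TRUE.
# Reads a listing in the format shown above on stdin.
# Hypothetical usage:
#   qconf -srqs | enabled_rqs
# A rule set can then be edited with `qconf -mrqs <name>` and its
# "enabled" field set to FALSE to switch it off temporarily.
enabled_rqs() {
    awk '
        $1 == "name"                     { current = $2 }
        $1 == "enabled" && $2 == "TRUE"  { print current }
    '
}
```

For the two rule sets listed above, this would print only limit_for_interns.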


-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load
value of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya <[email protected]>
Date: 1/21/15, 16:12
Hi,

Am 22.01.2015 um 00:52 schrieb Ilya:

Something happened to the SGE (6.2u5) that had been running fine for
many months, and users can no longer put resource requests for load values
if they are of memory type, e.g.

qsub -l mem_free=5G -w v .... produces the following output:

cannot run in queue "gpu.q@gpu038" because job requests unknown
resource (mem_free)

The resource is available, though, when querying for it:
qhost -F mem_free -h gpu038
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
    Host Resource(s):      hl:mem_free=110.416G
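
A further check that is not part of this thread, offered only as a sketch: since the host reports the load value, it may be worth confirming how the complex itself is defined. The function name complex_status is hypothetical, and the column order (name, shortcut, type, relop, requestable, consumable, default, urgency) is assumed to match the output of qconf -sc.

```shell
# Report type, requestable and consumable flags for one complex entry.
# Reads `qconf -sc` output on stdin; the complex name or shortcut is $1.
# Assumed column order:
#   name shortcut type relop requestable consumable default urgency
# Hypothetical usage:
#   qconf -sc | complex_status mem_free
complex_status() {
    awk -v name="$1" '
        $1 !~ /^#/ && ($1 == name || $2 == name) {
            printf "%s type=%s requestable=%s consumable=%s\n", $1, $3, $5, $6
            found = 1
        }
        END { if (!found) print name " not defined in complex list" }
    '
}
```

A missing entry, or one whose requestable column is NO, would be one possible explanation for an "unknown resource" message, although nothing in this thread confirms that this is the cause here.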


This was first reported by a user when he tried to request a custom "hl"
resource. However, it now appears that all "hl" resources of type "memory"
show this behavior. Integer "hl" resources are OK.

Do you have any RQS in place?

-- Reuti


I bounced qmaster between master and shadow-master a couple of times,
but it did not resolve the problem.

Additionally, after I added MONITOR=1 to the scheduler's configuration, the
file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::

Any ideas?

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users