Hi All,

Something happened to our SGE (6.2u5) installation, which had been running fine for many months. Users can no longer submit resource requests for load values of memory type, e.g.

qsub -l mem_free=5G -w v .... produces the following output:

cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)

The resource is available, though, when querying for it:
qhost -F mem_free -h gpu038
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
    Host Resource(s):      hl:mem_free=110.416G
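
For anyone who wants to script the same check across hosts, here is a minimal sketch that pulls the reported value out of qhost -F output. The function name host_resource_value is hypothetical (it is not part of SGE), and the sample text is just the output pasted above; on a live cluster the input would come from "qhost -F mem_free -h gpu038" instead.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE command): read `qhost -F` output on
# stdin and print the value reported for one host resource.
host_resource_value() {
    # $1 = resource name, e.g. mem_free
    # Matches lines such as
    #   "    Host Resource(s):      hl:mem_free=110.416G"
    # and prints only the part after the "=".
    sed -n "s/.*[a-z][a-z]:$1=\(.*\)\$/\1/p"
}

# Sample input copied from the qhost output in this post.
sample='gpu038 lx24-amd64 16 2.11 126.1G 15.7G 4.0G 0.0
    Host Resource(s):      hl:mem_free=110.416G'

printf '%s\n' "$sample" | host_resource_value mem_free
```

An empty result would mean the host is not reporting the resource at all, which is not the case here: the value is reported, yet the request is still rejected.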


This was first reported by a user who tried to request a custom "hl" resource. However, it now appears that all "hl" resources of type "memory" show this behavior. Integer "hl" resources are OK.

I bounced qmaster between master and shadow-master a couple of times, but it did not resolve the problem.

Additionally, after I added MONITOR=1 to the scheduler's configuration, the file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::

Any ideas?

