Hi All,
Something has gone wrong with our SGE (6.2u5) installation, which had
been running fine for many months: users can no longer submit resource
requests for load values of memory type. For example,
qsub -l mem_free=5G -w v .... produces the following output:
cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
The resource is available, though, when querying for it:
qhost -F mem_free -h gpu038
HOSTNAME      ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global        -               -     -       -       -       -       -
gpu038        lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
    Host Resource(s):      hl:mem_free=110.416G
This was first reported by a user who tried to request a custom "hl"
resource. However, it now appears that all "hl" resources of type
"memory" show this behavior. Integer "hl" resources are OK.
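Since only memory-type resources are affected, one thing worth checking
is how the complex is currently defined. The sketch below assumes the
usual `qconf -sc` column order (name, shortcut, type, relop,
requestable, consumable, default, urgency); `complex_info` is a
hypothetical helper name, not part of SGE.

```shell
# Hypothetical helper: reads `qconf -sc` output on stdin and prints the
# type and requestable columns for the named complex.
# Assumes the column order: name shortcut type relop requestable ...
# Returns non-zero if the complex is not listed at all.
complex_info() {
    awk -v name="$1" '
        $1 == name { print $3, $5; found = 1 }
        END { exit !found }
    '
}

# Usage on a live cluster (assumes qconf is in PATH):
#   qconf -sc | complex_info mem_free
```

If the helper prints nothing and returns non-zero, the complex is
missing from the list altogether, which would be consistent with the
"unknown resource" message.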
I bounced qmaster between master and shadow-master a couple of times,
but it did not resolve the problem.
Additionally, after I added MONITOR=1 to the scheduler's configuration,
the file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::
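To rule out a typo in the scheduler configuration, the active setting
can be confirmed from `qconf -ssconf`. This is only a sketch:
`monitor_enabled` is a hypothetical helper name, and it assumes the
setting appears on the `params` line of that output.

```shell
# Hypothetical helper: reads `qconf -ssconf` output on stdin and succeeds
# only if the "params" line contains MONITOR=1.
monitor_enabled() {
    awk '
        $1 == "params" && /MONITOR=1/ { ok = 1 }
        END { exit !ok }
    '
}

# Usage on a live cluster (assumes qconf is in PATH):
#   qconf -ssconf | monitor_enabled && echo "MONITOR=1 is active"
```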
Any ideas?