On 23.01.2015 at 18:08, Ilya M <[email protected]> wrote:

> Removed the quota limits. To no avail: same problems.
Ok, but it was worth a try.

-- Reuti

> -------- Original Message --------
> Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
> From: Reuti <[email protected]>
> To: Ilya M <[email protected]>
> Date: 1/23/15, 2:33 AM
>> Can you remove them temporarily? I have seen cases where the "unknown
>> resource" error suddenly popped up and later vanished just as suddenly,
>> and my conclusion was that it was somehow connected to RQS.
>>
>> -- Reuti
>>
>>
>>> On 23.01.2015 at 00:16, Ilya M <[email protected]> wrote:
>>>
>>> There are two RQS; one is disabled:
>>>
>>> {
>>>    name         limit_for_interns
>>>    description  "limit to max 5 GPU jobs per intern."
>>>    enabled      TRUE
>>>    limit        users {int1,int2} hosts @gpu to slots=5
>>> }
>>> {
>>>    name         limit_slots
>>>    description  NONE
>>>    enabled      FALSE
>>>    limit        hosts {@gpu} to slots=2
>>> }
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
>>> From: Reuti <[email protected]>
>>> To: Ilya <[email protected]>
>>> Date: 1/21/15, 16:12
>>>> Hi,
>>>>
>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>
>>>>> Something happened to the SGE (6.2u5) installation that had been running
>>>>> fine for many months: users can no longer request load values if they
>>>>> are of memory type. For example,
>>>>>
>>>>> qsub -l mem_free=5G -w v ....
>>>>>
>>>>> produces the following output:
>>>>>
>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
>>>>>
>>>>> The resource is available, though, when querying for it:
>>>>>
>>>>> qhost -F mem_free -h gpu038
>>>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>> -------------------------------------------------------------------------------
>>>>> global                  -               -     -       -       -       -       -
>>>>> gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>    Host Resource(s):      hl:mem_free=110.416G
>>>>>
>>>>>
>>>>> This was first reported by a user when he tried to request a custom "hl"
>>>>> resource. However, it now appears that all "hl" resources of type
>>>>> "memory" show this behavior. Integer "hl" resources are OK.
>>>> Do you have any RQS in place?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> I bounced qmaster between the master and the shadow master a couple of
>>>>> times, but it did not resolve the problem.
>>>>>
>>>>> Additionally, when I added MONITOR=1 to the scheduler's configuration,
>>>>> the file $SGE_ROOT/$SGE_CELL/common/schedule contained only colons:
>>>>> ::::::::
>>>>> ::::::::
>>>>> ::::::::
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
