It looks normal to me. Can you run:
qhost -F mem_free -l mem_free=80g

to see if it can list the nodes properly?

On Fri, Jan 23, 2015 at 4:35 PM, Ian Kaufman <[email protected]> wrote:
> So it is requestable, but not consumable, and there is no default set
> in the complex. Well, the default is set to zero, but I don't think
> that is treated as a default.
>
> Is that what was intended - requestable but not consumable?
>
> Ian
>
> On Fri, Jan 23, 2015 at 12:36 PM, Ilya M <[email protected]> wrote:
>> Naturally, it does:
>>
>>> qconf -sc | grep mem_free
>> mem_free            mf         MEMORY      <=    YES         NO         0        0
>>
>> And it is reported on all nodes:
>>
>>> qhost -F mem_free -h gpu001
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> gpu001                  lx24-amd64     16  3.32  126.1G   37.2G    4.0G     0.0
>>     Host Resource(s):      hl:mem_free=88.885G
>>
>> And everything was working until a week ago.
>>
>> Ilya.
>>
>> -------- Original Message --------
>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>> value of memory type: SGE reports it as unknown resource
>> From: Ian Kaufman <[email protected]>
>> To: Ilya M <[email protected]>
>> Date: 1/23/15, 11:38 AM
>>>
>>> Is mem_free defined in the host complex_values? What does
>>>
>>> qconf -sc | grep mem_free
>>>
>>> show? Is there a default value defined?
>>>
>>> Ian
>>>
>>> On Fri, Jan 23, 2015 at 11:30 AM, Ilya M <[email protected]> wrote:
>>>>
>>>> Because I am testing with qsub -w v, the job is not accepted for
>>>> scheduling, no job ID is generated, and qstat -j will not work.
>>>> The output of qsub is as I showed in the original email:
>>>>
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job requests unknown resource (mem_free)
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because job requests unknown resource (mem_free)
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because job requests unknown resource (mem_free)
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because job requests unknown resource (mem_free)
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because job requests unknown resource (mem_free)
>>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because job requests unknown resource (mem_free)
>>>> ...
>>>>
>>>> Ilya.
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>> value of memory type: SGE reports it as unknown resource
>>>> From: Feng Zhang <[email protected]>
>>>> To: Ilya M <[email protected]>
>>>> Date: 1/23/15, 9:27 AM
>>>>>
>>>>> Ilya,
>>>>>
>>>>> Can you please run:
>>>>>
>>>>> qstat -j <jobid>
>>>>>
>>>>> and paste the output here? It may be useful for checking the problem.
>>>>>
>>>>> On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <[email protected]> wrote:
>>>>>>
>>>>>> Removed the quota limits. To no avail: same problems.
>>>>>>
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>>>> value of memory type: SGE reports it as unknown resource
>>>>>> From: Reuti <[email protected]>
>>>>>> To: Ilya M <[email protected]>
>>>>>> Date: 1/23/15, 2:33 AM
>>>>>>>
>>>>>>> Can you remove them temporarily? I saw cases where the "unknown
>>>>>>> resource" error suddenly popped up - and also suddenly vanished again;
>>>>>>> my conclusion was that it was somehow connected to RQS.
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> On 23.01.2015 at 00:16, Ilya M <[email protected]> wrote:
>>>>>>>>
>>>>>>>> There are two RQS, one is disabled:
>>>>>>>>
>>>>>>>> {
>>>>>>>>    name         limit_for_interns
>>>>>>>>    description  "limit to max 5 GPU jobs per intern."
>>>>>>>>    enabled      TRUE
>>>>>>>>    limit        users {int1,int2} hosts @gpu to slots=5
>>>>>>>> }
>>>>>>>> {
>>>>>>>>    name         limit_slots
>>>>>>>>    description  NONE
>>>>>>>>    enabled      FALSE
>>>>>>>>    limit        hosts {@gpu} to slots=2
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>>>>>> value of memory type: SGE reports it as unknown resource
>>>>>>>> From: Reuti <[email protected]>
>>>>>>>> To: Ilya <[email protected]>
>>>>>>>> Date: 1/21/15, 16:12
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>>>>>>
>>>>>>>>>> Something happened to the SGE (6.2u5) that had been running fine for
>>>>>>>>>> many months, and users can no longer make resource requests for load
>>>>>>>>>> values if they are of memory type, e.g.
>>>>>>>>>>
>>>>>>>>>> qsub -l mem_free=5G -w v .... produces the following output:
>>>>>>>>>>
>>>>>>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
>>>>>>>>>>
>>>>>>>>>> The resource is available, though, when querying for it:
>>>>>>>>>> qhost -F mem_free -h gpu038
>>>>>>>>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>>> global                  -               -     -       -       -       -       -
>>>>>>>>>> gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>>>>>>     Host Resource(s):      hl:mem_free=110.416G
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This was first reported by a user when he tried to request a custom "hl"
>>>>>>>>>> resource.
>>>>>>>>>> However, it now appears that all "hl" resources of type "memory"
>>>>>>>>>> show this behavior. Integer "hl" resources are OK.
>>>>>>>>>
>>>>>>>>> Do you have any RQS in place?
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I bounced qmaster between master and shadow-master a couple of times,
>>>>>>>>>> but it did not resolve the problem.
>>>>>>>>>>
>>>>>>>>>> Additionally, when I added MONITOR=1 to the scheduler's configuration,
>>>>>>>>>> the file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
>>>>>>>>>> ::::::::
>>>>>>>>>> ::::::::
>>>>>>>>>> ::::::::
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://gridengine.org/mailman/listinfo/users
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering
> ikaufman AT ucsd DOT edu

--
Best,
Feng
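To make Ian's reading of the `qconf -sc | grep mem_free` line and Feng's `qhost -l mem_free=80g` probe concrete, here is a minimal offline sketch. The functions `describe_complex`, `to_bytes` and `fits` are hypothetical helpers, not Grid Engine commands; they only parse text and never contact qmaster. The sketch assumes the `qconf -sc` column order `name shortcut type relop requestable consumable default urgency`, assumes that lowercase k/m/g suffixes mean powers of 1000 and uppercase K/M/G powers of 1024, and assumes a load value with relop `<=` is satisfied when the requested amount is less than or equal to the value the host reports. The host figures are the ones posted in the thread for gpu001 and gpu038.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Grid Engine) for reasoning about the
# mem_free complex discussed in this thread. They only parse text.

# describe_complex: label the columns of one `qconf -sc` line read on stdin.
# Assumed column order: name shortcut type relop requestable consumable default urgency
describe_complex() {
  awk '{ printf "name=%s type=%s relop=%s requestable=%s consumable=%s default=%s\n", $1, $3, $4, $5, $6, $7 }'
}

# to_bytes: convert a memory value such as 80g or 88.885G to bytes.
# Assumption: k/m/g are powers of 1000, K/M/G are powers of 1024.
to_bytes() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                      # numeric prefix, e.g. 88.885
    s = substr(v, length(v), 1)    # trailing unit letter, if any
    if      (s == "k") n *= 1000
    else if (s == "K") n *= 1024
    else if (s == "m") n *= 1000 ^ 2
    else if (s == "M") n *= 1024 ^ 2
    else if (s == "g") n *= 1000 ^ 3
    else if (s == "G") n *= 1024 ^ 3
    printf "%.0f\n", n
  }'
}

# fits: succeed if REQUEST <= HOST_VALUE, i.e. the assumed matching rule
# for a load value whose relop is "<=".
fits() {
  local req host
  req=$(to_bytes "$1")
  host=$(to_bytes "$2")
  awk -v r="$req" -v h="$host" 'BEGIN { exit !(r + 0 <= h + 0) }'
}

# Values taken from the thread.
echo "mem_free mf MEMORY <= YES NO 0 0" | describe_complex
# prints: name=mem_free type=MEMORY relop=<= requestable=YES consumable=NO default=0
fits 80g 88.885G && echo "gpu001 (88.885G free) satisfies mem_free=80g"
fits 100G 88.885G || echo "gpu001 (88.885G free) cannot satisfy mem_free=100G"
```

Decoding the column confirms Ian's point (requestable, not consumable, default 0). Under these assumptions gpu001 would be expected to match an 80g request but not the 100G request used in the tests, so on a healthy cluster that 100G job would be expected to fail on gpu001 for lack of memory rather than because mem_free is unknown; the sketch does not reproduce the scheduler's actual messages.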
