In the message dated: Thu, 21 Dec 2017 23:17:52 +0100,
The pithy ruminations from Reuti on 
<Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub 
unchanged> were:
=> Hi,
=> 
=> On 21.12.2017 at 22:46, [email protected] wrote:
=> 

=> > 
=> > I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
=> > making it a consumable resource, and updating our JSV (in perl) so that
=> > if the job is submitted as
=> > 
=> >    qsub -l gpu foobar
=> > 
=> > it will be altered to the equivalent of
=> > 
=> >    qsub -l gpu=1 foobar
=> > 
=> > to keep things easy for users.
=> > 
=> > Any suggestions about this plan?
=> 
=> Even with "-w n" you will face a "missing value for request", I fear, as it's
=> AFAIK checked before the JSV is called*. I had the idea in the past to
=> change the default value for an integer request without a number to one (it's
=> quite easy to find in the source where the BOOL without a value is expanded),
=> but it was denied.
=> 

Well, I tried the changes:

        qconf -sc | grep gpu
        gpu                 cuda       INT         <=    YES         JOB        0        1000


And submitted a job:

        qsub -l gpu ./smi.qsub

And it seems to have been accepted by qsub (note the change to "gpu=1" from our 
JSV):

        qstat -j 737215|grep gpu
        hard resource_list:         gpu=1,h_vmem=4g,h_stack=256m


Perhaps the "missing value for request" check only applies to certain
SGE versions? I left out mentioning that we're running SoGE 8.1.6.
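
For illustration, the rewrite step can be sketched in plain shell. This is
not our Perl JSV and it does not use the JSV library: the function name
normalize_gpu_request is hypothetical, and the assumption that a bare "gpu"
request arrives as an entry without a value in a comma-separated list is
exactly that, an assumption.

```shell
# Hypothetical helper (not part of SGE): rewrite a bare "gpu" entry in a
# comma-separated resource list to "gpu=1". Entries that already carry a
# value, such as "gpu=2", are passed through unchanged.
normalize_gpu_request() {
    # The subshell keeps the IFS and noglob changes from leaking out.
    (
        IFS=','
        set -f
        output=""
        for entry in $1; do
            if [ "$entry" = "gpu" ]; then
                entry="gpu=1"
            fi
            output="${output:+$output,}$entry"
        done
        printf '%s\n' "$output"
    )
}

normalize_gpu_request "gpu,h_vmem=4g,h_stack=256m"
```

Only an exact "gpu" entry is touched, so an explicit count from the user is
kept as submitted.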


=> But: do you need to know which GPU will be used? Univa GE has a named

Yeah, that was going to be another post.

=> resource. With SGE it might help to have one queue with one slot per GPU,
=> and from the name (i.e. suffix) of the granted queue name you know which
=> GPU you have to use.

True, but even with that info, there doesn't seem to be any universal
way to tell an arbitrary GPU job which GPU to use -- they all default
to device 0.
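
For CUDA programs specifically, the queue-suffix idea could be combined with
the CUDA_VISIBLE_DEVICES environment variable, which restricts the devices a
process can see, so the granted GPU shows up as device 0. A minimal sketch,
assuming hypothetical queue names that end in the GPU index (e.g. "gpu_0",
"gpu_1"); gpu_index_from_queue is an illustrative helper, not an SGE command,
and this does nothing for software that does not use CUDA.

```shell
# Hypothetical helper: print the trailing digits of a queue name,
# or return non-zero if the name does not end in a digit.
gpu_index_from_queue() {
    # Strip the longest prefix ending in a non-digit; what is left
    # is the run of digits at the end of the name (possibly empty).
    suffix="${1##*[!0-9]}"
    [ -n "$suffix" ] || return 1
    printf '%s\n' "$suffix"
}

# In a job script, SGE sets $QUEUE to the granted cluster queue.
# Assumption: that queue name ends in the index of the GPU to use.
if idx=$(gpu_index_from_queue "${QUEUE:-}"); then
    export CUDA_VISIBLE_DEVICES="$idx"
fi
```

If the queue name carries no numeric suffix, the variable is left alone and
the job sees whatever devices it would have seen anyway.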

Our likely solution will be to install 1 GPU/node, except for a few nodes
with multiple GPUs where any job requesting that node gets all GPUs,
and the job is expected to manage the multiple devices.

Thanks,

Mark

=> 
=> -- Reuti
=> 
=> *) The "-w e" check is even performed twice: once before the JSV and once
=> after. In my opinion this is not optimal, as it prevents submitting a
=> completely malformed request and putting things in order inside the JSV.
=> Admittedly, one problem is the fields that are fed to the JSV: how would one
=> express a missing integer value (besides the IEEE ways like NaN and the like)?
=> 
=> 
=> > 
=> > Thanks,
=> > 
=> > Mark

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
