In the message dated: Thu, 21 Dec 2017 23:17:52 +0100,
The pithy ruminations from Reuti on
<Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub
unchanged> were:
=> Hi,
=>
=> Am 21.12.2017 um 22:46 schrieb [email protected]:
=>
=> >
=> > I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
=> > making it a consumable resource, and updating our JSV (in perl) so that
=> > if the job is submitted as
=> >
=> > qsub -l gpu foobar
=> >
=> > it will be altered to the equivalent of
=> >
=> > qsub -l gpu=1 foobar
=> >
=> > to keep things easy for users.
=> >
=> > Any suggestions about this plan?
=>
=> Even with "-w n" you will face a "missing value for request", I fear, as it's
=> AFAIK checked before the JSV is called*. In the past I had the idea to
=> change the default value for an integer request without a number to one (it's
=> quite easy to find in the source where the BOOL without a value is expanded),
=> but it was denied.
=>
Well, I tried the changes:

    qconf -sc | grep gpu
    gpu    cuda    INT    <=    YES    JOB    0    1000

And submitted a job:

    qsub -l gpu ./smi.qsub

And it seems to have been accepted by qsub (note the change to "gpu=1" from our
JSV):

    qstat -j 737215 | grep gpu
    hard resource_list:    gpu=1,h_vmem=4g,h_stack=256m

Perhaps the "missing value for request" check only applies to certain
SGE versions? I neglected to mention that we're running SoGE 8.1.6.
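
To make the rewrite concrete, here is a minimal Python sketch of the logic only. It is not the Perl JSV mentioned above and it does not use the SGE JSV protocol; the function name normalize_gpu_request, its parameters, and the default of 1 are assumptions chosen for illustration.

```python
def normalize_gpu_request(l_hard, resource="gpu", default="1"):
    """Rewrite a bare `resource` entry in a comma-separated -l list.

    Hypothetical helper: "gpu" (or "gpu=" with no value) becomes
    "gpu=1"; every other entry is passed through unchanged.
    """
    if not l_hard:
        return l_hard
    fixed = []
    for item in l_hard.split(","):
        name, _sep, value = item.partition("=")
        if name == resource and value == "":
            # No count was given, so fall back to the assumed default.
            item = f"{resource}={default}"
        fixed.append(item)
    return ",".join(fixed)
```

A request that already carries a count, such as "gpu=2", is left as it is, so users who ask for more than one GPU are unaffected.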
=> But: do you need to know which GPU will be used?

Yeah, that was going to be another post.

=> Univa GE has a named resource. With SGE it might help to have one queue
=> with one slot per GPU, and from the name (i.e. the suffix) of the granted
=> queue you know which GPU you have to use.

True, but even with that info, there doesn't seem to be any universal
way to tell an arbitrary GPU job which GPU to use -- they all default
to device 0.
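
For programs built on CUDA specifically (so not a universal answer), the runtime honors the CUDA_VISIBLE_DEVICES environment variable, and a job wrapper could set it from the granted queue name. The following Python sketch assumes hypothetical queue names that end in the device index, such as "gpu_0" and "gpu_1"; the function names are also made up for illustration.

```python
import re


def device_from_queue(queue_name):
    """Return the trailing digits of a queue name as a string, or None.

    Assumes a hypothetical naming scheme where the suffix of the
    queue name is the GPU index, e.g. "gpu_1" -> "1".
    """
    match = re.search(r"(\d+)$", queue_name or "")
    return match.group(1) if match else None


def job_environment(queue_name, base_env):
    """Return a copy of base_env with CUDA_VISIBLE_DEVICES set if known."""
    env = dict(base_env)
    device = device_from_queue(queue_name)
    if device is not None:
        # CUDA programs will then see only this device, as device 0.
        env["CUDA_VISIBLE_DEVICES"] = device
    return env
```

A wrapper could pass job_environment(os.environ.get("QUEUE"), os.environ) as the environment of the real job command, since SGE exports the granted cluster queue name as QUEUE. Non-CUDA GPU programs would still need their own mechanism.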
Our likely solution will be to install 1 GPU/node, except for a few nodes
with multiple GPUs where any job requesting that node gets all GPUs,
and the job is expected to manage the multiple devices.
Thanks,
Mark
=>
=> -- Reuti
=>
=> *) The "-w e" check will even be performed twice: once before the JSV
=> and once after. In my opinion this is not optimal, as it makes it impossible
=> to submit a completely malformed request and put things in order inside the
=> JSV. Sure, one problem is the fields which are fed to the JSV: how to
=> express a missing integer value (besides the IEEE ways like NaN and the like)?
=>
=>
=> >
=> > Thanks,
=> >
=> > Mark