On 29 May 2013 19:30, Roberto Nunnari <[email protected]> wrote:
> Kevin Buckley wrote:
>>
>> On 24 May 2013 19:52, Roberto Nunnari <[email protected]> wrote:
>>>
>>> Hi all.
>>>
>>> I'm just doing my first tests with Open Grid Scheduler and GPGPU.
>>>
>>
>> This is nVIDIA specific.
>>
>> One thing you might consider, although it would limit the potential
>> for multiple job throughput on the GPU, would be to use the nvidia-smi
>> utility to:
>>
>> 1) give the queue config a prolog/epilog pair that moved the GPU into
>> "single user mode"
>> for the duration of a job
>>
>> 2) have a load sensor that checked for what mode the GPU was in to
>> populate the complex
>>
>>
>> In terms of multiple jobs running simultaneously, it might also be
>> possible to take the XML output from
>>
>> nvidia-smi -q -x
>>
>> and XSLT that into something that would give you a handle on what the
>> current state
>> of the GPU is.
>>
>> For example, I can see these stanzas, from running the above against a T10
>>
>>                 <memory_usage>
>>                         <total>4095 MB</total>
>>                         <used>3 MB</used>
>>                         <free>4092 MB</free>
>>                </memory_usage>
>>                 <compute_mode>Exclusive_Thread</compute_mode>
>>                 <utilization>
>>                         <gpu_util>0 %</gpu_util>
>>                         <memory_util>0 %</memory_util>
>>                 </utilization>
>>
>>
>> The background to the above is that we have a cycle-stealing grid across
>> our
>> School's workstations and some of those have recently had GPU cards added,
>> so I've been thinking about how we might make use of them within the grid,
>> without affecting anyone sitting at the console who would be using the GPU
>> for their basic graphical output.
>>
>> Not the same use case as your "cluster" of course but thoughts I have had
>> in the general area that might get you thinking.
>
>
>
> Hi Kevin,
>
> thank you for your interesting thoughts.. but I realize that as I'm very new
> to GPUs I know very little about it.. and don't even understand why I should
> need to put the GPU in 'single user mode'..


A GPU can have more than one process using it at once. This can be useful
if you want to draw more than one thing on a display I guess!

However, in the GPU-compute arena this may not be what's required,
especially if you don't trust the codes running on it to stay within
the resources (e.g. memory) that were asked for, or that the authors
think they will use.

If you wish to prevent two (or more) users fighting over the resource,
or fighting over who crashed the resource, then having your GPU in
"Exclusive Mode" would only allow one process to access it at any one time.

Moving to the Distributed Resource Management case:

If the GPU is under the control of a scheduler, you need a way to
inform the scheduler that the GPU is in use; otherwise, jobs requiring
it could go to the resource but then be told that they can't access
the GPU, and so fail.

So, if you leave your idle GPU in non-Exclusive mode and put it into
"Exclusive Mode" as part of setting up the job, you have a simple
mechanism, namely testing for the mode, by which to determine whether
it is in use or not.
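
For what it's worth, the mode test could be sketched along these lines,
parsing the XML that `nvidia-smi -q -x` emits (a sketch only: the element
names below match the stanzas quoted above, but they can differ between
driver versions, so check your own output first):

```python
import subprocess
import xml.etree.ElementTree as ET

def gpu_in_exclusive_mode(xml_text):
    """True if any GPU in the report is in a non-Default compute mode."""
    root = ET.fromstring(xml_text)
    return any(el.text != "Default" for el in root.iter("compute_mode"))

def query_nvidia_smi():
    # Run this on the execution host (e.g. from a load sensor script);
    # it needs nvidia-smi on the PATH.
    return subprocess.check_output(["nvidia-smi", "-q", "-x"]).decode()

# Cut-down version of the stanza quoted above:
sample = """<nvidia_smi_log><gpu>
  <compute_mode>Exclusive_Thread</compute_mode>
</gpu></nvidia_smi_log>"""
print(gpu_in_exclusive_mode(sample))  # -> True
```

A load sensor would call `query_nvidia_smi()` instead of the canned sample
and report the result as the value of a boolean complex.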

As you, and others, suggest, simply using a "consumable" will do the same thing.
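
For reference, the usual consumable setup is something like the following
(the complex name `gpu` and the count are just placeholders here):

```
# qconf -mc : add a consumable complex
#name  shortcut  type  relop  requestable  consumable  default  urgency
gpu    gpu       INT   <=     YES          YES         0        0

# qconf -me <exec_host> : tell the scheduler how many GPUs the host has
complex_values  gpu=1
```

Jobs then request it with `qsub -l gpu=1` and the scheduler does the
counting for you.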

If, however, you use the nvidia-smi utility from the start to get that
information dynamically, then things will be in place should you want
to do something more with the utility, such as monitoring the memory
usage on the GPU, not just whether it is in use or not.
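
E.g. something along these lines, again parsing the XML quoted earlier
in the thread (a sketch; tag names may differ across driver versions):

```python
import xml.etree.ElementTree as ET

def gpu_memory_stats(xml_text):
    """Return (total_mb, used_mb, free_mb) for the first GPU in the report."""
    mem = ET.fromstring(xml_text).find(".//memory_usage")
    def mb(tag):
        # values look like "4095 MB"; keep just the number
        return int(mem.find(tag).text.split()[0])
    return mb("total"), mb("used"), mb("free")

# The <memory_usage> stanza quoted earlier:
sample = """<nvidia_smi_log><gpu><memory_usage>
  <total>4095 MB</total>
  <used>3 MB</used>
  <free>4092 MB</free>
</memory_usage></gpu></nvidia_smi_log>"""
print(gpu_memory_stats(sample))  # -> (4095, 3, 4092)
```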

Kevin
ECS, VUW, NZ
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
