On Wed, Oct 25, 2017 at 04:59:05PM +0200, Reuti wrote:
> Hi,
> 
> > On 25.10.2017 at 16:06, ANS <[email protected]> wrote:
> > 
> > Hi all,
> > 
> > I am trying to integrate GPUs into my existing cluster, with 2 GPUs per 
> > node. I have gone through a few sites and done the following:
> > 
> > qconf -mc
> > gpu                 gpu        INT         <=    YES         YES        0        0
> > 
> > qconf -me gpunode1
> > complex_values        gpu=2
> > 
> > But still I am unable to launch the jobs using GPUs. Can anyone help me?
> 
> What do you mean by "unable to launch the jobs using GPUs"? How do you submit 
> the jobs? Are the jobs stuck, or never accepted by SGE?
> 
> There is no way to determine which GPU was assigned to which job. Univa GE 
> has an extension for this called "named resources" or similar. You could define 
> two queues, each having one slot, and the name of the chosen queue then 
> determines the GPU to be used, after some mangling.
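
(As a general note on the setup quoted above: with a consumable defined like
that, jobs have to request it explicitly at submission time, roughly

    qsub -l gpu=1 job_script.sh

where job_script.sh just stands in for the actual job script; since the
default for the complex is 0, a job submitted without -l gpu=... is never
counted against the consumable at all.)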

We set permissions so that only the owner and group, not the world, can access the
/dev/nvidia? files (/dev/nvidiactl, by contrast, needs to be accessible by
anyone).  We use a prolog script here which uses lock files to logically
assign NVIDIA GPUs to jobs. The script then chgrp's the /dev/nvidia? file
associated with the GPU so it is owned by the group associated with the job.
The epilog undoes what the prolog did.  We need to pass a magic flag to the
kernel, otherwise virtually any time you touch them the permissions of the
/dev files get reset.

This seems to work to prevent programs from seeing GPUs they weren't
assigned by the prolog.
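
To give a rough idea, a simplified sketch of that kind of prolog (not our
actual script; the lock directory path, the two-GPU layout, running the
prolog as root, and the NVreg_ModifyDeviceFiles=0 module option are all
assumptions about a setup like ours):

    #!/bin/sh
    # Prolog sketch: grab a free GPU via a lock directory and chgrp its
    # /dev node to the job's group.  Assumes two GPUs (/dev/nvidia0,
    # /dev/nvidia1), that the prolog runs as root, and that the nvidia
    # module was loaded with something like NVreg_ModifyDeviceFiles=0 so
    # the driver stops resetting the device file permissions.

    LOCKDIR=/var/spool/sge/gpu_locks        # assumed path, created beforehand
    JOBGROUP=$(id -gn "$USER")              # assumes $USER is the job owner

    for i in 0 1; do
        # mkdir is atomic, so one directory per GPU works as a lock
        if mkdir "$LOCKDIR/nvidia$i" 2>/dev/null; then
            echo "$JOB_ID" > "$LOCKDIR/nvidia$i/job"
            chgrp "$JOBGROUP" "/dev/nvidia$i"
            chmod 660 "/dev/nvidia$i"
            exit 0                          # success, job may start
        fi
    done

    echo "prolog: no free GPU for job $JOB_ID" >&2
    exit 100                                # put the job into error state

The epilog would do the reverse: chgrp the device node back to its normal
group and remove the lock directory so the GPU is free for the next job.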

We use JOB for the consumable, so GPUs are allocated per job on the head node
of the job, not per slot.
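
That just means the consumable column of the complex is set to JOB instead of
YES (assuming a Grid Engine version that supports per-job consumables), i.e.
roughly:

    qconf -mc
    gpu                 gpu        INT         <=    YES         JOB        0        0

With that, a parallel job requesting -l gpu=1 is charged one GPU for the whole
job rather than one per slot.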


William
