In the message dated: Wed, 14 Aug 2019 10:21:12 -0400,
The pithy ruminations from Dj Merrill on
[[gridengine users] Multi-GPU setup] were:
=> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
=> single Nvidia GPU cards per compute node. We are contemplating the
=> purchase of a single compute node that has multiple GPU cards in it, and
=> want to ensure that running jobs only have access to the GPU resources
=> they ask for, and don't take over all of the GPU cards in the system.
That's an issue.
=>
=> We define gpu as a resource:
=> qconf -sc:
=> #name  shortcut  type  relop  requestable  consumable  default  urgency
=> gpu    gpu       INT   <=     YES          YES          0       0
=>
=> We define GPU persistence mode and exclusive process on each node:
=> nvidia-smi -pm 1
=> nvidia-smi -c 3
Good.
=>
=> We set the number of GPUs in the host definition:
=> qconf -me (hostname)
=>
=> complex_values gpu=1 for our existing nodes, and this setup has been
=> working fine for us.
Good.
=>
=> With the new system, we would set:
=> complex_values gpu=4
Yes.
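
Just to spell the whole chain out for the archives -- the hostname and job
script below are placeholders, the rest is what you've already described:

#################################################################
# one-time: define (or check) the consumable -- opens an editor:
qconf -mc
#   name   shortcut   type   relop   requestable   consumable   default   urgency
#   gpu    gpu        INT    <=      YES           YES          0         0

# per host: advertise how many GPUs that host has:
qconf -me newgpunode
#   complex_values         gpu=4

# per job: request however many GPUs the job needs:
qsub -l gpu=1 myGPUjob.sh
#################################################################
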
=>
=>
=> If a job is submitted asking for one GPU, will it be limited to only
=> having access to a single GPU card on the system, or can it detect the
=> other cards and take up all four (and if so how do we prevent that)?
There are two issues you'll need to deal with:
1. Preventing a job from using more than the requested number of GPUs
I don't have a great answer for that. As you see, SGE is good at
keeping track of the number
of instances of a resource (the count), but not which physical GPU is
assigned to a job.
For a cgroups-like solution, see:
http://gridengine.org/pipermail/users/2014-November/008128.html
http://gridengine.org/pipermail/users/2017-October/009952.html
http://gridengine.org/pipermail/users/2017-February/009581.html
I don't have experience with the described method, but the trick (using
a job prolog to chgrp the /dev/nvidia${GPUNUM} device) is on my list
of things-to-do.
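
I haven't tried it, so the following is only my rough sketch of the trick,
not a tested recipe -- in particular, how ${GPUNUM} and the group get
chosen is hand-waved here, and the prolog has to run with enough privilege
to change a device node:

#################################################################
#! /bin/bash
# prolog sketch (untested): limit the job to its assigned GPU by
# handing that GPU's device node to a group only this job is in.
# Assumes the devices normally sit at root:root mode 0600.

GPUNUM=0               # placeholder: the GPU number assigned to this job
JOBGROUP=`id -gn`      # placeholder: a group that only this job belongs to

chgrp ${JOBGROUP} /dev/nvidia${GPUNUM}
chmod 0660 /dev/nvidia${GPUNUM}

# a matching epilog would put the device back to root:root 0600
#################################################################
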
2. Ensuring that a job tries to use a free GPU, not just _any_ GPU
Since SGE doesn't explicitly tell the job which GPU to use,
we've found that a lot of software blindly tries to use GPU
#0, apparently assuming that the software is running on a
single-user/single-GPU system (python, I'm looking at you). Our
solution has been to "suggest" that users run a command in their
submit script to report the number (GPU ID) of the next free
GPU. This has eliminated most instances of this issue, but there
are still some race conditions.
#################################################################
#! /bin/bash
# Script to return GPU Id of idle GPUs, if any
#
# Used in a submit script, in the form
#
# CUDA_VISIBLE_DEVICES=`get_CUDA_VISIBLE_DEVICES` || exit
# export CUDA_VISIBLE_DEVICES
# myGPUjob
# Some software takes the specification of the GPU device on the command
# line. In that case, the command line might be something like:
#
# myGPUjob options -dev cuda${CUDA_VISIBLE_DEVICES}
#
# The command:
# nvidia-smi pmon
# returns output in the form:
#################
# # gpu        pid  type    sm   mem   enc   dec   command
# # Idx          #   C/G     %     %     %     %   name
#     0          -     -     -     -     -     -   -
#################
# Note the absence (-) of a PID to indicate an idle GPU
which nvidia-smi 1> /dev/null 2>&1
if [ $? != 0 ] ; then
    # no nvidia-smi found!
    echo "-1"
    echo "No 'nvidia-smi' utility found on node `hostname -s` at `date`." 1>&2
    if [ "X$JOB_ID" != "X" ] ; then
        # running as a batch job, this shouldn't happen
        ( printf "SGE job ${JOB_ID}: No 'nvidia-smi' utility found on node `hostname -s` at `date`.\n" ) | Mail -s "unexpected: no nvidia-smi utility on `hostname -s`" root
    fi
    exit 1
fi

numGPUs=`nvidia-smi pmon -c 1 | wc -l` ; numGPUs=$((numGPUs - 2))   # subtract the headers
free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`
if [[ "X$free" != "X" && $numGPUs -gt 1 ]] ; then
# we may have a race condition, where 2 (or more) GPU jobs are probing
nvidia-smi at once, and each is reporting that there is a free GPU
# are available. Sleep a random amount of time and check again....this
is not guanteed to avoid the conflict, but it
# will help...
sleep $((RANDOM % 11))
free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`
fi
if [ "X$free" = "X" ] ; then
echo "-1"
echo "SGE job ${JOB_ID} (${JOB_NAME}) failed: no free GPU on node
`hostname -s` at `date`." 1>&2
( printf "SGE job ${JOB_ID}, job name ${JOB_NAME} from $USER\nNo free
GPU on node `hostname -s` at `date`.\n\nGPU
status:\n==================================\n" ; nvidia=smi ; printf
"============================\n\nSGE status on this
node:\n=======================================\n" ; qstat -u \* -s rs -l
hostname=`hostname` ) 2>&1 | Mail -s "unexpected: no free GPUs on `hostname
-s`" root
exit 1
fi
echo $free
exit 0
#################################################################
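
For completeness, a submit script using it ends up looking something like
this (the helper's name and "myGPUjob" are taken from the comments above;
the rest is just an example):

#################################################################
#! /bin/bash
#$ -l gpu=1
#$ -cwd

# grab the ID of an idle GPU on this node, or give up if none is free
CUDA_VISIBLE_DEVICES=`get_CUDA_VISIBLE_DEVICES` || exit 1
export CUDA_VISIBLE_DEVICES

myGPUjob
#################################################################
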
Mark
=>
=> Is there something like "cgroups" for gpus?
=>
=> Thanks,
=>
=> -Dj
=>
=>
--
Mark Bergman Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand
http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com
I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users