In the message dated: Wed, 14 Aug 2019 10:21:12 -0400,
The pithy ruminations from Dj Merrill on
[[gridengine users] Multi-GPU setup] were:
=> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
=> single Nvidia GPU cards per compute node. We are contemplating the
=> purchase of a single compute node that has multiple GPU cards in it, and
=> want to ensure that running jobs only have access to the GPU resources
=> they ask for, and don't take over all of the GPU cards in the system.
That's an issue.
=>
=> We define gpu as a resource:
=> qconf -sc:
=> #name  shortcut  type  relop  requestable  consumable  default  urgency
=> gpu    gpu       INT   <=     YES          YES          0       0
=>
=> We define GPU persistence mode and exclusive process on each node:
=> nvidia-smi -pm 1
=> nvidia-smi -c 3
Good.
=>
=> We set the number of GPUs in the host definition:
=> qconf -me (hostname)
=>
=> complex_values gpu=1 for our existing nodes, and this setup has been
=> working fine for us.
Good.
=>
=> With the new system, we would set:
=> complex_values gpu=4
Yes.
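
Just to spell the whole chain out for the archives -- the hostname and job
script below are placeholders, the rest is what you've already described:

#################################################################
# one-time: define (or check) the consumable -- opens an editor:
qconf -mc
#   name   shortcut   type   relop   requestable   consumable   default   urgency
#   gpu    gpu        INT    <=      YES           YES          0         0

# per host: advertise how many GPUs that host has:
qconf -me newgpunode
#   complex_values         gpu=4

# per job: request however many GPUs the job needs:
qsub -l gpu=1 myGPUjob.sh
#################################################################
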
=>
=>
=> If a job is submitted asking for one GPU, will it be limited to only
=> having access to a single GPU card on the system, or can it detect the
=> other cards and take up all four (and if so how do we prevent that)?
There are two issues you'll need to deal with:
1. Preventing a job from using more than the requested number of GPUs
I don't have a great answer for that. As you see, SGE is good at
keeping track of the number
of instances of a resource (the count), but not which physical GPU is
assigned to a job.
For a cgroups-like solution, see:
http://gridengine.org/pipermail/users/2014-November/008128.html
http://gridengine.org/pipermail/users/2017-October/009952.html
http://gridengine.org/pipermail/users/2017-February/009581.html
I don't have experience with the described method, but the trick (using
a job prolog to chgrp the /dev/nvidia${GPUNUM} device) is on my list
of things-to-do.
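
I haven't tried it, so the following is only my rough sketch of the trick,
not a tested recipe -- in particular, how ${GPUNUM} and the group get
chosen is hand-waved here, and the prolog has to run with enough privilege
to change a device node:

#################################################################
#! /bin/bash
# prolog sketch (untested): limit the job to its assigned GPU by
# handing that GPU's device node to a group only this job is in.
# Assumes the devices normally sit at root:root mode 0600.

GPUNUM=0               # placeholder: the GPU number assigned to this job
JOBGROUP=`id -gn`      # placeholder: a group that only this job belongs to

chgrp ${JOBGROUP} /dev/nvidia${GPUNUM}
chmod 0660 /dev/nvidia${GPUNUM}

# a matching epilog would put the device back to root:root 0600
#################################################################
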
2. Ensuring that a job tries to use a free GPU, not just _any_ GPU
Since SGE doesn't explicitly tell the job which GPU to use,
we've found that a lot of software blindly tries to use GPU
#0, apparently assuming that the software is running on a
single-user/single-GPU system (python, I'm looking at you). Our
solution has been to "suggest" that users run a command in their
submit script to report the number (GPU ID) of the next free
GPU. This has eliminated most instances of this issue, but there
are still some race conditions.
#################################################################
#! /bin/bash
# Script to return GPU Id of idle GPUs, if any
#
# Used in a submit script, in the form
#
# CUDA_VISIBLE_DEVICES=`get_CUDA_VISIBLE_DEVICES` || exit
# export CUDA_VISIBLE_DEVICES
# myGPUjob
# Some software takes the specification of the GPU device on the command
# line. In that case, the command line might be something like:
#
# myGPUjob options -dev cuda${CUDA_VISIBLE_DEVICES}
#
# The command:
# nvidia-smi pmon
# returns output in the form:
#################
# # gpu        pid  type    sm   mem   enc   dec   command
# # Idx          #   C/G     %     %     %     %   name
#     0          -     -     -     -     -     -   -
#################
# Note the absence (-) of a PID to indicate an idle GPU
which nvidia-smi 1> /dev/null 2>&1
if [ $? != 0 ] ; then
    # no nvidia-smi found!
    echo "-1"
    echo "No 'nvidia-smi' utility found on node `hostname -s` at `date`." 1>&2
    if [ "X$JOB_ID" != "X" ] ; then
        # running as a batch job, this shouldn't happen
        ( printf "SGE job ${JOB_ID}: No 'nvidia-smi' utility found on node `hostname -s` at `date`.\n" ) | Mail -s "unexpected: no nvidia-smi utility on `hostname -s`" root
    fi
    exit 1
fi

numGPUs=`nvidia-smi pmon -c 1 | wc -l` ; numGPUs=$((numGPUs - 2))   # subtract the headers
free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`
if [[ "X$free" != "X" && $numGPUs -gt 1 ]] ; then
# we may have a race condition, where 2 (or more) GPU jobs are probing
nvidia-smi at once, and each is reporting that there is a free GPU
# are available. Sleep a random amount of time and check again....this
is not guanteed to avoid the conflict, but it
# will help...
sleep $((RANDOM % 11))
free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`
fi
if [ "X$free" = "X" ] ; then
echo "-1"
echo "SGE job ${JOB_ID} (${JOB_NAME}) failed: no free GPU on node
`hostname -s` at `date`." 1>&2
( printf "SGE job ${JOB_ID}, job name ${JOB_NAME} from $USER\nNo free
GPU on node `hostname -s` at `date`.\n\nGPU
status:\n==================================\n" ; nvidia=smi ; printf
"============================\n\nSGE status on this
node:\n=======================================\n" ; qstat -u \* -s rs -l
hostname=`hostname` ) 2>&1 | Mail -s "unexpected: no free GPUs on `hostname
-s`" root
exit 1
fi
echo $free
exit 0
#################################################################
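
For completeness, a submit script using it ends up looking something like
this (the helper's name and "myGPUjob" are taken from the comments above;
the rest is just an example):

#################################################################
#! /bin/bash
#$ -l gpu=1
#$ -cwd

# grab the ID of an idle GPU on this node, or give up if none is free
CUDA_VISIBLE_DEVICES=`get_CUDA_VISIBLE_DEVICES` || exit 1
export CUDA_VISIBLE_DEVICES

myGPUjob
#################################################################
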
Mark
=>
=> Is there something like "cgroups" for gpus?
=>
=> Thanks,
=>
=> -Dj
=>
=>
--
Mark Bergman Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand
http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com
I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users