On 5 December 2015 at 11:00, Feng Zhang <[email protected]> wrote:
> "-l gpu=1" requests 1 GPU for each process/slot for parallel jobs. If
> you request "-pe mpi 8" in your job script at the same time, you are
> actually also requesting 8 GPUs for your job.
>
> Maybe you can try to set "gpu" to be a boolean, not an int, set it to
> true by default, and set it to false in the .sge_request file. For
> GPU jobs, request it to be true.
>

Thanks, I tried this. With this I am able to submit an 8-CPU + 1-GPU
job. The first two jobs go to each node and run fine. But the third
job should go to the waiting state; instead it also starts to run even
though no more GPUs are available. Is there a way to make the third
job wait when no more GPUs are available?

#qstat
job-ID prior   name user  state submit/start at     queue                  slots
419 0.50500 j1 rajil r 12/05/2015 11:55:32 [email protected] 8
420 0.50500 j2 rajil r 12/05/2015 11:55:47 [email protected] 8
421 0.50500 j3 rajil r 12/05/2015 11:56:02 [email protected] 8

I changed the complex config as
#qconf -mc
gpu gpu BOOL == YES NO 0 0
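
For reference, the columns in "qconf -mc" are name, shortcut, type,
relop, requestable, consumable, default and urgency, so the line above
leaves gpu non-consumable. I wonder if the scheduler only decrements
availability for consumables; would defining gpu as a consumable INT
instead make the third job wait? A rough, untested sketch of what I
mean:

```
#name  shortcut  type  relop  requestable  consumable  default  urgency
gpu    gpu       INT   <=     YES          YES         0        0
```

with each GPU host then given "complex_values gpu=1" via "qconf -me".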

and queue as
#qconf -sq gpu.q
qname gpu.q
hostlist @gpuhosts
slots 1,[compute-4-0.local=32],[compute-4-1.local=32]
complex_values gpu=TRUE

and exec host as
#qconf -se compute-4-0
complex_values gpu=TRUE

and set exclusive compute mode for the GPU on compute-4-0 and compute-4-1
#nvidia-smi -c 1

The job submission script is like this

#!/bin/csh
#$ -V
#$ -S /bin/csh
#$ -N j3
#$ -q gpu.q
#$ -l gpu=TRUE
#$ -m beas
#$ -j y -o /home/rajil/tmp/tst/j3.qlog
#$ -pe mpi 8
abaqus python /share/apps/abaqus/6.14-2/../abaJobHandler.py j3
/home/rajil/tmp/tst j3.fs.133367 0 j3.com model.inp
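
One thing I am not sure about: as Feng noted, "-l" requests are
multiplied by the slot count for parallel jobs, so if gpu were defined
as a consumable INT, "-pe mpi 8" together with "-l gpu=1" might
consume 8 GPUs instead of 1. Some Grid Engine versions support
per-job consumables; a guess (untested) at a definition that would
count the request once per job rather than once per slot:

```
#name  shortcut  type  relop  requestable  consumable  default  urgency
gpu    gpu       INT   <=     YES          JOB         0        0
```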

-Rajil
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
