"-l gpu=1" requests 1 gpu for each process/slot for parallel jobs. If you request "-pe mpi 8" in your job script at the same time, you are actually also requesting 8 gpus for your job.

Maybe you can try making "gpu" a boolean instead of an int: set it to true by default, and set it to false in the .sge_request file. For GPU jobs, request it to be true.

On Sat, Dec 5, 2015 at 11:44 AM, Reuti <[email protected]> wrote:
>
> On 05.12.2015 at 17:12, Dj Merrill wrote:
>
>> On 12/5/2015 11:02 AM, Reuti wrote:
>>> Is gpu set to FORCED in your case?
>>
>> No.
>>
>> The one major difference between my setup and Rajil's is that I just have
>> one single all.q queue defined, with only a few of the hosts having GPUs
>> (and the complex variable gpu=1 set for those hosts).
>>
>> One way to do it would be to use a TRUE/FALSE for the gpu value, but doing
>> it that way won't ensure that only one job has access to the gpu at the
>> same time. In other words, if one job only used 8 cpu slots and needed the
>> gpu, nothing would prevent another job from also running on the same host
>> and also trying to use the gpu.
>>
>> However, assigning gpu as a number (i.e., gpu=1 for the one card in the
>> host) also seems to imply that the gpu can only be used by one cpu slot.
>> If you request multiple cpu slots, it seems to only allow the gpu to be
>> assigned to one of those slots, and can't tell that the other 7 cpu slots
>> may also belong to the same job, if requesting pe_lots=8 and gpu=1 for
>> example. This makes sense in a way, but ultimately what I am hoping to
>> figure out is how to assign a single gpu to an entire multicpu job per
>> machine, not just to a single cpu slot per machine.
>
> Yes, it will be multiplied too. But how does the job behave in case you
> want to run 8 processes with an MPI job, but only one GPU is installed?
> Will only one process/thread access it in a serial step? Doesn't that imply
> having as many GPUs installed as the number of slots you request?
>
> Define the gpu complex as consumable JOB, then it won't be multiplied.
>
> -- Reuti

--
Best,
Feng
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
