> On 27.10.2017 at 06:18, ANS <[email protected]> wrote:
>
> Hi,
>
> I am requesting the PE "mpi" as
> #$ -pe mpi 8
> in the job submission script
Aha, this will multiply the GPU request too. Do you want 8 GPUs?

-- Reuti

> Yes, I have verified it; no other job is running on that node.
>
> Queue Configuration
>
> qname                 gpu.q
> hostlist              @gpuhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make orte mpi gpup
> rerun                 FALSE
> slots                 1,[gpunode1.local=12],[gpunode2.local=12]
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      unix_behavior
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> Exec host configuration
>
> hostname              gpunode1
> load_scaling          NONE
> complex_values        gpu=2
> user_lists            NONE
> xuser_lists           NONE
> projects              NONE
> xprojects             NONE
> usage_scaling         NONE
> report_variables      NONE
>
> PE Configuration
>
> pe_name               mpi
> slots                 9999
> user_lists            NONE
> xuser_lists           NONE
> start_proc_args       /bin/true
> stop_proc_args        /bin/true
> allocation_rule       $fill_up
> control_slaves        TRUE
> job_is_first_task     FALSE
> urgency_slots         min
> accounting_summary    TRUE
>
> Each node has 16 cores and 2 GPUs, so I created two queues: gpu.q with 12 cores and serial.q with 4 cores on each server.
>
> Kindly let me know if any further info is required.
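The slot multiplication Reuti points out can be sketched with the numbers from this thread. This is a minimal illustration, assuming SGE's default per-slot accounting for consumable complexes such as the `gpu` complex defined here:

```shell
# With a consumable complex, `-l gpu=2` is charged per granted slot,
# so the total per-job demand scales with the PE slot count.
slots=8          # from: #$ -pe mpi 8
gpu_per_slot=2   # from: -l gpu=2
total_gpu=$((slots * gpu_per_slot))
echo "total GPUs demanded: $total_gpu"   # 16, but each exec host offers gpu=2
```

Since each exec host advertises only `gpu=2`, no host can satisfy a demand of 16, which matches the "only offers 0 slots" message. Requesting e.g. `-l gpu=1` with a slot count of 2 would fit on a node; some Grid Engine variants also offer per-job consumables that avoid the multiplication entirely.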
> Thanks,
> ANS
>
> On Fri, Oct 27, 2017 at 3:21 AM, Reuti <[email protected]> wrote:
>
>> On 26.10.2017 at 19:05, ANS wrote:
>>
>>> Hi,
>>>
>>> While creating the gpu.q I added the hosts with GPUs to the hostlist.
>>>
>>> After applying the scheduling I am getting the following error:
>>> cannot run in PE "mpi" because it only offers 0 slots
>>
>> Are you requesting the PE "mpi"?
>>
>> Did you check with `qstat -u "*"` whether anything else is running on the nodes? If that's not the case, please post the queue, exechost, and PE configuration.
>>
>> -- Reuti
>>
>>> But in the "mpi" PE I have set the slots to 999. Is there any parameter in the PE to indicate the number of GPUs?
>>>
>>> Also, is there any attribute, like a complex, in the queue configuration to enable the GPU?
>>>
>>> Thanks,
>>> ANS
>>>
>>> On Thu, Oct 26, 2017 at 2:51 PM, Reuti <[email protected]> wrote:
>>>
>>>> On 26.10.2017 at 07:00, ANS <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> I am submitting the job using the job submission script, which works fine on the CPU side, after adding -l gpu=2 and changing the queue to gpu.q.
>>>
>>> I don't know your script and what additional resource requests it has. Maybe there are contradictory requests which can't be fulfilled.
>>>
>>>> So after launching, the jobs stay in the qw state only.
>>>
>>> Are the machines with the GPUs attached to the "hostlist" of "gpu.q"?
>>>
>>> You can get some info after setting:
>>>
>>> $ qconf -msconf
>>> …
>>> schedd_job_info true
>>>
>>> and issuing:
>>>
>>> $ qstat -j <job_id>
>>> …
>>> scheduling info:
>>>
>>>> I am not restricting the jobs to run on a particular GPU; they can run on any GPU.
>>>
>>> Yes, but which job is using which GPU? It might be necessary to address a certain one in your job script, but this depends on your application.
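Reuti's two diagnostic steps can be sketched end to end. The `qconf -msconf` and `qstat -j` commands are real SGE tools; the sample output below is a fabricated assumption for illustration, since the real text comes from the scheduler:

```shell
# Diagnostic loop:
#   qconf -msconf        # set: schedd_job_info  true
#   qstat -j <job_id>    # then read the "scheduling info:" section
#
# Simulated `qstat -j` output for a job stuck in qw (assumed, not real):
sample='job_number:            123
hard resource_list:    gpu=2
scheduling info:       cannot run in PE "mpi" because it only offers 0 slots'

# Extract just the scheduler's verdict:
reason=$(printf '%s\n' "$sample" | sed -n 's/^scheduling info:[[:space:]]*//p')
echo "$reason"
```

Without `schedd_job_info true`, the `scheduling info:` section stays empty and a pending job gives no hint about why it cannot be dispatched.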
>>> -- Reuti
>>>
>>>> Thanks,
>>>> ANS
>>>>
>>>> On Wed, Oct 25, 2017 at 8:29 PM, Reuti <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 25.10.2017 at 16:06, ANS <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to integrate GPUs into my existing cluster, with 2 GPUs per node. I have gone through a few sites and done the following:
>>>>>
>>>>> qconf -mc
>>>>> gpu    gpu    INT    <=    YES    YES    0    0
>>>>>
>>>>> qconf -me gpunode1
>>>>> complex_values    gpu=2
>>>>>
>>>>> But I am still unable to launch jobs using the GPUs. Can anyone help me?
>>>>
>>>> What do you mean by "unable to launch the jobs using GPUs"? How do you submit the jobs? Are the jobs stuck, or never accepted by SGE?
>>>>
>>>> There is no way to determine which GPU was assigned to which job. Univa GE has an extension for this called "named resources" or so. You could define two queues, each having one slot, and the name of the chosen queue determines the GPU to be used, after some mangling.
>>>>
>>>> ===
>>>>
>>>> Note that GE2011.11p1 was never updated and https://arc.liv.ac.uk/trac/SGE might be more recent.
>>>>
>>>> -- Reuti
>
> --
> M: +91 9676067674

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
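Reuti's two-queue trick can be sketched as a fragment of a job script. The queue names `gpu0.q`/`gpu1.q` and the mapping to `CUDA_VISIBLE_DEVICES` are assumptions, not from the thread; under SGE the chosen queue name is exported to the job as `$QUEUE`, which is set by hand here for illustration:

```shell
# Each of the two queues has one slot, so a queue name uniquely
# identifies a GPU. In a real job SGE sets $QUEUE itself; here we
# simulate landing in the second queue.
QUEUE=gpu1.q

# Derive the GPU index from the queue the job landed in:
case "$QUEUE" in
  gpu0.q) export CUDA_VISIBLE_DEVICES=0 ;;
  gpu1.q) export CUDA_VISIBLE_DEVICES=1 ;;
  *)      echo "unexpected queue: $QUEUE" >&2; exit 1 ;;
esac

echo "using GPU $CUDA_VISIBLE_DEVICES"
```

Because each queue offers exactly one slot, two such jobs on a node can never pick the same GPU; whether the application honors `CUDA_VISIBLE_DEVICES` depends on the toolkit in use.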
