> On 27.10.2017 at 06:18, ANS <[email protected]> wrote:
>
> Hi,
>
> I am requesting the PE "mpi" as
> #$ -pe mpi 8
> in the job submission script
Aha, this will multiply the GPU request too. Do you want 8 GPUs?

-- Reuti

> Yes, I have verified it; no other job is running on that node.
>
> Queue Configuration
>
> qname                 gpu.q
> hostlist              @gpuhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make orte mpi gpup
> rerun                 FALSE
> slots                 1,[gpunode1.local=12],[gpunode2.local=12]
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      unix_behavior
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> Exec host configuration
>
> hostname              gpunode1
> load_scaling          NONE
> complex_values        gpu=2
> user_lists            NONE
> xuser_lists           NONE
> projects              NONE
> xprojects             NONE
> usage_scaling         NONE
> report_variables      NONE
>
> PE Configuration
>
> pe_name               mpi
> slots                 9999
> user_lists            NONE
> xuser_lists           NONE
> start_proc_args       /bin/true
> stop_proc_args        /bin/true
> allocation_rule       $fill_up
> control_slaves        TRUE
> job_is_first_task     FALSE
> urgency_slots         min
> accounting_summary    TRUE
>
> Each node has 16 cores and 2 GPUs, so I created two queues: gpu.q with 12 cores and serial.q with 4 cores on each server.
>
> Kindly let me know if any further info is required.
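The slot multiplication Reuti points out can be sketched with the numbers from this thread. This is a minimal illustration, assuming SGE's default per-slot accounting for consumable complexes such as the `gpu` complex defined here:

```shell
# With a consumable complex, `-l gpu=2` is charged per granted slot,
# so the total per-job demand scales with the PE slot count.
slots=8          # from: #$ -pe mpi 8
gpu_per_slot=2   # from: -l gpu=2
total_gpu=$((slots * gpu_per_slot))
echo "total GPUs demanded: $total_gpu"   # 16, but each exec host offers gpu=2
```

Since each exec host advertises only `gpu=2`, no host can satisfy a demand of 16, which matches the "only offers 0 slots" message. Requesting e.g. `-l gpu=1` with a slot count of 2 would fit on a node; some Grid Engine variants also offer per-job consumables that avoid the multiplication entirely.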
> Thanks,
> ANS
>
> On Fri, Oct 27, 2017 at 3:21 AM, Reuti <[email protected]> wrote:
>
>> On 26.10.2017 at 19:05, ANS wrote:
>>
>>> Hi,
>>>
>>> While creating the gpu.q I added the hosts with GPUs to the hostlist.
>>>
>>> After applying the scheduling I am getting the following error:
>>> cannot run in PE "mpi" because it only offers 0 slots
>>
>> Are you requesting the PE "mpi"?
>>
>> Did you check with `qstat -u "*"` whether anything else is running on the nodes? If that's not the case, please post the queue, exechost, and PE configuration.
>>
>> -- Reuti
>>
>>> But in the "mpi" PE I have set the slots to 999. Is there any parameter in the PE to indicate the number of GPUs?
>>>
>>> Also, is there any attribute, like a complex, in the queue configuration to enable the GPU?
>>>
>>> Thanks,
>>> ANS
>>>
>>> On Thu, Oct 26, 2017 at 2:51 PM, Reuti <[email protected]> wrote:
>>>
>>>> On 26.10.2017 at 07:00, ANS <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> I am submitting the job using the job submission script, which works fine on the CPU side, after adding -l gpu=2 and changing the queue to gpu.q.
>>>
>>> I don't know your script and what additional resource requests it has. Maybe there are contradictory requests which can't be fulfilled.
>>>
>>>> So after launching, the jobs stay in the qw state only.
>>>
>>> Are the machines with the GPUs attached to the "hostlist" of "gpu.q"?
>>>
>>> You can get some info after setting:
>>>
>>> $ qconf -msconf
>>> …
>>> schedd_job_info true
>>>
>>> and issuing:
>>>
>>> $ qstat -j <job_id>
>>> …
>>> scheduling info:
>>>
>>>> I am not restricting the jobs to run on a particular GPU; they can run on any GPU.
>>>
>>> Yes, but which job is using which GPU? It might be necessary to address a certain one in your job script, but this depends on your application.
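Reuti's two diagnostic steps can be sketched end to end. The `qconf -msconf` and `qstat -j` commands are real SGE tools; the sample output below is a fabricated assumption for illustration, since the real text comes from the scheduler:

```shell
# Diagnostic loop:
#   qconf -msconf        # set: schedd_job_info  true
#   qstat -j <job_id>    # then read the "scheduling info:" section
#
# Simulated `qstat -j` output for a job stuck in qw (assumed, not real):
sample='job_number:            123
hard resource_list:    gpu=2
scheduling info:       cannot run in PE "mpi" because it only offers 0 slots'

# Extract just the scheduler's verdict:
reason=$(printf '%s\n' "$sample" | sed -n 's/^scheduling info:[[:space:]]*//p')
echo "$reason"
```

Without `schedd_job_info true`, the `scheduling info:` section stays empty and a pending job gives no hint about why it cannot be dispatched.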
>>> -- Reuti
>>>
>>>> Thanks,
>>>> ANS
>>>>
>>>> On Wed, Oct 25, 2017 at 8:29 PM, Reuti <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 25.10.2017 at 16:06, ANS <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to integrate GPUs into my existing cluster, with 2 GPUs per node. I have gone through a few sites and done the following:
>>>>>
>>>>> qconf -mc
>>>>> gpu    gpu    INT    <=    YES    YES    0    0
>>>>>
>>>>> qconf -me gpunode1
>>>>> complex_values    gpu=2
>>>>>
>>>>> But I am still unable to launch jobs using the GPUs. Can anyone help me?
>>>>
>>>> What do you mean by "unable to launch the jobs using GPUs"? How do you submit the jobs? Are the jobs stuck, or never accepted by SGE?
>>>>
>>>> There is no way to determine which GPU was assigned to which job. Univa GE has an extension for this called "named resources" or so. You could define two queues, each having one slot, and the name of the chosen queue determines the GPU to be used, after some mangling.
>>>>
>>>> ===
>>>>
>>>> Note that GE2011.11p1 was never updated and https://arc.liv.ac.uk/trac/SGE might be more recent.
>>>>
>>>> -- Reuti
>
> --
> M: +91 9676067674

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
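Reuti's two-queue trick can be sketched as a fragment of a job script. The queue names `gpu0.q`/`gpu1.q` and the mapping to `CUDA_VISIBLE_DEVICES` are assumptions, not from the thread; under SGE the chosen queue name is exported to the job as `$QUEUE`, which is set by hand here for illustration:

```shell
# Each of the two queues has one slot, so a queue name uniquely
# identifies a GPU. In a real job SGE sets $QUEUE itself; here we
# simulate landing in the second queue.
QUEUE=gpu1.q

# Derive the GPU index from the queue the job landed in:
case "$QUEUE" in
  gpu0.q) export CUDA_VISIBLE_DEVICES=0 ;;
  gpu1.q) export CUDA_VISIBLE_DEVICES=1 ;;
  *)      echo "unexpected queue: $QUEUE" >&2; exit 1 ;;
esac

echo "using GPU $CUDA_VISIBLE_DEVICES"
```

Because each queue offers exactly one slot, two such jobs on a node can never pick the same GPU; whether the application honors `CUDA_VISIBLE_DEVICES` depends on the toolkit in use.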
