Hi,

> Am 26.01.2016 um 19:30 schrieb Dan Hyatt <dhy...@dsgmail.wustl.edu>:
> 
>  I am looking to use this differently.
> The problem I am having is that I have users with 200-1000 jobs. I have 80 
> servers with almost 1000 cores.
> For my normal queue,  I want SGE PE to create up to 4 jobs per server until 
> it runs out of servers, then add up to 4 more until all the jobs are 
> allocated.  (1 per is fine as long as it will round robin and start adding a 
> second job per server, then a third until it runs out of jobs)
> 
> Does the allocation rule limit the number of jobs per server PER qsub, or 
> total jobs allowed per server?

Not per se. But a fixed allocation rule of 16 on a machine with 16 cores has, of 
course, the effect that only one job can run there. Or two jobs with a fixed 
allocation rule of 8 on such a machine.
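
For illustration (the PE name "per_host_16" is just made up here), such a fixed 
rule would look like:

   $ qconf -sp per_host_16
   pe_name            per_host_16
   slots              999
   allocation_rule    16
   control_slaves     TRUE
   job_is_first_task  TRUE

A job submitted with "-pe per_host_16 16" then fills a 16-core machine 
completely, hence no second job can start there.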


> The problem I am having is that I get 20 jobs per server and overload a 
> couple of servers

Why? Do the jobs request a PE with the proper number of cores? Is the job (i.e. 
the final application) able to honor the granted list of machines on which it 
should start its slaves?
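
(Such a request would look roughly like this -- the PE name and script are only 
placeholders:

   $ qsub -pe smp 4 myjob.sh

so that each job has its slot count granted by the scheduler instead of silently 
starting many threads on one node.)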


> while 80 servers are running idle. Each has 10 cores and 128 GB of RAM, so they 
> can handle up to 20 light jobs each.

What do you mean by "light" jobs? If you overload a machine, it might double the 
execution time per (serial) job. In my opinion, overloading a machine (and 
having an alarm threshold > 1) was/is meant for the case where a parallel job is 
badly parallelized and would often leave cores idle. It can even be arranged so 
that the parallel job gets a priority (i.e. nice value) of 0 in its queue 
definition, while the serial jobs which should use the otherwise idling cores 
only get a priority of 19.
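
The nice value is the "priority" entry in the queue configuration, e.g. (the 
queue names here are only placeholders):

   $ qconf -sq parallel.q | grep priority
   priority              0
   $ qconf -sq serial.q | grep priority
   priority              19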


> Also, for the heavy CPU jobs, I want a max of 4 jobs per server, so for 
> pe_slots would I just put the integer 4 in there?

No, an "allocation_rule 4" would mean that each job may get 4 cores on this 
machine. Note that it will only start jobs in case they are dividable by 4, 
i.e. a job requesting 13 would never run if it requests this particular PE.
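
To make that concrete (assuming a PE with "allocation_rule 4" attached to a 
queue with enough slots):

   $ qsub -pe smp 8 job.sh     # gets 4 slots on each of two machines
   $ qsub -pe smp 13 job.sh    # stays pending forever, 13 is not a multiple of 4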

Unfortunately there is no built-in complex which could be limited to 4 per 
machine by an RQS, but you can set up a consumable complex with a default value 
of 1 and the consumable attribute set to "JOB". This can then be assigned and 
limited on the exechost level to 4 (this works unless a user fouls the system 
and requests 0 for this complex, but a JSV could handle that). Alternatively, 
the complex could be assigned an arbitrarily high value on the cluster level and 
an RQS could limit it on certain machines.
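
A rough sketch of that setup (the complex name "jobs_per_host" and the host name 
are only examples, untested as written):

   # qconf -mc  -- add one line to the complex list:
   #name           shortcut  type  relop  requestable  consumable  default  urgency
   jobs_per_host   jph       INT   <=     YES          JOB         1        0

   # qconf -me node01  -- limit it on the exec host:
   complex_values  jobs_per_host=4

   # or assign it high on the cluster level and limit it by an RQS (qconf -arqs):
   {
      name     max_jobs_per_host
      enabled  TRUE
      limit    hosts {*} to jobs_per_host=4
   }

Each job then consumes 1 of it by default, so at most 4 jobs land on such a host.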

-- Reuti


> Should I create a third PE, let's say "dan", with the desired settings?  When I 
> tried this before it would throw errors.
> 
> 
> Am I correct that I want to change these settings? I suspect I really want to 
> make a custom PE; the ones below are the defaults.
> 
> I was looking at http://linux.die.net/man/5/sge_pe and 
> http://www.softpanorama.org/HPC/Grid_engine/parallel_environment.shtml, but 
> both seem to assume I comprehend the details of each, such as: can I only put 
> one allocation rule per PE and one PE per queue?
> 
> 
> [root@blade5-1-1 ~]# qconf -sp make
> pe_name            make
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> 
> [root@blade5-1-1 ~]# qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $pe_slots
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> [root@blade5-1-1 ~]# echo $pe_slots
> 
> 
> 
> [root@blade5-1-1 ~]# qconf -sp DAN
> pe_name            DAN
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> 
> [root@blade5-1-1 ~]# qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    4
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> [root@blade5-1-1 ~]# echo $pe_slots
> 
>>>> Yep, we use functional tickets to accomplish this exact goal. Every user
>>>> gets 1000 functional tickets via auto_user_fshare in sge_conf(5), though
>>>> your exact number will depend on the number of tickets and weights you have
>>>> elsewhere in your policy configuration.
>>> Also the weight of the waiting time should be set to 0, and the urgency given 
>>> less importance (the default is to grant an urgency of 1000 per slot in the 
>>> complex configuration, which means more slots make a job more important):
>>> 
>>> weight_user                       0.900000
>>> weight_project                    0.000000
>>> weight_department                 0.000000
>>> weight_job                        0.100000
>>> weight_tickets_functional         1000000
>>> weight_tickets_share              0
>>> share_override_tickets            TRUE
>>> share_functional_shares           TRUE
>>> max_functional_jobs_to_schedule   200
>>> report_pjob_tickets               TRUE
>>> max_pending_tasks_per_job         50
>>> halflife_decay_list               none
>>> policy_hierarchy                  F
>>> weight_ticket                     1.000000
>>> weight_waiting_time               0.000000
>>> weight_deadline                   3600000.000000
>>> weight_urgency                    0.100000
>>> weight_priority                   1.000000
>>> max_reservation                   32
>>> default_duration                  8760:00:00
>> We actually do weight waiting time, but at half the value of both
>> functional and urgency tickets. We then give big urgency boosts to
>> difficult-to-schedule jobs (i.e. lots of memory or CPUs in one spot). It
>> took us a while to arrive at a decent mix of short-run / small jobs vs
>> long-run / big jobs, and it definitely will be a site-dependent decision.
>> 

