Hi Iyad,

Reuti is correct: "man sge_priority" explains how SGE calculates job
priority, and it includes the formula.  I will say that if you intend
to use the share-tree policy with array jobs (i.e. qsub -t), you will
find that the priority calculation is 'wrong' because it does not
properly account for array jobs.  The functional policy does not have
this issue; only the share-tree policy does.
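For reference, the formula in that man page combines three contributions,
each normalized to [0, 1] across the pending jobs, roughly:

  prio = weight_priority * npprior + weight_urgency * nurg + weight_ticket * ntckts

A rough Python sketch of that calculation is below.  The job numbers, the
min-max normalization, and the all-equal fallback of 0.5 are my own
illustration, not taken from the scheduler source:

```python
# Rough sketch of the GE priority calculation described in sge_priority(5):
#   prio = weight_priority * npprior + weight_urgency * nurg + weight_ticket * ntckts
# Each normalized value is the raw value scaled to [0, 1] across pending jobs.

def normalize(values):
    """Min-max normalize raw values to [0, 1] across the pending job list."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All jobs equal; 0.5 is a placeholder -- I have not checked how the
        # real scheduler handles this degenerate case.
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def priorities(pprio, urg, tickets,
               w_priority=0.01, w_urgency=0.01, w_ticket=0.5):
    """Combine the three normalized inputs using the scheduler weights."""
    nprio, nurg, ntckts = normalize(pprio), normalize(urg), normalize(tickets)
    return [w_priority * p + w_urgency * u + w_ticket * t
            for p, u, t in zip(nprio, nurg, ntckts)]

# Three hypothetical pending jobs with equal POSIX priority and urgency but
# different ticket counts: with these weights the ticket term dominates.
print(priorities(pprio=[0, 0, 0],
                 urg=[1000, 1000, 1000],
                 tickets=[20000, 10000, 0]))
```

With weight_ticket well above weight_priority and weight_urgency (as in
Iyad's configuration below), the ticket term dominates the ordering, which
is the intended effect of a fair-share setup.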

Regards,

Bill.


On Wed, Feb 27, 2019 at 4:10 PM Kandalaft, Iyad (AAFC/AAC) <
iyad.kandal...@canada.ca> wrote:

> Hi Reuti,
>
> I'm implementing only a share-tree.  The docs somewhere state something
> along the lines of "use one or the other."
> I've seen the man page.  It explains most of the math but leaves out
> some key elements.  For example, how are "tickets" handed out, and in
> what quantity (i.e., why do some jobs get 20000 tickets based on my
> configuration below)?  Also, the normalization function puts the values
> between 0 and 1, but based on what?  The number of tickets issued to the
> job divided by the total?
>
> Thanks for your help.
>
> Iyad Kandalaft
>
> -----Original Message-----
> From: Reuti <re...@staff.uni-marburg.de>
> Sent: Wednesday, February 27, 2019 4:00 PM
> To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandal...@canada.ca>
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Fair share policy
>
> Hi,
>
> there is a man page "man sge_priority". Which policy do you intend to use:
> share-tree (honors past usage) or functional (current use), or both?
>
> -- Reuti
>
>
> > On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC)
> > <iyad.kandal...@canada.ca> wrote:
> >
> > Hi all,
> >
> > I recently implemented a fair share policy using share tickets.  I’ve
> > been monitoring the cluster for a couple of days using qstat -pri -ext
> > -u “*” to see how the functional tickets are working, and it seems to
> > have the intended effect.  There are some anomalies where some running
> > jobs have 0 tickets but still get scheduled since there are free
> > resources; I assume this is normal.
> >
> > I’ll admit that I don’t fully understand the scheduling as it’s
> > somewhat complex.  So, I’m hoping someone can review the configuration
> > to see if they can find any glaring issues, such as conflicting options.
> >
> > I created a share-tree and gave all users an equal value of 10:
> > $ qconf -sstree
> > id=0
> > name=Root
> > type=0
> > shares=1
> > childnodes=1
> > id=1
> > name=default
> > type=0
> > shares=10
> > childnodes=NONE
> >
> > I modified the scheduler configuration by setting weight_tickets_share
> > to 1000000, and I reduced weight_waiting_time, weight_priority, and
> > weight_urgency to well below weight_ticket (what are good values?).
> > $ qconf -ssconf
> > algorithm                         default
> > schedule_interval                 0:0:15
> > maxujobs                          0
> > queue_sort_method                 seqno
> > job_load_adjustments              np_load_avg=0.50
> > load_adjustment_decay_time        0:7:30
> > load_formula                      np_load_avg
> > schedd_job_info                   false
> > flush_submit_sec                  0
> > flush_finish_sec                  0
> > params                            none
> > reprioritize_interval             0:0:0
> > halftime                          168
> > usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
> > compensation_factor               5.000000
> > weight_user                       0.250000
> > weight_project                    0.250000
> > weight_department                 0.250000
> > weight_job                        0.250000
> > weight_tickets_functional         0
> > weight_tickets_share              1000000
> > share_override_tickets            TRUE
> > share_functional_shares           TRUE
> > max_functional_jobs_to_schedule   200
> > report_pjob_tickets               TRUE
> > max_pending_tasks_per_job         50
> > halflife_decay_list               none
> > policy_hierarchy                  OFS
> > weight_ticket                     0.500000
> > weight_waiting_time               0.000010
> > weight_deadline                   3600000.000000
> > weight_urgency                    0.010000
> > weight_priority                   0.010000
> > max_reservation                   0
> > default_duration                  INFINITY
> >
> > I modified all the users to set their fshare to 1000:
> > $ qconf -muser XXX
> >
> > I modified the general conf to set auto_user_fshare to 1000 and
> > auto_user_delete_time to 7776000 (90 days).  Halftime is set to the
> > default of 7 days (I assume I should increase this).  I don’t know
> > whether auto_user_delete_time even matters.
> > $ qconf -sconf
> > #global:
> > execd_spool_dir              /opt/gridengine/default/spool
> > mailer                       /opt/gridengine/default/commond/mail_wrapper.py
> > xterm                        /usr/bin/xterm
> > load_sensor                  none
> > prolog                       none
> > epilog                       none
> > shell_start_mode             posix_compliant
> > login_shells                 sh,bash
> > min_uid                      100
> > min_gid                      100
> > user_lists                   none
> > xuser_lists                  none
> > projects                     none
> > xprojects                    none
> > enforce_project              false
> > enforce_user                 auto
> > load_report_time             00:00:40
> > max_unheard                  00:05:00
> > reschedule_unknown           00:00:00
> > loglevel                     log_info
> > administrator_mail           none
> > set_token_cmd                none
> > pag_cmd                      none
> > token_extend_time            none
> > shepherd_cmd                 none
> > qmaster_params               none
> > execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
> >                              H_DESCRIPTORS=16K
> > reporting_params             accounting=true reporting=true \
> >                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> > finished_jobs                100
> > gid_range                    20000-20100
> > qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
> > qlogin_daemon                /usr/sbin/sshd -i
> > rlogin_command               builtin
> > rlogin_daemon                builtin
> > rsh_command                  builtin
> > rsh_daemon                   builtin
> > max_aj_instances             2000
> > max_aj_tasks                 75000
> > max_u_jobs                   0
> > max_jobs                     0
> > max_advance_reservations     0
> > auto_user_oticket            0
> > auto_user_fshare             1000
> > auto_user_default_project    none
> > auto_user_delete_time        7776000
> > delegated_file_staging       false
> > reprioritize                 0
> > jsv_url                      none
> > jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> >
> > Thanks for your assistance.
> >
> > Cheers
> >
> > Iyad Kandalaft
> >
> >
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
>
>
>


-- 
*William Bryce* | VP of Products
Univa Corporation <http://www.univa.com/> - 130 Esna Park Drive, Second
Floor, Markham, Ontario, Canada
*Email* bbr...@univa.com | *Mobile: 647.974.2841* | *Office: 647.478.5974*
