Just what I pulled from the archives when searching for someone with a
similar issue.

Ian

On Mon, Oct 9, 2017 at 3:06 PM, David Rosenstrauch <[email protected]>
wrote:

> Hmmm ... just wondering: why the need for setting weight_tickets_share to
> 10000000 like you did, if we're not using share tree scheduling?
> (Actually, looking at it closer, looks like you're setting that twice -
> once to 10000000 and then later to 0.  I'm guessing the 2nd value
> supersedes the first, so you're effectively setting it to 0.)
>
> In any case, we have several of those other settings in our config, but
> with different values:
>
> weight_tickets_share              0
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         10000
> weight_tickets_share              0
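>
> (Those are all scheduler configuration parameters, for what it's worth;
> the rest of the configuration can be dumped with something like:
>
> qconf -ssconf         # current scheduler configuration (weight_* etc.)
> qconf -sconf global   # global cluster config (enforce_user and friends)
>
> if seeing the full output would help.)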
>
> Perhaps these settings might be causing our issue?  Seems unlikely though,
> as we're not taking project or department into account in our scheduling.
>
> Thanks,
>
> DR
>
>
> On 2017-10-09 5:40 pm, Ian Kaufman wrote:
>
>> I am pretty sure you need something like the following (courtesy of
>> Reuti):
>>
>> weight_tickets_share              10000000
>>
>> weight_user                       0.900000
>> weight_project                    0.000000
>> weight_department                 0.000000
>> weight_job                        0.100000
>> weight_tickets_functional         100000
>> weight_tickets_share              0
>>
>> policy_hierarchy                  F
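>>
>> Those all live in the scheduler configuration, so applying them is
>> roughly:
>>
>> qconf -msconf   # edit the scheduler configuration in $EDITOR
>> qconf -ssconf   # dump it afterwards to confirm the weights took
>>
>> The exact numbers are just what worked for Reuti's setup - adjust to
>> taste.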
>>
>> On Mon, Oct 9, 2017 at 2:01 PM, David Rosenstrauch <[email protected]>
>> wrote:
>>
>>> I'm a bit of an SGE noob, so please bear with me.  We're in the
>>> process of a first-time SGE deploy for the users in our department.
>>> Although we've been able to use SGE, submit jobs to the queues
>>> successfully, etc., we're running into issues trying to get the
>>> fair-share scheduling - specifically the functional scheduling - to
>>> work correctly.
>>>
>>> We have very simple functional scheduling enabled, via the following
>>> configuration settings:
>>>
>>> enforce_user                 auto
>>> auto_user_fshare             100
>>> weight_tickets_functional         10000
>>> schedd_job_info                   true
>>>
>>> (In addition, the "weight_tickets_share" setting is set to 0,
>>> thereby disabling share tree scheduling.)
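>>>
>>> (Also, since enforce_user is "auto", the user objects get created
>>> automatically on first submit; as far as I can tell, the way to confirm
>>> they really picked up the fshare of 100 is something like:
>>>
>>> qconf -suserl            # list the users SGE knows about
>>> qconf -suser some_user   # show that user's fshare / oticket values
>>>
>>> where "some_user" is whatever login you want to check.)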
>>>
>>> A colleague and I are testing this setup by both of us submitting
>>> multiple jobs to one of our queues simultaneously, with me first
>>> submitting a large number of jobs (100) and him submitting a smaller
>>> number (25) shortly afterwards.  Our understanding is that the
>>> functional scheduling policy should prevent one user from having
>>> their jobs completely dominate a queue.  And so our expectation is
>>> that even though my jobs were submitted first, and there are more of
>>> them, the scheduler should wind up giving his jobs a higher priority
>>> so that he is not forced to wait until all of my jobs complete
>>> before his run.  (If he did have to wait, that would effectively be
>>> FIFO scheduling, not fair share.)
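>>>
>>> (If it helps to reproduce this, the test doesn't need to be anything
>>> fancier than a couple of submission loops - something along the lines
>>> of:
>>>
>>> # user A: 100 throwaway jobs
>>> for i in $(seq 1 100); do qsub -b y sleep 300; done
>>> # user B, a little later: 25 of the same
>>> for i in $(seq 1 25); do qsub -b y sleep 300; done
>>>
>>> with the sleep length being arbitrary - just long enough to keep the
>>> queue busy.)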
>>>
>>> Although we aren't seeing strict FIFO scheduling, we're seeing something close to it.
>>> One of his jobs (eventually) gets assigned a high number of
>>> tickets, and a higher priority, and gets scheduled and run.  But the
>>> remaining several dozen sit in the queue and don't get run until all
>>> of mine complete - which is not really fair share.
>>>
>>> Although it does look like functional scheduling is happening to
>>> some extent (at least one of his jobs is getting prioritized ahead
>>> of mine), this scheduling behavior is not what we were expecting to
>>> see.  Our expectation was that one of his jobs would run for every 4
>>> of mine (more or less), and that his jobs would not wind up queued
>>> up to run after mine complete.
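>>>
>>> (For what it's worth, the ticket and priority numbers are easy to
>>> watch while this is happening with something like:
>>>
>>> qstat -ext   # per-job ticket counts, including functional tickets (ftckt)
>>> qstat -pri   # per-job priority contributions (urgency, tickets, etc.)
>>>
>>> in case someone spots something odd in how the functional tickets are
>>> being split between the two of us.)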
>>>
>>> Any idea what might be going on here?  Do I have my system
>>> misconfigured for functional scheduling?  Or am I just
>>> misunderstanding how this is supposed to work?  I've already done
>>> quite a bit of googling and man page reading on the relevant topics
>>> and settings, but wasn't able to find a good explanation for the
>>> behavior we're seeing.  Any help greatly appreciated!
>>>
>>> Thanks,
>>>
>>> DR
>>
>> --
>> Ian Kaufman
>> Research Systems Administrator
>> UC San Diego, Jacobs School of Engineering
>> ikaufman AT ucsd DOT edu
>>



-- 
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
