On 2017-10-09 6:23 pm, Reuti wrote:
On 10.10.2017 at 00:00, David Rosenstrauch wrote:
On 2017-10-09 5:45 pm, Reuti wrote:
On 09.10.2017 at 23:01, David Rosenstrauch wrote:
I'm a bit of an SGE noob, so please bear with me. We're in the
process of a first-time SGE deploy for the users in our department.
Although we've been able to use SGE, submit jobs to the queues
successfully, etc., we're running into issues trying to get the
fair-share scheduling - specifically the functional scheduling - to
work correctly.
We have very simple functional scheduling enabled, via the following
configuration settings:
enforce_user auto
auto_user_fshare 100
weight_tickets_functional 10000
schedd_job_info true
(In addition, the "weight_tickets_share" setting is set to 0,
thereby disabling share tree scheduling.)
A colleague and I are testing this setup by both of us submitting
multiple jobs to one of our queues simultaneously, with me first
submitting a large number of jobs (100) and him submitting a smaller
number (25) shortly afterwards. Our understanding is that the
functional scheduling policy should prevent one user from having
their jobs completely dominate a queue. And so our expectation is
that even though my jobs were submitted first, and there are more of
them, the scheduler should wind up giving his jobs a higher priority
so that he is not forced to wait until all of my jobs complete
before his run. (If he did have to wait, that would effectively be
FIFO scheduling, not fair share.)
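
To make the expectation above concrete, here is a toy model of how functional tickets might be shared out. It is only a sketch, not the scheduler's actual algorithm: it assumes each user's tickets are proportional to their fshare and that a user's tickets are split evenly across that user's pending jobs. The user labels "me" and "colleague" are placeholders.

```python
# Toy model of functional ticket sharing; NOT the actual SGE scheduler code.
# Assumption: a user's functional tickets are proportional to their fshare,
# and those tickets are split evenly across that user's pending jobs.

def per_job_tickets(total_tickets, fshares, pending_jobs):
    """Return {user: tickets per pending job} under the toy model."""
    total_shares = sum(fshares.values())
    result = {}
    for user, shares in fshares.items():
        user_tickets = total_tickets * shares / total_shares
        result[user] = user_tickets / pending_jobs[user]
    return result

tickets = per_job_tickets(
    total_tickets=10000,                    # weight_tickets_functional
    fshares={"me": 100, "colleague": 100},  # auto_user_fshare 100 for both
    pending_jobs={"me": 100, "colleague": 25},
)
print(tickets)
```

Under these assumptions each of the 25 jobs ends up with more tickets than each of the 100 jobs, which is why we expected the smaller batch to be interleaved rather than queued behind the larger one.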
The display of the pending tickets has to be enabled too to see the
effect (you should see them as being 0 right now in the pending list):
report_pjob_tickets TRUE
In addition, you can set:
policy_hierarchy F
-- Reuti
Thanks for the feedback.
We do have report_pjob_tickets set to TRUE. However, our
policy_hierarchy is set to OFS. Still, that shouldn't be an issue if
we have weight_tickets_share set to zero, should it? (I.e., if we're not
using override or share tree, then shouldn't this be effectively
equivalent to "policy_hierarchy F"?)
Yes, but it can be streamlined.
Are you mixing parallel and serial jobs? By default the slots complex
carries an urgency, with the effect that jobs requesting more slots
are treated as more important.
-- Reuti
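
A rough illustration of the slots urgency effect described above. This is a sketch under assumptions, not the scheduler's real formula: it assumes the stock complex configuration, where the slots complex has an urgency of 1000 (the actual value can be checked with `qconf -sc`), and it assumes the contribution grows linearly with the number of requested slots. The function name is made up for the example.

```python
# Illustrative sketch only; not taken from the SGE source.
# Assumption: the "slots" complex has its stock urgency of 1000 and the
# urgency contribution scales linearly with the requested slot count.

SLOTS_URGENCY = 1000  # assumed default; verify with `qconf -sc`

def slots_urgency_contribution(requested_slots, urgency=SLOTS_URGENCY):
    """Urgency a job would gain from its slot request under this model."""
    return requested_slots * urgency

for slots in (1, 4, 16):
    print(slots, slots_urgency_contribution(slots))
```

In this model a 16-slot job gains sixteen times the urgency of a serial job, so a mix of parallel and serial jobs can be reordered by urgency regardless of how the functional tickets are split.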
We were doing our testing with serial jobs, but our production loads
will largely be parallel. (Primarily array jobs.)
The default behavior you described (jobs requesting more slots being
considered more important) sounds like it explains what we were seeing.
FYI I also took the advice listed in an old post of yours to the list
(http://gridengine.org/pipermail/users/2017-May/009766.html) and echoed
by Ian K earlier in this thread and made the following setting changes:
weight_user 0.900000
weight_project 0.000000
weight_department 0.000000
weight_job 0.100000
weight_tickets_functional 100000
weight_tickets_share 0
policy_hierarchy F
Changing those settings does seem to be providing much more
balanced/fair scheduling now, as my colleague's jobs are now getting
much more interleaved with mine.
Thanks much for the suggestions!
Best,
DR