On 2017-10-09 6:23 pm, Reuti wrote:
On 10.10.2017 at 00:00, David Rosenstrauch wrote:

On 2017-10-09 5:45 pm, Reuti wrote:
On 09.10.2017 at 23:01, David Rosenstrauch wrote:
I'm a bit of a SGE noob, so please bear with me. We're in the process of a first-time SGE deploy for the users in our department. Although we've been able to use SGE, submit jobs to the queues successfully, etc., we're running into issues trying to get the fair-share scheduling - specifically the functional scheduling - to work correctly. We have very simple functional scheduling enabled, via the following configuration settings:
enforce_user                 auto
auto_user_fshare             100
weight_tickets_functional         10000
schedd_job_info                   true
(In addition, the "weight_tickets_share" setting is set to 0, thereby disabling share tree scheduling.)

A colleague and I are testing this setup by both submitting multiple jobs to one of our queues at the same time, with me submitting a large number of jobs (100) first and him submitting a smaller number (25) shortly afterwards. Our understanding is that the functional scheduling policy should prevent one user's jobs from completely dominating a queue. So our expectation is that even though my jobs were submitted first, and there are more of them, the scheduler should wind up giving his jobs a higher priority so that he is not forced to wait until all of my jobs complete before his run. (If he did have to wait, that would effectively be FIFO scheduling, not fair share.)
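For anyone wanting to reproduce the test, it amounts to something like the following (the job names and the sleep duration are just placeholders; each loop is run from a different user account):

  # user 1 submits 100 short serial jobs
  for i in $(seq 1 100); do qsub -b y -N fairtest_a sleep 120; done

  # user 2 submits 25 similar jobs shortly afterwards
  for i in $(seq 1 25); do qsub -b y -N fairtest_b sleep 120; done

With functional tickets in effect, we would expect the second user's pending jobs to be dispatched interleaved with the first user's rather than strictly after them.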
To see the effect, the display of the pending job tickets has to be enabled too (right now you should see them as being 0 in the pending list):
report_pjob_tickets               TRUE
In addition you can set:
policy_hierarchy                  F
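With report_pjob_tickets enabled, the tickets granted to each job can then be inspected with plain qstat, e.g.:

  qstat -ext -u '*'

The ftckt column holds the functional tickets per job, for pending jobs as well as running ones; with the policy working, the second user's waiting jobs should carry more tickets than the bulk of the first user's.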
-- Reuti


Thanks for the feedback.

We do have report_pjob_tickets set to TRUE. However, our policy_hierarchy is set to OFS. Still, shouldn't that be a non-issue if we have weight_tickets_share set to zero? (I.e., if we're not using the override or share tree policies, shouldn't this be effectively equivalent to "policy_hierarchy F"?)

Yes, but can be streamlined.

Are you mixing parallel and serial jobs? By default there is an urgency value on the slots complex, which has the effect that jobs requesting more slots are considered more important.
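That value can be checked in the complex configuration, e.g.:

  qconf -sc | grep slots

On a stock installation the slots entry usually carries an urgency of 1000, and that value times the number of requested slots feeds into the job priority through the weight_urgency parameter of the scheduler configuration. Lowering either the slots urgency or weight_urgency lets the ticket policies dominate again.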

-- Reuti


We were doing our testing with serial jobs, but our production loads will largely be parallel. (Primarily array jobs.)

The default behavior you described (jobs requesting more slots being considered more important) sounds like it explains what we were seeing.
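(To illustrate, our production array jobs will request slots through a parallel environment roughly like the following; the "smp" PE name and the script are placeholders:

  qsub -t 1-50 -pe smp 4 run_task.sh

Each task then asks for 4 slots, so the slot urgency will weigh in even more heavily once we move to those workloads.)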

FYI, I also took the advice from an old post of yours to the list (http://gridengine.org/pipermail/users/2017-May/009766.html), echoed by Ian K earlier in this thread, and made the following setting changes:

weight_user                       0.900000
weight_project                    0.000000
weight_department                 0.000000
weight_job                        0.100000
weight_tickets_functional         100000
weight_tickets_share              0
policy_hierarchy                  F
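For completeness, these are all scheduler configuration parameters, so they can be reviewed and edited with:

  qconf -ssconf    # show the current scheduler configuration
  qconf -msconf    # open it in an editor to change the weights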

Changing those settings does seem to provide much more balanced/fair scheduling: my colleague's jobs now get interleaved with mine to a much greater degree.

Thanks much for the suggestions!

Best,

DR