Hmmm ... just wondering: why the need for setting weight_tickets_share
to 10000000 like you did, if we're not using share tree scheduling?
(Actually, looking at it more closely, it looks like you're setting that
twice - once to 10000000 and then later to 0. I'm guessing the second
value supersedes the first, so you're effectively setting it to 0.)
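(If it helps, the value the scheduler is actually using can be confirmed
directly - a quick check, assuming a standard SGE install with qconf on
the PATH:

  # print the live sched_conf and pull out the relevant lines
  qconf -ssconf | grep -E 'weight_tickets|policy_hierarchy'

Since there's only one active scheduler configuration, whatever that
reports is what's in effect.)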
In any case, we have several of those other settings in our config, but
with different values:
weight_tickets_share 0
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 10000
weight_tickets_share 0
Could these settings be causing our issue? It seems unlikely, though,
as we're not taking project or department into account in our
scheduling.
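One thing that might narrow it down: the ticket counts the scheduler
actually assigns can be watched while a test is running. A rough sketch,
assuming standard client tools (column names can vary a bit by version):

  # -ext adds the ticket columns: tckts (total), ftckt (functional),
  # stckt (share tree) and otckt (override) for every job
  qstat -u '*' -ext

If ftckt stays at 0 on the pending jobs, the functional policy isn't
being applied at all.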
Thanks,
DR
On 2017-10-09 5:40 pm, Ian Kaufman wrote:
I am pretty sure you need something like the following (courtesy of
Reuti):
weight_tickets_share 10000000
weight_user 0.900000
weight_project 0.000000
weight_department 0.000000
weight_job 0.100000
weight_tickets_functional 100000
weight_tickets_share 0
policy_hierarchy F
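(Those values go in the scheduler configuration; something along these
lines should do it, assuming admin rights on the qmaster host:

  # opens the active sched_conf in $EDITOR; adjust the weight_* lines
  # and policy_hierarchy there, then save
  qconf -msconf

The change should take effect on the next scheduling run - no restart
needed.)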
On Mon, Oct 9, 2017 at 2:01 PM, David Rosenstrauch <[email protected]>
wrote:
I'm a bit of an SGE noob, so please bear with me. We're in the
process of a first-time SGE deploy for the users in our department.
Although we've been able to use SGE, submit jobs to the queues
successfully, etc., we're running into issues trying to get the
fair-share scheduling - specifically the functional scheduling - to
work correctly.
We have very simple functional scheduling enabled, via the following
configuration settings:
enforce_user auto
auto_user_fshare 100
weight_tickets_functional 10000
schedd_job_info true
(In addition, the "weight_tickets_share" setting is set to 0,
thereby disabling share tree scheduling.)
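(For what it's worth, since enforce_user auto only creates the user
objects on a user's first submit, the auto-created entries can be
inspected to confirm they actually carry fshare 100 - roughly like this,
with the user name as a placeholder:

  # list known users, then inspect one entry's fshare field
  qconf -suserl
  qconf -suser someuser

)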
A colleague and I are testing this setup by both of us submitting
multiple jobs to one of our queues simultaneously, with me first
submitting a large number of jobs (100) and him submitting a smaller
number (25) shortly afterwards. Our understanding is that the
functional scheduling policy should prevent one user from having
their jobs completely dominate a queue. And so our expectation is
that even though my jobs were submitted first, and there are more of
them, the scheduler should wind up giving his jobs a higher priority
so that he is not forced to wait until all of my jobs complete
before his run. (If he did have to wait, that would effectively be
FIFO scheduling, not fair share.)
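(For reference, the test itself is essentially the following - a sketch,
with the queue name and job length made up for illustration:

  # me: 100 short dummy jobs
  for i in $(seq 1 100); do qsub -b y -q test.q sleep 120; done
  # my colleague, shortly afterwards: 25 of the same
  for i in $(seq 1 25); do qsub -b y -q test.q sleep 120; done

)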
Although we aren't seeing strict FIFO scheduling, we're seeing something
close to it.
One of his jobs (eventually) gets assigned a high number of
tickets, and a higher priority, and gets scheduled and run. But the
remaining several dozen sit in the queue and don't get run until all
of mine complete - which is not really fair share.
Although it does look like functional scheduling is happening to
some extent (at least one of his jobs is getting prioritized ahead
of mine) this scheduling behavior is not what we were expecting to
see. Our expectation was that one of his jobs would run for every 4
of mine (more or less), and that his jobs would not wind up queued
up to run after mine complete.
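(In case it helps with diagnosis: the separate contributions to each
job's priority - the ticket component versus the urgency and POSIX
priority components - can be broken out, assuming a reasonably recent
SGE:

  # shows prior plus its normalized components (ntckts, npprior, nurg)
  qstat -u '*' -pri

That would at least show whether the ticket policy is influencing the
final priority or being swamped by one of the other components.)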
Any idea what might be going on here? Do I have my system
misconfigured for functional scheduling? Or am I just
misunderstanding how this is supposed to work? I've already done
quite a bit of googling and man page reading on the relevant topics
and settings, but wasn't able to find a good explanation for the
behavior we're seeing. Any help greatly appreciated!
Thanks,
DR
--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu