Jesse is right that it is a "real" half-life, as in the formulas for
radioactive decay. After each half-life interval, the impact of a
recorded amount of resource consumption attributed to a job (and thus
to a user/project leaf node in the share tree) is cut in half. If you
feel that the share tree policy "forgets" too quickly, simply increase
the half-life. There are also the compensation factor and the usage
scaling factors, which you can play with to make the policy behave the
way you want.
While the policy's inherent algorithm is totally deterministic, it is
definitely challenging to follow what's going on. You'd do this for
debugging, but otherwise it is about as helpful as calculating the
pertinent laws of physics when steering a car around a corner. There
are way too many variables, and they are constantly changing. So my
advice would be the same as for driving a car: try to point in the
right direction and make adjustments as you see fit.
As for relating the accounting info to what's happening in the
share-tree policy: that's next to impossible ... which is why we've
chosen to augment the UniSight accounting/reporting technology in our
proprietary Univa Grid Engine version to include reporting on share tree
history. Sorry for the commercial note here but it's the only solution I
can point you to in this regard.
Cheers,
Fritz
Jesse Becker wrote:
On Wed, May 08, 2013 at 02:02:04PM -0700, Brian McNally wrote:
Thanks Reuti, you're awesome!
I thought the halftime just dictated how long usage takes to decay to
half its original value. It seems that this is not the same as how long
the scheduler keeps usage information for jobs. Although, at some
point, say after 4-5 half-life cycles, the decayed usage is very small
and doesn't have much of an impact.
I seem to recall hearing that "5 halflives" is how long radioactive
stuff has to decay before it's "safe." Don't quote me on that though.
:)
Poking around a bit in the sgeee.c file (from SoGE version 8.1.1, which
is what I have handy ATM), it looks like, even though halftime is
specified in hours, the calculations are actually done in minutes.
It also looks like a real exponential decay is used, instead of a linear
decrease (as in some of the load calculations). I think that the actual
decay rates come from the following (sge_support.c):
/*--------------------------------------------------------------------
 * calculate_decay_constant - calculates decay rate and constant based
 * on the decay half life and usage interval. The halftime argument
 * is in minutes.
 *--------------------------------------------------------------------*/
void
calculate_decay_constant( double halftime,
                          double *decay_rate,
                          double *decay_constant )
{
   if (halftime < 0) {
      *decay_rate = 1.0;
      *decay_constant = 0;
   } else if (halftime == 0) {
      *decay_rate = 0;
      *decay_constant = 1.0;
   } else {
      *decay_rate = - log(0.5) / (halftime * 60);
      *decay_constant = 1 - (*decay_rate * sge_usage_interval);
   }
   return;
}
This is especially interesting since it implies that negative halftimes
are acceptable. Sure enough, setting a negative value zeros out
historical usage:
https://blogs.oracle.com/sgrell/entry/a_couple_lines_on_halftime
So yes, you'd need to keep your accounting files around for some number
of halftimes. At 5 halflives, you're at 1/32nd of the original
weighting, or about 3%.
--
Brian McNally
On 05/08/2013 01:46 PM, Reuti wrote:
Hi,
On 08.05.2013 at 22:30, Brian McNally wrote:
qacct reports usage from a file, but GE has
its own internal database for tracking jobs and usage.
You mean for the share tree policy? Yes.
Is this correct? If so, what controls the
length of time GE keeps job data for?
The "halftime" setting in the scheduler configuration (`man
sched_conf`).
It seems that using qacct to display overall
usage per user (-o), for example, might be a little misleading if the
actual accounting information is stored internally. Users might draw
conclusions about their usage and how that'll impact their job
priorities based on potentially incorrect data.
Unfortunately this is correct. You can even remove the accounting
file or rotate it, which might change the output yet again. It would be
hard to mimic the internal computation. Maybe setting
"report_pjob_tickets" to true could give them a hint at which position
their jobs are in the pending list (usually it's switched off for
performance reasons).
-- Reuti
Thanks,
--
Brian McNally
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
--
Fritz Ferstl | CTO and Business Development, EMEA Univa
Corporation | The Data Center Optimization Company E-Mail:
[email protected] | Phone: +49.9471.200.195 | Mobile:
+49.170.819.7390
