Fritz,

We're actually a UGE customer, although the particular cluster I was looking at hasn't yet been migrated to UGE. I'm not sure whether we have licenses or support for UniSight, though.

--
Brian McNally

On 05/09/2013 12:32 AM, Fritz Ferstl wrote:
Jesse is right that it is "real" half-life, as in radioactive decay
formulas. So after each half-life interval, the impact of a recorded
amount of resource consumption contributed to a job (and thus to a
user/project leaf node in the share tree) will have been cut in half.
If you feel that the share tree policy "forgets" too quickly, simply
increase the half-life. There are also the compensation factor and the
usage scaling factors you can play with to make the policy behave the
way you want.

While the policy's inherent algorithm is totally deterministic, it is
definitely challenging to follow what's going on. You'd do this for
debugging, but otherwise it is about as helpful as calculating the
pertinent laws of physics while steering a car around a corner. There
are way too many variables, and they are constantly changing. So my
advice would be the same as for driving a car: try to point in the
right direction and make adjustments as you see fit.

As for relating the accounting info to what's happening in the
share-tree policy: that's next to impossible ... which is why we've
chosen to augment the UniSight accounting/reporting technology in our
proprietary Univa Grid Engine version to include reporting on share tree
history. Sorry for the commercial note here, but it's the only solution I
can point you to in this regard.

Cheers,

Fritz

Jesse Becker wrote:
On Wed, May 08, 2013 at 02:02:04PM -0700, Brian McNally wrote:
Thanks Reuti, you're awesome!

I thought the halftime just dictated how long it took usage to decay to
half its original value. It seems that this is not the same as how long
the scheduler keeps usage information for jobs. Although, at some point,
say after 4-5 halflife cycles, the decayed usage is very small and
doesn't have much of an impact.

I seem to recall hearing that "5 halflives" is how long radioactive
stuff has to decay before it's "safe."  Don't quote me on that though.
:)

Poking around a bit in the sgeee.c file (from SoGE version 8.1.1, which
is what I have handy ATM), it looks like that, even though halftime is
specified in hours, the calculations are actually done in minutes.

It also looks like a real exponential decay is used, instead of a linear
decrease (as in some of the load calculations).  I think that the actual
decay rates come from the following (sge_support.c):

/*--------------------------------------------------------------------
 * calculate_decay_constant - calculates decay rate and constant based
 * on the decay half life and usage interval. The halftime argument
 * is in minutes.
 *--------------------------------------------------------------------*/

void
calculate_decay_constant( double halftime,
                          double *decay_rate,
                          double *decay_constant )
{
   if (halftime < 0) {
      *decay_rate = 1.0;
      *decay_constant = 0;
   } else if (halftime == 0) {
      *decay_rate = 0;
      *decay_constant = 1.0;
   } else {
      *decay_rate = - log(0.5) / (halftime * 60);
      *decay_constant = 1 - (*decay_rate * sge_usage_interval);
   }
   return;
}

This is especially interesting since it implies that negative halftimes
are acceptable.  Sure enough, setting a negative value zeros out
historical usage:
https://blogs.oracle.com/sgrell/entry/a_couple_lines_on_halftime


So yes, you'd need to keep your accounting files around for some number
of halftimes.  At 5 halflives, you're at 1/32nd of the original
weighting, or about 3%.




--
Brian McNally

On 05/08/2013 01:46 PM, Reuti wrote:
Hi,

On 08.05.2013 at 22:30, Brian McNally wrote:

qacct reports usage from a file, but GE has its own internal
database for tracking jobs and usage.

You mean for the share tree policy? Yes.


Is this correct? If so, what controls how long GE keeps
job data?

The "halftime" setting in the scheduler configuration (`man
sched_conf`).


It seems that using qacct to display overall usage per user (-o),
for example, might be a little misleading if the actual accounting
information is stored internally. Users might draw conclusions
about their usage and how that'll impact their job priorities based
on potentially incorrect data.

Unfortunately this is correct. You can even remove the accounting
file or rotate it, which might lead to yet different output. It
would be hard to mimic the internal computation. Maybe setting
"report_pjob_tickets" to true could give them a hint about which
position their jobs hold in the pending list (it's usually switched
off for performance reasons).

-- Reuti



Thanks,

--
Brian McNally
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users


--

Fritz Ferstl | CTO and Business Development, EMEA
Univa Corporation <http://www.univa.com/> | The Data Center Optimization Company
E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390

Where Grid Engine lives


