Re: [gridengine users] Grid Engine accounting question

Fritz Ferstl Thu, 09 May 2013 22:00:28 -0700

Brian,

your name sounded familiar - now I know why ;-)


Of course you have access to UniSight. UniSight is part of the Univa Grid 
Engine product bundle. The fair-share reporting & analytics functionality I was 
referring to is however relatively recent and you probably do not have it yet 
in the version you have installed.

Let's take this off-line and check on that. And please get in touch with our 
support with any questions you might have.

Cheers,

Fritz

Sent from my iPhone

On 09.05.2013 at 23:01, Brian McNally <[email protected]> wrote:

> Fritz,
> 
> We're actually a UGE customer, although the particular cluster I was looking 
> at hasn't yet been migrated to UGE. I'm not sure if we have a 
> license/support for UniSight though.
> 
> --
> Brian McNally
> 
> On 05/09/2013 12:32 AM, Fritz Ferstl wrote:
>> Jesse is right that it is a "real" half-life as in radioactive decay
>> formulas. So after each half-life interval the impact of a recorded
>> amount of resource consumption attributed to a job (and thus to a
>> user/project leaf node in the share tree) will have been cut in half.
>> If you feel that the share tree policy "forgets" too quickly then simply
>> increase the half-life. There's also the compensation factor and usage
>> scaling factors you can play with to adjust the policy to how you want
>> it to behave.
>> 
>> While the policy's inherent algorithm is totally deterministic it is
>> definitely challenging to try following what's going on. You'd do this
>> for debugging reasons but otherwise it is about as helpful as
>> calculating the pertinent laws of physics when steering a car around a
>> corner. There are way too many variables which are constantly changing. So
>> my advice would be the same as for driving a car: try to point in the
>> right direction and make adjustments as you see fit.
>> 
>> As for relating the accounting info to what's happening in the
>> share-tree policy: that's next to impossible ... which is why we've
>> chosen to augment the UniSight accounting/reporting technology in our
>> proprietary Univa Grid Engine version to include reporting on share tree
>> history. Sorry for the commercial note here but it's the only solution I
>> can point you to in this regard.
>> 
>> Cheers,
>> 
>> Fritz
>> 
>> Jesse Becker wrote:
>>> On Wed, May 08, 2013 at 02:02:04PM -0700, Brian McNally wrote:
>>>> Thanks Reuti, you're awesome!
>>>> 
>>>> I thought the halftime just dictated the length of time usage took
>>>> before it was half its original value. It seems to me that this is
>>>> not the same as how long the scheduler keeps usage information for
>>>> jobs. Although, at some point, say 4-5 halflife cycles, the decayed
>>>> usage is very small and doesn't have much of an impact.
>>> 
>>> I seem to recall hearing that "5 halflives" is how long radioactive
>>> stuff has to decay before it's "safe."  Don't quote me on that though.
>>> :)
>>> 
>>> Poking around a bit in the sgeee.c file (from SoGE version 8.1.1, which
>>> is what I have handy ATM), it looks like, even though halftime is
>>> specified in hours, the calculations are actually done in minutes.
>>> 
>>> It also looks like a real exponential decay is used, instead of a linear
>>> decrease (as in some of the load calculations).  I think that the actual
>>> decay rates come from the following (sge_support.c):
>>> 
>>> /*--------------------------------------------------------------------
>>> * calculate_decay_constant - calculates decay rate and constant based
>>> * on the decay half life and usage interval. The halftime argument
>>> * is in minutes.
>>> *--------------------------------------------------------------------*/
>>> 
>>> void
>>> calculate_decay_constant( double halftime,
>>>                          double *decay_rate,
>>>                          double *decay_constant )
>>> {
>>>   if (halftime < 0) {
>>>      *decay_rate = 1.0;
>>>      *decay_constant = 0;
>>>   } else if (halftime == 0) {
>>>      *decay_rate = 0;
>>>      *decay_constant = 1.0;
>>>   } else {
>>>      *decay_rate = - log(0.5) / (halftime * 60);
>>>      *decay_constant = 1 - (*decay_rate * sge_usage_interval);
>>>   }
>>>   return;
>>> }
>>> 
>>> This is especially interesting since it implies that negative halftimes
>>> are acceptable.  Sure enough, setting a negative value zeros out
>>> historical usage:
>>> https://blogs.oracle.com/sgrell/entry/a_couple_lines_on_halftime
>>> 
>>> 
>>> So yes, you'd need to keep your accounting files around for some number
>>> of halftimes.  At 5 halflives, you're at 1/32nd of the original
>>> weighting, or about 3%.
>>> 
>>> 
>>> 
>>> 
>>>> --
>>>> Brian McNally
>>>> 
>>>> On 05/08/2013 01:46 PM, Reuti wrote:
>>>>> Hi,
>>>>> 
>>>>> On 08.05.2013 at 22:30, Brian McNally wrote:
>>>>> 
>>>>>> qacct reports usage from a file, but GE has its own internal
>>>>>> database for tracking jobs and usage.
>>>>> 
>>>>> You mean for the share tree policy? Yes.
>>>>> 
>>>>> 
>>>>>> Is this correct? If so, what controls the length of time GE keeps
>>>>>> job data for?
>>>>> 
>>>>> The "halftime" setting in the scheduler configuration (`man
>>>>> sched_conf`).
>>>>> 
>>>>> 
>>>>>> It seems that using qacct to display overall usage per user (-o),
>>>>>> for example, might be a little misleading if the actual accounting
>>>>>> information is stored internally. Users might draw conclusions
>>>>>> about their usage and how that'll impact their job priorities based
>>>>>> on potentially incorrect data.
>>>>> 
>>>>> Unfortunately this is correct. You can even remove the accounting
>>>>> file or rotate it, which might lead to different output again. It
>>>>> would be hard to mimic the internal computation. Maybe setting
>>>>> "report_pjob_tickets" to true could give them a hint at which
>>>>> position their jobs are in the pending list (usually it's switched
>>>>> off for performance reasons).
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> --
>>>>>> Brian McNally
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> [email protected]
>>>>>> https://gridengine.org/mailman/listinfo/users
>> 
>> --
>> 
>> Fritz Ferstl | CTO and Business Development, EMEA
>> Univa Corporation <http://www.univa.com/> | The Data Center Optimization
>> Company
>> E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile:
>> +49.170.819.7390
>> 
>> Where Grid Engine lives
>> 
>> 
>> 

