Re: monotonic time going back by wrong skews

Scott Cheloha Wed, 07 Apr 2021 08:25:44 -0700

> On Apr 6, 2021, at 07:49, Paul Irofti <p...@irofti.net> wrote:
> 
>>>> The diff is obviously fine. But it is still a heuristic with no real
>>>> motivation except for this particular ESXi VM case. So my question
>>>> about why we choose the minimum instead of the median or the mean has
>>>> not been answered.
>>> 
>>> Because the median or mean is affected by outliers.  We actually see
>>> some outliers in samples from VMware.
>>> 
>>> I suppose there is a better measure, but I currently have no idea
>>> what it would be and have not used that kind of measure in the
>>> kernel.  On the other hand, finding the minimum is very simple.
>> Using the median should take care of the outliers though.
>> I'm not at all convinced that taking the absolute value of the
>> difference makes sense.  It probably works in this case since the
>> actual skew on your VM is zero.  So measurements close to zero are
>> "good".  But what if the skew isn't zero?  Take for example an AP that
>> is running ahead of the BP by 5000 ticks.  In that case, the right
>> value for the skew is -5000.  But now imagine that the BP gets
>> "interrupted" while doing a measurement, resulting in a delay of 10000
>> ticks between the two rdtsc_lfence() calls.  That would result in a
>> measured skew of around zero.  And by taking the minimum of the
>> absolute value, you end up using that value.
> 
> Exactly!


I agree that the median is a better choice
of skew than the absolute minimum or
average.
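
To make the quoted example concrete, here is a minimal sketch of the
failure mode.  It is not code from the diff: the function name and the
sample values are made up for illustration.  It assumes a true skew of
about -5000 ticks and one sample that was delayed until it landed near
zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the diff: return the sample with the
 * smallest absolute value.  Assumes n >= 1.
 */
int64_t
skew_min_abs(const int64_t *samples, size_t n)
{
	int64_t best = samples[0];
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t a = samples[i] < 0 ? -samples[i] : samples[i];
		int64_t b = best < 0 ? -best : best;

		if (a < b)
			best = samples[i];
	}
	return best;
}

/*
 * Made-up samples: four good measurements near -5000 and one that was
 * delayed by roughly 10000 ticks, which pushes it close to zero.
 */
const int64_t skew_example[] = { -5003, -4998, 12, -5001, -4996 };
```

With these values the helper picks the delayed sample (12), even though
every other measurement agrees on a skew of about -5000.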

I think this means adding qsort to the kernel,
though.  Unless we want to do median of
medians...
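
For a small, fixed number of samples, one possible way to avoid qsort
is an insertion sort into a local array.  The following is only a
sketch under that assumption.  The names skew_median and
SKEW_NSAMPLES_MAX are hypothetical, and the cap of 16 is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define SKEW_NSAMPLES_MAX	16	/* hypothetical cap on sample count */

/*
 * Hypothetical helper: return the median of n samples without qsort.
 * Builds a sorted copy by insertion sort, which is O(n^2) but fine for
 * a handful of samples.  Returns 0 if n is 0 or exceeds the cap.
 * For even n this returns the upper of the two middle values.
 */
int64_t
skew_median(const int64_t *samples, size_t n)
{
	int64_t sorted[SKEW_NSAMPLES_MAX];
	size_t i, j;

	if (n == 0 || n > SKEW_NSAMPLES_MAX)
		return 0;

	for (i = 0; i < n; i++) {
		int64_t v = samples[i];

		/* Shift larger entries up to make room for v. */
		for (j = i; j > 0 && sorted[j - 1] > v; j--)
			sorted[j] = sorted[j - 1];
		sorted[j] = v;
	}
	return sorted[n / 2];
}
```

Given four samples near -5000 and one outlier near zero, this returns
one of the samples near -5000 rather than the outlier.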
