> From: Scott Cheloha <scottchel...@gmail.com>
> Date: Wed, 7 Apr 2021 10:25:04 -0500
> 
> > On Apr 6, 2021, at 07:49, Paul Irofti <p...@irofti.net> wrote:
> > 
> >>>> The diff is obviously fine. But it is still a heuristic with no real
> >>>> motivation except for this particular ESXi VM case. So my question
> >>>> about why we choose the minimum instead of the median or the mean has
> >>>> not been answered.
> >>> 
> >>> Because median or mean is affected by outliers.  We actually see
> >>> some outliers in samples from the VMware.
> >>> 
> >>> I suppose there is a better measure, but I currently have no idea
> >>> what it would be and have not used that kind of measure in the
> >>> kernel.  On the other hand, finding the minimum is very simple.
> >> Using the median should take care of the outliers though.
> >> I'm not at all convinced that taking the absolute value of the
> >> difference makes sense.  It probably works in this case since the
> >> actual skew on your VM is zero.  So measurements close to zero are
> >> "good".  But what if the skew isn't zero?  Take for example an AP that
> >> is running ahead of the BP by 5000 ticks.  In that case, the right
> >> value for the skew is -5000.  But now imagine that the BP gets
> >> "interrupted" while doing a measurement, resulting in a delay of 10000
> >> ticks between the two rdtsc_lfence() calls.  That would result in a
> >> measured skew of around zero.  And by taking the minimum of the
> >> absolute value, you end up using that value.
> > 
> > Exactly!
> 
> I agree that the median is a better choice of skew than the absolute
> minimum or average.
> 
> I think this means adding qsort to the kernel, though.  Unless we
> want to do median of medians...

Or maybe the code that does the actual measurements isn't fit for
purpose.  The current code does two reads of the TSC register on both
the BP and the AP.  This causes deviations in both directions,
depending on whether the BP or the AP gets to experience an SMM event
or VM exit.

The idea behind doing the two reads is that by taking the average you
compensate for the time spent signalling the AP and getting a report
back.  But this may actually be making the measurements less accurate
on some systems.

I believe the current code was inspired by what NetBSD does.  But
maybe someone should take a close look at the Linux code.  The Linux
code will have seen waaay more testing than the NetBSD code...
