> Date: Mon, 05 Apr 2021 19:14:49 +0900 (JST)
> From: YASUOKA Masahiko <yasu...@openbsd.org>
> 
> Hi,
> 
> On Mon, 5 Apr 2021 10:43:00 +0300
> Paul Irofti <p...@irofti.net> wrote:
> > On 05.04.2021 06:13, Scott Cheloha wrote:
> >> On Mon, Mar 29, 2021 at 02:00:01PM +0900, YASUOKA Masahiko wrote:
> >>> On Thu, 25 Mar 2021 19:41:35 +0100 (CET)
> >>> Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> >>>>> From: Scott Cheloha <scottchel...@gmail.com>
> >>>>> Date: Thu, 25 Mar 2021 13:18:04 -0500
> >>>>>> On Wed, Mar 24, 2021 at 05:40:21PM +0900, YASUOKA Masahiko wrote:
> >>>>> Which diff did you apply?  Yasuoka provided two diffs.
> >>>>>
> >>>>> In any case, ignore this diff:
> >>>>>
> >>>>>> diff --git a/sys/arch/amd64/amd64/tsc.c b/sys/arch/amd64/amd64/tsc.c
> >>>>>> index 238a5a068e1..3b951a8b5a3 100644
> >>>>>> --- a/sys/arch/amd64/amd64/tsc.c
> >>>>>> +++ b/sys/arch/amd64/amd64/tsc.c
> >>>>>> @@ -212,7 +212,8 @@ cpu_recalibrate_tsc(struct timecounter *tc)
> >>>>>> u_int
> >>>>>> tsc_get_timecount(struct timecounter *tc)
> >>>>>> {
> >>>>>> -      return rdtsc_lfence() + curcpu()->ci_tsc_skew;
> >>>>>> +      //return rdtsc_lfence() + curcpu()->ci_tsc_skew;
> >>>>>> +      return rdtsc_lfence();
> >>>>>> }
> >>>>>>
> >>>>>> void
> >>>>>
> >>>>>
> >>>>> We don't want to discard the skews, that's wrong.
> >>>
> >>> I'm sorry for the confusion.
> >> No problem.
> >> 
> >>>>> The reason it "fixes" Yasuoka's problem is because the real skews
> >>>>> on the ESXi VMs in question are probably close to zero but our
> >>>>> synchronization algorithm is picking huge (wrong) skews due to
> >>>>> some other variable interfering with our measurement.
> >>>>
> >>>> Right.  If a VM exit happens while we're doing our measurement, you'll
> >>>> see a significant delay.  And a guest OS can't prevent those from
> >>>> happening.  But even on real hardware SMM mode may interfere with our
> >>>> measurement.
> >>>
> >>> For machines like the ESXi VMs, the measurement seems to have to
> >>> exclude such delayed values as outliers.  I think taking a lot of
> >>> samples and choice the minimum is a good enough way for the purpose.
> >>>
> >>> I updated the diff.
> >>>
> >>> - delete lines for debug
> >>> - make tsc quality lower if skew is not good enough
> >>> - reduce difference from NetBSD
> >>>
> >>> comment? ok?
> >> If more iterations fixes your problem, great.  It isn't going to make
> >> things worse for machines with sync'd TSCs, makes the TSC usable on
> >> another class of machine, and is relatively cheap.
> >> This is ok cheloha@.
> >> You need another ok, though.
> > 
> > 
> > The diff is obviously fine. But it is still a heuristic with no real
> > motivation except for this particular ESXi VM case. So my question
> > about why we choose the minimum instead of the median or the mean has
> > not been answered.
> 
> Because median or mean is affected by outliers.  We actually see
> some outliers in samples from the VMware.
> 
> I suppose there is a better mesure, but I am currently no idia and had
> not used that kind of measure in kernel.  On the other hand, finding
> the minimum is very simple.

Using the median should take care of the outliers though.

I'm not at all convinced that taking the absolute value of the
difference makes sense.  It probably works in this case since the
actual skew on your VM is zero.  So measurements close to zero are
"good".  But what if the skew isn't zero?  Take for example an AP that
is running ahead of the BP by 5000 ticks.  In that case, the right
value for the skew is -5000.  But now imagine that the BP gets
"interrupted" while doing a measurement, resulting in a delay of 10000
ticks between the two rdtsc_lfence() calls.  That would result in a
measured skew of around zero.  And by taking the minimum of the
absolute value, you end up using that value.

> 
> > Another issue that I see is that people have not reported, at least
> > publicly, that this runs fine on their normal OpenBSD machines.
> 
> Some dmesgs posted on public lists seems to have the same problem.
> 
> https://marc.info/?l=openbsd-bugs&w=2&r=1&s=disabling+user+TSC&q=b
> https://marc.info/?l=openbsd-tech&w=2&r=1&s=disabling+user+TSC&q=b
> https://marc.info/?l=openbsd-ports&w=2&r=1&s=disabling+user+TSC&q=b
> 
> For example,
> 
> https://marc.info/?l=openbsd-bugs&m=161618496905444&w=2
> 
> |Subject:    wg(4) crash
> |From:       Stuart Henderson <stu () spacehopper ! org>
> |bios0: vendor Dell Inc. version "2.9.0" date 12/06/2019
> |bios0: Dell Inc. PowerEdge R620
> |cpu1: disabling user TSC (skew=135)
> |cpu1: smt 0, core 0, package 1
> 
> https://marc.info/?l=openbsd-ports&m=161306073708427&w=2
> |Subject:    Re: sysutils/nut README APC over USB device chgrp/chmod
> |From:       Marcus MERIGHI <mcmer-openbsd () tor ! at>
> |bios0: vendor American Megatrends Inc. version "3.1" date 06/07/2018
> |cpu11: disabling user TSC (skew=240)
> |cpu11: smt 0, core 3, package 1
> 
> these 2 are real machine and using 2 CPU sockets.
> 
> https://marc.info/?l=openbsd-ports&m=161562278114172&w=2
> |Subject:    ruby27 vs Puppet
> |From:       Giovanni Bechis <giovanni () paclan ! it>
> |bios0: vendor Phoenix Technologies LTD version "6.00" date 12/12/2018
> |bios0: VMware, Inc. VMware Virtual Platform
> |cpu1: disabling user TSC (skew=-12705)
> 
> VMware.  seems the same problem of mine.
> 
> I'll ask people to do the same test which cheloha@ write in previous
> mail.
> 
> > If you think you have enough tests and you want to go ahead with the
> > current heuristic, then the diff is obviously fine. OK pirofti@
> > 
> >> 
> >>> Index: sys/arch/amd64//amd64/tsc.c
> >>> ===================================================================
> >>> RCS file: /disk/cvs/openbsd/src/sys/arch/amd64/amd64/tsc.c,v
> >>> retrieving revision 1.23
> >>> diff -u -p -r1.23 tsc.c
> >>> --- sys/arch/amd64//amd64/tsc.c   23 Feb 2021 04:44:30 -0000      1.23
> >>> +++ sys/arch/amd64//amd64/tsc.c   29 Mar 2021 04:18:31 -0000
> >>> @@ -38,6 +38,7 @@ int             tsc_is_invariant;
> >>>     #define       TSC_DRIFT_MAX                   250
> >>>   #define TSC_SKEW_MAX                    100
> >>> +#define  TSC_SYNC_ROUNDS                 1000
> >>>   int64_t tsc_drift_observed;
> >>>     volatile int64_t      tsc_sync_val;
> >>> @@ -235,6 +236,7 @@ tsc_timecounter_init(struct cpu_info *ci
> >>>                   printf("%s: disabling user TSC (skew=%lld)\n",
> >>>                       ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
> >>>                   tsc_timecounter.tc_user = 0;
> >>> +         tsc_timecounter.tc_quality = -1000;
> >>>           }
> >>>           if (!(ci->ci_flags & CPUF_PRIMARY) ||
> >>> @@ -314,13 +316,19 @@ tsc_read_bp(struct cpu_info *ci, uint64_
> >>>   void
> >>>   tsc_sync_bp(struct cpu_info *ci)
> >>>   {
> >>> + int i, val, diff;
> >>>           uint64_t bptsc, aptsc;
> >>>   -       tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */
> >>> - tsc_read_bp(ci, &bptsc, &aptsc);
> >>> + val = INT_MAX;
> >>> + for (i = 0; i < TSC_SYNC_ROUNDS; i++) {
> >>> +         tsc_read_bp(ci, &bptsc, &aptsc);
> >>> +         diff = bptsc - aptsc;
> >>> +         if (abs(diff) < abs(val))
> >>> +                 val = diff;
> >>> + }
> >>>           /* Compute final value to adjust for skew. */
> >>> - ci->ci_tsc_skew = bptsc - aptsc;
> >>> + ci->ci_tsc_skew = val;
> >>>   }
> >>>     /*
> >>> @@ -351,8 +359,10 @@ tsc_post_ap(struct cpu_info *ci)
> >>>   void
> >>>   tsc_sync_ap(struct cpu_info *ci)
> >>>   {
> >>> - tsc_post_ap(ci);
> >>> - tsc_post_ap(ci);
> >>> + int i;
> >>> +
> >>> + for (i = 0; i < TSC_SYNC_ROUNDS; i++)
> >>> +         tsc_post_ap(ci);
> >>>   }
> >>>     void
> 

Reply via email to