On Mon, 5 Apr 2021 14:24:03 +0200 (CEST)
Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>> Date: Mon, 05 Apr 2021 19:14:49 +0900 (JST)
>> From: YASUOKA Masahiko <yasu...@openbsd.org>
>> 
>> Hi,
>> 
>> On Mon, 5 Apr 2021 10:43:00 +0300
>> Paul Irofti <p...@irofti.net> wrote:
>> > On 05.04.2021 06:13, Scott Cheloha wrote:
>> >> On Mon, Mar 29, 2021 at 02:00:01PM +0900, YASUOKA Masahiko wrote:
>> >>> On Thu, 25 Mar 2021 19:41:35 +0100 (CET)
>> >>> Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>> >>>>> From: Scott Cheloha <scottchel...@gmail.com>
>> >>>>> Date: Thu, 25 Mar 2021 13:18:04 -0500
>> >>>>>> On Wed, Mar 24, 2021 at 05:40:21PM +0900, YASUOKA Masahiko wrote:
>> >>>>> Which diff did you apply?  Yasuoka provided two diffs.
>> >>>>>
>> >>>>> In any case, ignore this diff:
>> >>>>>
>> >>>>>> diff --git a/sys/arch/amd64/amd64/tsc.c b/sys/arch/amd64/amd64/tsc.c
>> >>>>>> index 238a5a068e1..3b951a8b5a3 100644
>> >>>>>> --- a/sys/arch/amd64/amd64/tsc.c
>> >>>>>> +++ b/sys/arch/amd64/amd64/tsc.c
>> >>>>>> @@ -212,7 +212,8 @@ cpu_recalibrate_tsc(struct timecounter *tc)
>> >>>>>> u_int
>> >>>>>> tsc_get_timecount(struct timecounter *tc)
>> >>>>>> {
>> >>>>>> -     return rdtsc_lfence() + curcpu()->ci_tsc_skew;
>> >>>>>> +     //return rdtsc_lfence() + curcpu()->ci_tsc_skew;
>> >>>>>> +     return rdtsc_lfence();
>> >>>>>> }
>> >>>>>>
>> >>>>>> void
>> >>>>>
>> >>>>>
>> >>>>> We don't want to discard the skews, that's wrong.
>> >>>
>> >>> I'm sorry for the confusion.
>> >> No problem.
>> >> 
>> >>>>> The reason it "fixes" Yasuoka's problem is because the real skews
>> >>>>> on the ESXi VMs in question are probably close to zero but our
>> >>>>> synchronization algorithm is picking huge (wrong) skews due to
>> >>>>> some other variable interfering with our measurement.
>> >>>>
>> >>>> Right.  If a VM exit happens while we're doing our measurement, you'll
>> >>>> see a significant delay.  And a guest OS can't prevent those from
>> >>>> happening.  But even on real hardware SMM mode may interfere with our
>> >>>> measurement.
>> >>>
>> >>> For machines like the ESXi VMs, the measurement seems to have to
>> >>> exclude such delayed values as outliers.  I think taking a lot of
>> >>> samples and choice the minimum is a good enough way for the purpose.
>> >>>
>> >>> I updated the diff.
>> >>>
>> >>> - delete lines for debug
>> >>> - make tsc quality lower if skew is not good enough
>> >>> - reduce difference from NetBSD
>> >>>
>> >>> comment? ok?
>> >> If more iterations fixes your problem, great.  It isn't going to make
>> >> things worse for machines with sync'd TSCs, makes the TSC usable on
>> >> another class of machine, and is relatively cheap.
>> >> This is ok cheloha@.
>> >> You need another ok, though.
>> > 
>> > 
>> > The diff is obviously fine. But it is still a heuristic with no real
>> > motivation except for this particular ESXi VM case. So my question
>> > about why we choose the minimum instead of the median or the mean has
>> > not been answered.
>> 
>> Because median or mean is affected by outliers.  We actually see
>> some outliers in samples from the VMware.
>> 
>> I suppose there is a better mesure, but I am currently no idia and had
>> not used that kind of measure in kernel.  On the other hand, finding
>> the minimum is very simple.
> 
> Using the median should take care of the outliers though.

You are right.  I misunderstood the meaning.

> I'm not at all convinced that taking the absolute value of the
> difference makes sense.  It probably works in this case since the
> actual skew on your VM is zero.  So measurements close to zero are
> "good".  But what if the skew isn't zero?  Take for example an AP that
> is running ahead of the BP by 5000 ticks.  In that case, the right
> value for the skew is -5000.  But now imagine that the BP gets
> "interrupted" while doing a measurement, resulting in a delay of 10000
> ticks between the two rdtsc_lfence() calls.  That would result in a
> measured skew of around zero.  And by taking the minimum of the
> absolute value, you end up using that value.

Indeed.  Is taking the median or the mode better for the cases?

>> > Another issue that I see is that people have not reported, at least
>> > publicly, that this runs fine on their normal OpenBSD machines.
>> 
>> Some dmesgs posted on public lists seems to have the same problem.
>> 
>> https://marc.info/?l=openbsd-bugs&w=2&r=1&s=disabling+user+TSC&q=b
>> https://marc.info/?l=openbsd-tech&w=2&r=1&s=disabling+user+TSC&q=b
>> https://marc.info/?l=openbsd-ports&w=2&r=1&s=disabling+user+TSC&q=b
>> 
>> For example,
>> 
>> https://marc.info/?l=openbsd-bugs&m=161618496905444&w=2
>> 
>> |Subject:    wg(4) crash
>> |From:       Stuart Henderson <stu () spacehopper ! org>
>> |bios0: vendor Dell Inc. version "2.9.0" date 12/06/2019
>> |bios0: Dell Inc. PowerEdge R620
>> |cpu1: disabling user TSC (skew=135)
>> |cpu1: smt 0, core 0, package 1
>> 
>> https://marc.info/?l=openbsd-ports&m=161306073708427&w=2
>> |Subject:    Re: sysutils/nut README APC over USB device chgrp/chmod
>> |From:       Marcus MERIGHI <mcmer-openbsd () tor ! at>
>> |bios0: vendor American Megatrends Inc. version "3.1" date 06/07/2018
>> |cpu11: disabling user TSC (skew=240)
>> |cpu11: smt 0, core 3, package 1
>> 
>> these 2 are real machine and using 2 CPU sockets.
>> 
>> https://marc.info/?l=openbsd-ports&m=161562278114172&w=2
>> |Subject:    ruby27 vs Puppet
>> |From:       Giovanni Bechis <giovanni () paclan ! it>
>> |bios0: vendor Phoenix Technologies LTD version "6.00" date 12/12/2018
>> |bios0: VMware, Inc. VMware Virtual Platform
>> |cpu1: disabling user TSC (skew=-12705)
>> 
>> VMware.  seems the same problem of mine.
>> 
>> I'll ask people to do the same test which cheloha@ write in previous
>> mail.
>> 
>> > If you think you have enough tests and you want to go ahead with the
>> > current heuristic, then the diff is obviously fine. OK pirofti@
>> > 
>> >> 
>> >>> Index: sys/arch/amd64//amd64/tsc.c
>> >>> ===================================================================
>> >>> RCS file: /disk/cvs/openbsd/src/sys/arch/amd64/amd64/tsc.c,v
>> >>> retrieving revision 1.23
>> >>> diff -u -p -r1.23 tsc.c
>> >>> --- sys/arch/amd64//amd64/tsc.c  23 Feb 2021 04:44:30 -0000      1.23
>> >>> +++ sys/arch/amd64//amd64/tsc.c  29 Mar 2021 04:18:31 -0000
>> >>> @@ -38,6 +38,7 @@ int            tsc_is_invariant;
>> >>>     #define      TSC_DRIFT_MAX                   250
>> >>>   #define TSC_SKEW_MAX                   100
>> >>> +#define TSC_SYNC_ROUNDS                 1000
>> >>>   int64_t        tsc_drift_observed;
>> >>>     volatile int64_t     tsc_sync_val;
>> >>> @@ -235,6 +236,7 @@ tsc_timecounter_init(struct cpu_info *ci
>> >>>                  printf("%s: disabling user TSC (skew=%lld)\n",
>> >>>                      ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
>> >>>                  tsc_timecounter.tc_user = 0;
>> >>> +                tsc_timecounter.tc_quality = -1000;
>> >>>          }
>> >>>          if (!(ci->ci_flags & CPUF_PRIMARY) ||
>> >>> @@ -314,13 +316,19 @@ tsc_read_bp(struct cpu_info *ci, uint64_
>> >>>   void
>> >>>   tsc_sync_bp(struct cpu_info *ci)
>> >>>   {
>> >>> +        int i, val, diff;
>> >>>          uint64_t bptsc, aptsc;
>> >>>   -      tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */
>> >>> -        tsc_read_bp(ci, &bptsc, &aptsc);
>> >>> +        val = INT_MAX;
>> >>> +        for (i = 0; i < TSC_SYNC_ROUNDS; i++) {
>> >>> +                tsc_read_bp(ci, &bptsc, &aptsc);
>> >>> +                diff = bptsc - aptsc;
>> >>> +                if (abs(diff) < abs(val))
>> >>> +                        val = diff;
>> >>> +        }
>> >>>          /* Compute final value to adjust for skew. */
>> >>> -        ci->ci_tsc_skew = bptsc - aptsc;
>> >>> +        ci->ci_tsc_skew = val;
>> >>>   }
>> >>>     /*
>> >>> @@ -351,8 +359,10 @@ tsc_post_ap(struct cpu_info *ci)
>> >>>   void
>> >>>   tsc_sync_ap(struct cpu_info *ci)
>> >>>   {
>> >>> -        tsc_post_ap(ci);
>> >>> -        tsc_post_ap(ci);
>> >>> +        int i;
>> >>> +
>> >>> +        for (i = 0; i < TSC_SYNC_ROUNDS; i++)
>> >>> +                tsc_post_ap(ci);
>> >>>   }
>> >>>     void
>> 

Reply via email to