On Thu, 25 Mar 2021 19:41:35 +0100 (CET)
Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>> From: Scott Cheloha <scottchel...@gmail.com>
>> Date: Thu, 25 Mar 2021 13:18:04 -0500
>> > On Wed, Mar 24, 2021 at 05:40:21PM +0900, YASUOKA Masahiko wrote:
>> Which diff did you apply?  Yasuoka provided two diffs.
>> 
>> In any case, ignore this diff:
>> 
>> > diff --git a/sys/arch/amd64/amd64/tsc.c b/sys/arch/amd64/amd64/tsc.c
>> > index 238a5a068e1..3b951a8b5a3 100644
>> > --- a/sys/arch/amd64/amd64/tsc.c
>> > +++ b/sys/arch/amd64/amd64/tsc.c
>> > @@ -212,7 +212,8 @@ cpu_recalibrate_tsc(struct timecounter *tc)
>> > u_int
>> > tsc_get_timecount(struct timecounter *tc)
>> > {
>> > -  return rdtsc_lfence() + curcpu()->ci_tsc_skew;
>> > +  //return rdtsc_lfence() + curcpu()->ci_tsc_skew;
>> > +  return rdtsc_lfence();
>> > }
>> > 
>> > void
>> 
>> 
>> We don't want to discard the skews, that's wrong.

I'm sorry for the confusion.

>> The reason it "fixes" Yasuoka's problem is because the real skews
>> on the ESXi VMs in question are probably close to zero but our
>> synchronization algorithm is picking huge (wrong) skews due to
>> some other variable interfering with our measurement.
> 
> Right.  If a VM exit happens while we're doing our measurement, you'll
> see a significant delay.  And a guest OS can't prevent those from
> happening.  But even on real hardware SMM mode may interfere with our
> measurement.

For machines like these ESXi VMs, the measurement has to exclude such
delayed values as outliers.  I think taking a lot of samples and
choosing the one with the smallest difference is good enough for the
purpose.

I updated the diff.

- delete debug lines
- lower the tsc timecounter quality if the skew is not good enough
- reduce the difference from NetBSD

comment? ok?

Index: sys/arch/amd64//amd64/tsc.c
===================================================================
RCS file: /disk/cvs/openbsd/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.23
diff -u -p -r1.23 tsc.c
--- sys/arch/amd64//amd64/tsc.c 23 Feb 2021 04:44:30 -0000      1.23
+++ sys/arch/amd64//amd64/tsc.c 29 Mar 2021 04:18:31 -0000
@@ -38,6 +38,7 @@ int           tsc_is_invariant;
 
 #define        TSC_DRIFT_MAX                   250
 #define TSC_SKEW_MAX                   100
+#define        TSC_SYNC_ROUNDS                 1000
 int64_t        tsc_drift_observed;
 
 volatile int64_t       tsc_sync_val;
@@ -235,6 +236,7 @@ tsc_timecounter_init(struct cpu_info *ci
                printf("%s: disabling user TSC (skew=%lld)\n",
                    ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
                tsc_timecounter.tc_user = 0;
+               tsc_timecounter.tc_quality = -1000;
        }
 
        if (!(ci->ci_flags & CPUF_PRIMARY) ||
@@ -314,13 +316,19 @@ tsc_read_bp(struct cpu_info *ci, uint64_
 void
 tsc_sync_bp(struct cpu_info *ci)
 {
+       int i, val, diff;
        uint64_t bptsc, aptsc;
 
-       tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */
-       tsc_read_bp(ci, &bptsc, &aptsc);
+       val = INT_MAX;
+       for (i = 0; i < TSC_SYNC_ROUNDS; i++) {
+               tsc_read_bp(ci, &bptsc, &aptsc);
+               diff = bptsc - aptsc;
+               if (abs(diff) < abs(val))
+                       val = diff;
+       }
 
        /* Compute final value to adjust for skew. */
-       ci->ci_tsc_skew = bptsc - aptsc;
+       ci->ci_tsc_skew = val;
 }
 
 /*
@@ -351,8 +359,10 @@ tsc_post_ap(struct cpu_info *ci)
 void
 tsc_sync_ap(struct cpu_info *ci)
 {
-       tsc_post_ap(ci);
-       tsc_post_ap(ci);
+       int i;
+
+       for (i = 0; i < TSC_SYNC_ROUNDS; i++)
+               tsc_post_ap(ci);
 }
 
 void
