On Mon, 5 Apr 2021 14:24:03 +0200 (CEST) Mark Kettenis <mark.kette...@xs4all.nl> wrote: >> Date: Mon, 05 Apr 2021 19:14:49 +0900 (JST) >> From: YASUOKA Masahiko <yasu...@openbsd.org> >> >> Hi, >> >> On Mon, 5 Apr 2021 10:43:00 +0300 >> Paul Irofti <p...@irofti.net> wrote: >> > On 05.04.2021 06:13, Scott Cheloha wrote: >> >> On Mon, Mar 29, 2021 at 02:00:01PM +0900, YASUOKA Masahiko wrote: >> >>> On Thu, 25 Mar 2021 19:41:35 +0100 (CET) >> >>> Mark Kettenis <mark.kette...@xs4all.nl> wrote: >> >>>>> From: Scott Cheloha <scottchel...@gmail.com> >> >>>>> Date: Thu, 25 Mar 2021 13:18:04 -0500 >> >>>>>> On Wed, Mar 24, 2021 at 05:40:21PM +0900, YASUOKA Masahiko wrote: >> >>>>> Which diff did you apply? Yasuoka provided two diffs. >> >>>>> >> >>>>> In any case, ignore this diff: >> >>>>> >> >>>>>> diff --git a/sys/arch/amd64/amd64/tsc.c b/sys/arch/amd64/amd64/tsc.c >> >>>>>> index 238a5a068e1..3b951a8b5a3 100644 >> >>>>>> --- a/sys/arch/amd64/amd64/tsc.c >> >>>>>> +++ b/sys/arch/amd64/amd64/tsc.c >> >>>>>> @@ -212,7 +212,8 @@ cpu_recalibrate_tsc(struct timecounter *tc) >> >>>>>> u_int >> >>>>>> tsc_get_timecount(struct timecounter *tc) >> >>>>>> { >> >>>>>> - return rdtsc_lfence() + curcpu()->ci_tsc_skew; >> >>>>>> + //return rdtsc_lfence() + curcpu()->ci_tsc_skew; >> >>>>>> + return rdtsc_lfence(); >> >>>>>> } >> >>>>>> >> >>>>>> void >> >>>>> >> >>>>> >> >>>>> We don't want to discard the skews, that's wrong. >> >>> >> >>> I'm sorry for the confusion. >> >> No problem. >> >> >> >>>>> The reason it "fixes" Yasuoka's problem is because the real skews >> >>>>> on the ESXi VMs in question are probably close to zero but our >> >>>>> synchronization algorithm is picking huge (wrong) skews due to >> >>>>> some other variable interfering with our measurement. >> >>>> >> >>>> Right. If a VM exit happens while we're doing our measurement, you'll >> >>>> see a significant delay. And a guest OS can't prevent those from >> >>>> happening. But even on real hardware SMM mode may interfere with our >> >>>> measurement. >> >>> >> >>> For machines like the ESXi VMs, the measurement seems to have to >> >>> exclude such delayed values as outliers. I think taking a lot of >> >>> samples and choice the minimum is a good enough way for the purpose. >> >>> >> >>> I updated the diff. >> >>> >> >>> - delete lines for debug >> >>> - make tsc quality lower if skew is not good enough >> >>> - reduce difference from NetBSD >> >>> >> >>> comment? ok? >> >> If more iterations fixes your problem, great. It isn't going to make >> >> things worse for machines with sync'd TSCs, makes the TSC usable on >> >> another class of machine, and is relatively cheap. >> >> This is ok cheloha@. >> >> You need another ok, though. >> > >> > >> > The diff is obviously fine. But it is still a heuristic with no real >> > motivation except for this particular ESXi VM case. So my question >> > about why we choose the minimum instead of the median or the mean has >> > not been answered. >> >> Because median or mean is affected by outliers. We actually see >> some outliers in samples from the VMware. >> >> I suppose there is a better mesure, but I am currently no idia and had >> not used that kind of measure in kernel. On the other hand, finding >> the minimum is very simple. > > Using the median should take care of the outliers though.
You are right. I misunderstood the meaning. > I'm not at all convinced that taking the absolute value of the > difference makes sense. It probably works in this case since the > actual skew on your VM is zero. So measurements close to zero are > "good". But what if the skew isn't zero? Take for example an AP that > is running ahead of the BP by 5000 ticks. In that case, the right > value for the skew is -5000. But now imagine that the BP gets > "interrupted" while doing a measurement, resulting in a delay of 10000 > ticks between the two rdtsc_lfence() calls. That would result in a > measured skew of around zero. And by taking the minimum of the > absolute value, you end up using that value. Indeed. Is taking the median or the mode better for the cases? >> > Another issue that I see is that people have not reported, at least >> > publicly, that this runs fine on their normal OpenBSD machines. >> >> Some dmesgs posted on public lists seems to have the same problem. >> >> https://marc.info/?l=openbsd-bugs&w=2&r=1&s=disabling+user+TSC&q=b >> https://marc.info/?l=openbsd-tech&w=2&r=1&s=disabling+user+TSC&q=b >> https://marc.info/?l=openbsd-ports&w=2&r=1&s=disabling+user+TSC&q=b >> >> For example, >> >> https://marc.info/?l=openbsd-bugs&m=161618496905444&w=2 >> >> |Subject: wg(4) crash >> |From: Stuart Henderson <stu () spacehopper ! org> >> |bios0: vendor Dell Inc. version "2.9.0" date 12/06/2019 >> |bios0: Dell Inc. PowerEdge R620 >> |cpu1: disabling user TSC (skew=135) >> |cpu1: smt 0, core 0, package 1 >> >> https://marc.info/?l=openbsd-ports&m=161306073708427&w=2 >> |Subject: Re: sysutils/nut README APC over USB device chgrp/chmod >> |From: Marcus MERIGHI <mcmer-openbsd () tor ! at> >> |bios0: vendor American Megatrends Inc. version "3.1" date 06/07/2018 >> |cpu11: disabling user TSC (skew=240) >> |cpu11: smt 0, core 3, package 1 >> >> these 2 are real machine and using 2 CPU sockets. >> >> https://marc.info/?l=openbsd-ports&m=161562278114172&w=2 >> |Subject: ruby27 vs Puppet >> |From: Giovanni Bechis <giovanni () paclan ! it> >> |bios0: vendor Phoenix Technologies LTD version "6.00" date 12/12/2018 >> |bios0: VMware, Inc. VMware Virtual Platform >> |cpu1: disabling user TSC (skew=-12705) >> >> VMware. seems the same problem of mine. >> >> I'll ask people to do the same test which cheloha@ write in previous >> mail. >> >> > If you think you have enough tests and you want to go ahead with the >> > current heuristic, then the diff is obviously fine. OK pirofti@ >> > >> >> >> >>> Index: sys/arch/amd64//amd64/tsc.c >> >>> =================================================================== >> >>> RCS file: /disk/cvs/openbsd/src/sys/arch/amd64/amd64/tsc.c,v >> >>> retrieving revision 1.23 >> >>> diff -u -p -r1.23 tsc.c >> >>> --- sys/arch/amd64//amd64/tsc.c 23 Feb 2021 04:44:30 -0000 1.23 >> >>> +++ sys/arch/amd64//amd64/tsc.c 29 Mar 2021 04:18:31 -0000 >> >>> @@ -38,6 +38,7 @@ int tsc_is_invariant; >> >>> #define TSC_DRIFT_MAX 250 >> >>> #define TSC_SKEW_MAX 100 >> >>> +#define TSC_SYNC_ROUNDS 1000 >> >>> int64_t tsc_drift_observed; >> >>> volatile int64_t tsc_sync_val; >> >>> @@ -235,6 +236,7 @@ tsc_timecounter_init(struct cpu_info *ci >> >>> printf("%s: disabling user TSC (skew=%lld)\n", >> >>> ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew); >> >>> tsc_timecounter.tc_user = 0; >> >>> + tsc_timecounter.tc_quality = -1000; >> >>> } >> >>> if (!(ci->ci_flags & CPUF_PRIMARY) || >> >>> @@ -314,13 +316,19 @@ tsc_read_bp(struct cpu_info *ci, uint64_ >> >>> void >> >>> tsc_sync_bp(struct cpu_info *ci) >> >>> { >> >>> + int i, val, diff; >> >>> uint64_t bptsc, aptsc; >> >>> - tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */ >> >>> - tsc_read_bp(ci, &bptsc, &aptsc); >> >>> + val = INT_MAX; >> >>> + for (i = 0; i < TSC_SYNC_ROUNDS; i++) { >> >>> + tsc_read_bp(ci, &bptsc, &aptsc); >> >>> + diff = bptsc - aptsc; >> >>> + if (abs(diff) < abs(val)) >> >>> + val = diff; >> >>> + } >> >>> /* Compute final value to adjust for skew. */ >> >>> - ci->ci_tsc_skew = bptsc - aptsc; >> >>> + ci->ci_tsc_skew = val; >> >>> } >> >>> /* >> >>> @@ -351,8 +359,10 @@ tsc_post_ap(struct cpu_info *ci) >> >>> void >> >>> tsc_sync_ap(struct cpu_info *ci) >> >>> { >> >>> - tsc_post_ap(ci); >> >>> - tsc_post_ap(ci); >> >>> + int i; >> >>> + >> >>> + for (i = 0; i < TSC_SYNC_ROUNDS; i++) >> >>> + tsc_post_ap(ci); >> >>> } >> >>> void >>