On Tue, Jul 05, 2022 at 10:53:43AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha <scottchel...@gmail.com> writes:
> 
> > On Tue, Jul 05, 2022 at 07:15:31AM -0400, Dave Voutila wrote:
> >>
> >> Scott Cheloha <scottchel...@gmail.com> writes:
> >>
> >> > [...]
> >> >
> >> > If you fail the test you will see something like this:
> >> >
> >> >  tsc: cpu0/cpu2: sync test round 1/2 failed
> >> >  tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> >> >
> >> > A printout like this would mean that the sync test for cpu2 failed.
> >> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> >> > If this happens for *any* CPU we mark the TSC timecounter as
> >> > defective.
> >>
> >> I think this passes now on my dual-socket Xeon box?
> >
> > Yes, it passes.  The timecounter on your machine should still have a
> > quality of 2000, i.e. we didn't mark it defective.
> >
> >> Full dmesg at the end of the email[1], but just the `tsc:' lines look
> >> like:
> >>
> >> $ grep tsc dmesg.txt
> >> tsc: cpu0: IA32_TSC_ADJUST: -5774382067215574 -> 0
> >> tsc: cpu1: IA32_TSC_ADJUST: -5774382076335870 -> 0
> >> tsc: cpu2: IA32_TSC_ADJUST: -5774382073829798 -> 0
> >> tsc: cpu3: IA32_TSC_ADJUST: -5774382071913818 -> 0
> >> tsc: cpu4: IA32_TSC_ADJUST: -5774382075956770 -> 0
> >> tsc: cpu5: IA32_TSC_ADJUST: -5774382074583181 -> 0
> >> tsc: cpu6: IA32_TSC_ADJUST: -5774382073199574 -> 0
> >> tsc: cpu7: IA32_TSC_ADJUST: -5774382076500135 -> 0
> >> tsc: cpu8: IA32_TSC_ADJUST: -5774382074705354 -> 0
> >> tsc: cpu9: IA32_TSC_ADJUST: -5774382075954945 -> 0
> >> tsc: cpu10: IA32_TSC_ADJUST: -5774382070567294 -> 0
> >> tsc: cpu11: IA32_TSC_ADJUST: -5774382075968443 -> 0
> >> tsc: cpu12: IA32_TSC_ADJUST: -5774382067353478 -> 0
> >> tsc: cpu13: IA32_TSC_ADJUST: -5774382071926523 -> 0
> >> tsc: cpu14: IA32_TSC_ADJUST: -5774382074619890 -> 0
> >> tsc: cpu15: IA32_TSC_ADJUST: -5774382070107058 -> 0
> >> tsc: cpu16: IA32_TSC_ADJUST: -5774382076196640 -> 0
> >> tsc: cpu17: IA32_TSC_ADJUST: -5774382075090665 -> 0
> >> tsc: cpu18: IA32_TSC_ADJUST: -5774382073529646 -> 0
> >> tsc: cpu19: IA32_TSC_ADJUST: -5774382076443616 -> 0
> >> tsc: cpu20: IA32_TSC_ADJUST: -5774382074994536 -> 0
> >> tsc: cpu21: IA32_TSC_ADJUST: -5774382076309520 -> 0
> >> tsc: cpu22: IA32_TSC_ADJUST: -5774382070947686 -> 0
> >> tsc: cpu23: IA32_TSC_ADJUST: -5774382073056320 -> 0
> >
> > Fascinating.  Wonder what the heck it's doing down there.
> >
> >> It does look like there's a newer BIOS version for this machine, so I'll
> >> try updating it later today and repeating the test to see if anything
> >> changes.
> 
> After a BIOS update, still similar output.
> 
> "new" bios:
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec0f0 (105 entries)
> bios0: vendor Dell Inc. version "A34" date 10/19/2020
> bios0: Dell Inc. Precision Tower 7810
> 
> $ dmesg | grep tsc
> tsc: cpu0: IA32_TSC_ADJUST: -4070378216 -> 0
> tsc: cpu1: IA32_TSC_ADJUST: -4081094631 -> 0
> tsc: cpu2: IA32_TSC_ADJUST: -4078853396 -> 0
> tsc: cpu3: IA32_TSC_ADJUST: -4074362824 -> 0
> tsc: cpu4: IA32_TSC_ADJUST: -4080872645 -> 0
> tsc: cpu5: IA32_TSC_ADJUST: -4075673830 -> 0
> tsc: cpu6: IA32_TSC_ADJUST: -4081906959 -> 0
> tsc: cpu7: IA32_TSC_ADJUST: -4073006269 -> 0
> tsc: cpu8: IA32_TSC_ADJUST: -4081803214 -> 0
> tsc: cpu9: IA32_TSC_ADJUST: -4081294540 -> 0
> tsc: cpu10: IA32_TSC_ADJUST: -4079817920 -> 0
> tsc: cpu11: IA32_TSC_ADJUST: -4079871039 -> 0
> tsc: cpu12: IA32_TSC_ADJUST: -4070522580 -> 0
> tsc: cpu13: IA32_TSC_ADJUST: -4077205405 -> 0
> tsc: cpu14: IA32_TSC_ADJUST: -4081797309 -> 0
> tsc: cpu15: IA32_TSC_ADJUST: -4078574630 -> 0
> tsc: cpu16: IA32_TSC_ADJUST: -4081539272 -> 0
> tsc: cpu17: IA32_TSC_ADJUST: -4079657247 -> 0
> tsc: cpu18: IA32_TSC_ADJUST: -4080469326 -> 0
> tsc: cpu19: IA32_TSC_ADJUST: -4073404194 -> 0
> tsc: cpu20: IA32_TSC_ADJUST: -4081473720 -> 0
> tsc: cpu21: IA32_TSC_ADJUST: -4076195877 -> 0
> tsc: cpu22: IA32_TSC_ADJUST: -4077876814 -> 0
> tsc: cpu23: IA32_TSC_ADJUST: -4081863303 -> 0
> 
> And still a quality tsc :) :
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)

Alrighty, that's "good enough".  We "fixed" it!

Thank you for testing (again).  If this patch is committed your
machine will be able to use the TSC as a timecounter without issue.
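
For anyone reading along, the kern.timecounter.choice line quoted above
is a list of name(quality) pairs.  Here is a rough Python sketch of how
to read it.  It assumes the kernel prefers the highest-quality counter
and never automatically picks one with a negative quality; the helper
names are made up for illustration and are not part of any tool.

```python
import re

# The choice line from the sysctl output quoted above.
CHOICE = "i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)"


def parse_choices(line):
    """Split 'name(quality) ...' into a {name: quality} dict."""
    pairs = re.findall(r"([^\s(]+)\((-?\d+)\)", line)
    return {name: int(quality) for name, quality in pairs}


def best(choices):
    """Pick the highest-quality counter, skipping negative qualities.

    Assumption: a negative quality marks a counter that should not be
    selected automatically.
    """
    usable = {n: q for n, q in choices.items() if q >= 0}
    return max(usable, key=usable.get)


choices = parse_choices(CHOICE)
print(best(choices), choices[best(choices)])
```

On the output above this picks tsc, which matches
kern.timecounter.hardware.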

Now, if you have the patience, there is another thing you could do:

You could report this to Dell.  They will probably bullshit you and
refuse to consider what you're saying because you aren't running
Windows or even Linux.  But it's worth a shot.

The problem, in brief, is that the IA32_TSC_ADJUST register on every
CPU is non-zero when the kernel boots, and the values differ from CPU
to CPU.  There isn't any reason for them to be non-zero.  At least,
no reason I can imagine.

The spec says the TSCs should all start from zero simultaneously and
run at the same fixed frequency.  Intentionally desynchronizing them
does not serve any obvious purpose.  It walks, talks, and quacks like
a bug.  We can work around the problem because Intel has graced us
with the IA32_TSC_ADJUST MSR, but it still leaves me wondering what
the fuck the machine is doing before we boot.
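
To make the workaround concrete, here is a toy Python model.  It is
not the kernel code.  It assumes every CPU's visible TSC is a shared
free-running count plus that CPU's IA32_TSC_ADJUST value, and that the
underlying counts are already in sync.  The function names and the
base count are invented for illustration; the adjust values are the
first three from the post-update dmesg above.

```python
MASK64 = (1 << 64) - 1  # the TSC is a 64-bit counter

# First three IA32_TSC_ADJUST values from the post-update dmesg above.
BOOT_ADJUST = {
    "cpu0": -4070378216,
    "cpu1": -4081094631,
    "cpu2": -4078853396,
}


def read_tsc(base, adjust):
    """Toy model: visible TSC = shared count + per-CPU adjust (assumed)."""
    return (base + adjust) & MASK64


def max_skew(base, adjusts):
    """Largest gap between any two CPUs' readings of the same base count."""
    readings = [read_tsc(base, a) for a in adjusts.values()]
    return max(readings) - min(readings)


def zero_adjust(adjusts):
    """Model of the workaround: set every CPU's IA32_TSC_ADJUST to 0."""
    return {cpu: 0 for cpu in adjusts}


base = 10**12  # arbitrary count, large enough that nothing wraps
skew_before = max_skew(base, BOOT_ADJUST)
skew_after = max_skew(base, zero_adjust(BOOT_ADJUST))
print(skew_before, skew_after)
```

In this model the skew between CPUs is just the spread of the adjust
values, so zeroing the register on every CPU removes it.  Real hardware
cannot be sampled at a single instant like this, which is why the
kernel still runs the sync test afterwards.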

-Scott
