On Fri, 2021-09-03 at 10:33 +0200, Philippe Gerum wrote:
> Bezdeka, Florian <[email protected]> writes:
>
> > Hi all,
> >
> > I'm able to reproduce the following on two different platforms now, so
> > I assume it's a IRQ_PIPELINE generic issue:
> >
> > Platform A):
> >   Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
> >   1 Socket, 4 Cores, 1 thread per core
> >
> > Platform B):
> >   Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
> >   2 Sockets, 6 cores per socket, 2 threads per core
> >   (2 NUMA nodes)
> >
> >
> > Platform A) reports the TSC being unstable during the boot phase,
> > platform B) reports the TSC as unstable when running stress tests:
> >
> > Taken from a B) based system:
> >
> > [57615.671114] clocksource: timekeeping watchdog on CPU17: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [57615.738269] clocksource: 'hpet' wd_now: 12f85ed0 wd_last: 2c5eab7b mask: ffffffff
> > [57615.794489] clocksource: 'tsc' cs_now: 68e299c3708c cs_last: 6864c6ea3970 mask: ffffffffffffffff
> > [57615.858552] tsc: Marking TSC unstable due to clocksource watchdog
> > [57615.858582] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > [57615.910138] sched_clock: Marking unstable (57615104375773, 749891156)<-(57616072553488, -213973554)
> > [57615.905983] clocksource: Checking clocksource tsc synchronization from CPU 15.
> > [57615.949626] clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode
> > [57616.016343] clocksource: Switched to clocksource hpet
> >
> > The clocksource watchdog is migrated between CPUs to make sure the TSC
> > is synchronized between cores. For me it looks like a late delivery of
> > the watchdog timer.
> >
> > Available workaround(s):
> > - Add "tsc=reliable" to the kernel cmdline args
> > - At least for A) based systems it helped to apply the following diff to
> >   the kernel configuration. I do not consider that as "solution" for now.
> >
> > -CONFIG_HZ_100=y
> > +CONFIG_HZ_1000=y
> >
> >
> > As soon as I disable CONFIG_IRQ_PIPELINE the problem is gone.
> >
> > I already tried testing with CONFIG_DEBUG_IRQ_PIPELINE enabled, but
> > that didn't help so far.
> >
> > Any advise how to debug that?
> >
> > Best regards,
> > Florian
>
> Could this be related [1] (HPET stanza)?
>
> [1] https://evlproject.org/core/caveat/#x86-caveat
>
In the A) scenario it could be related: HPET is disabled / not available
there. Thanks for the hint!

With B) we have HPET enabled, and it never happened when IRQ_PIPELINE was
not compiled in.
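
For anyone who wants to cross-check the watchdog report from B), the intervals can be recomputed from the logged counter values. The following is only a rough sketch. The counter values and masks are copied from the log above. The two frequencies are assumptions and do not come from the log: 14.31818 MHz is a common HPET rate, and 2.0 GHz is simply the nominal clock from the CPU model string. The helper names are made up for illustration.

```python
# Counter values and masks copied from the quoted watchdog report.
HPET_NOW, HPET_LAST, HPET_MASK = 0x12F85ED0, 0x2C5EAB7B, 0xFFFFFFFF
TSC_NOW, TSC_LAST, TSC_MASK = 0x68E299C3708C, 0x6864C6EA3970, 0xFFFFFFFFFFFFFFFF

# ASSUMPTIONS (not present in the log): counter frequencies in Hz.
HPET_HZ = 14_318_180      # common HPET rate, not confirmed for this board
TSC_HZ = 2_000_000_000    # nominal clock from the "@ 2.00GHz" model string


def masked_delta(now: int, last: int, mask: int) -> int:
    """Cycles elapsed between two reads of a wrapping counter."""
    return (now - last) & mask


def to_seconds(cycles: int, hz: int) -> float:
    """Convert a cycle count to seconds at the given (assumed) frequency."""
    return cycles / hz


hpet_cycles = masked_delta(HPET_NOW, HPET_LAST, HPET_MASK)
tsc_cycles = masked_delta(TSC_NOW, TSC_LAST, TSC_MASK)
hpet_s = to_seconds(hpet_cycles, HPET_HZ)
tsc_s = to_seconds(tsc_cycles, TSC_HZ)

print(f"hpet: {hpet_cycles} cycles, {hpet_s:.3f} s (assumed {HPET_HZ} Hz)")
print(f"tsc:  {tsc_cycles} cycles, {tsc_s:.3f} s (assumed {TSC_HZ} Hz)")
print(f"difference: {abs(hpet_s - tsc_s) * 1e3:.3f} ms")
```

If the assumed frequencies are roughly right, both counters span about the same interval, and that interval is far longer than the watchdog's usual sub-second period. That would fit a late watchdog timer better than a drifting TSC. Treat it as a hint only: the 32-bit HPET counter wraps, so the masked delta cannot tell how many wraps occurred, and the real frequencies should be taken from the boot log of the affected machine.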
