CVSROOT: /cvs Module name: src Changes by: chel...@cvs.openbsd.org 2022/08/11 20:20:37
Modified files: sys/arch/amd64/amd64: tsc.c cpu.c sys/arch/amd64/include: cpu.h cpuvar.h sys/sys : timetc.h sys/kern : kern_tc.c Log message: amd64: simplify TSC synchronization testing Computing a per-CPU TSC skew value is error-prone, especially on multisocket machines and VMs. My best guess is that larger latencies appear to the current skew measurement test as TSC desync, and so the TSC is demoted to a kernel timecounter on these machines or marked non-monotonic. This patch eliminates per-CPU TSC skew values. Instead of trying to measure and correct for TSC desync we only try to detect desync, which is less error-prone. This approach should allow a wider variety of machines to use the TSC as a timecounter when running OpenBSD. In the new sync test, both CPUs repeatedly try to detect whether their TSC is trailing the other CPU's TSC. The upside to this approach is that it yields no false positives. The downside to this approach is that it takes more time than the current skew measurement test. Each test round takes 1ms, and we run up to two rounds per CPU, so this patch slows boot down by 2ms per AP. If any CPU fails the sync test, the TSC is marked non-monotonic and a different timecounter is activated. The TC_USER flag remains intact. There is no middle ground where we fall back to only using the TSC in the kernel. Before running the test, we check for the IA32_TSC_ADJUST register and reset it if necessary. This is a trivial way to work around firmware bugs that desync the TSC before we reach the kernel. Unfortunately, at the moment this register appears to only be available on Intel processors. I cannot find an equivalent but differently-named MSR for AMD processors. Because there is no per-CPU skew value, there is also no concept of TSC drift anymore. Miscellaneous notes: - This patch adds a new timecounter utility function, tc_reset_quality(). Used after sync test failure to mark the TSC non-monotonic. - I have left TSC_DEBUG enabled for now. Unsure if we should leave it enabled for release or not. If we disable it we no longer run the sync test after failing it once. Running the test even after failure provides information about the desync on every CPU. - Taking 1ms per test round is fairly conservative. We can experiment with and discuss shorter test rounds. My main goal with a relatively long test round is ensuring VMs actually run the test. It would be bad if a hypervisor interrupted the test for so long that it concealed desync. - The use of two test rounds is mostly a diagnostic tool: it would be very strange if a CPU passed the first round but failed the second. If we ever saw this in the wild it would indicate something odd. - Most of the desync seen in test reports is on Ryzen CPUs. I believe, but cannot prove, that this is due to a widespread firmware bug on AMD motherboards. Hopefully AMD and/or the downstream vendors fix it. - Fixing TSC desync by writing the TSC directly with WRMSR is very difficult. The TSC is a moving target incrementing very quickly and compensating for WRMSR overhead is non-trivial. We can experiment with this, but my confidence is low that we can make it work reliably. Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by deraadt@ throughout. Reprompted by Yuichiro Naito several times. With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@. Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan, Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@, Renato Aguiar, and Timo Myyra. Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2 Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2 Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2 Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2 Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2 "just commit it" deraadt@