On Thu, Jul 16, 2020 at 02:14:45PM +0200, Christian Weisgerber wrote: > Scott Cheloha: > > > Can we add the missing LFENCE instructions to userspace and the > > kernel? And can we excise the upper 32 bits? > > > + uint32_t lo; > > + > > + asm volatile("lfence"); > > + asm volatile("rdtsc" : "=a"(lo)); > > That's wrong. rtdsc will clobber %rdx, whether you use that value > or not. You need a corresponding constraint: > > asm volatile("rdtsc" : "=a"(lo) : : "rdx");
Whoops. Thanks! Updated below. On Thu, Jul 16, 2020 at 02:26:56PM +0200, Mark Kettenis wrote: > > Date: Wed, 15 Jul 2020 20:36:04 -0500 > > From: Scott Cheloha <scottchel...@gmail.com> > > > > [...] > > > > My understanding is that you need an "LFENCE sandwich" to prevent the > > RDTSC from being reordered in the pipeline. > > Hmm, it is my understanding that only the lfence before is needed, > that is why rdtsc_lfence() looks the way it looks. If that isn't the > case, that function should probably be adapted. My reasoning is based on the ISA reference: https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.html In particular (p. 1197; Vol. 2B 4-545): > The RDTSC instruction is not a serializing instruction. > > It does not necessarily wait until all previous instructions > have been executed before reading the counter. > > Similarly, subsequent instructions may begin execution before the > read operation is performed. > > If software requires RDTSC to be executed only after all previous > instructions have completed locally, it can either use RDTSCP (if > the processor supports that instruction) or execute the sequence > LFENCE;RDTSC. Note the third sentence. Given that, I reason that a serializing instruction before *and* after the RDTSC should freeze it in place. Whether or not there are diminishing returns on the 2nd LFENCE is beyond what I can comment on. However, if you want a guarantee that it won't move around at all I think you need both LFENCEs. Updated patch adds the second LFENCE to rdtsc_lfence() in cpufunc.h. Index: lib/libc/arch/amd64/gen/usertc.c =================================================================== RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v retrieving revision 1.2 diff -u -p -r1.2 usertc.c --- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2 +++ lib/libc/arch/amd64/gen/usertc.c 16 Jul 2020 17:21:28 -0000 @@ -21,9 +21,13 @@ static inline u_int rdtsc(void) { - uint32_t hi, lo; - asm volatile("rdtsc" : "=a"(lo), "=d"(hi)); - return ((uint64_t)lo)|(((uint64_t)hi)<<32); + uint32_t lo; + + asm volatile("lfence"); + asm volatile("rdtsc" : "=a"(lo) : : "rdx"); + asm volatile("lfence"); + + return lo; } static int Index: sys/arch/amd64/include/cpufunc.h =================================================================== RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v retrieving revision 1.35 diff -u -p -r1.35 cpufunc.h --- sys/arch/amd64/include/cpufunc.h 3 Jul 2020 17:54:27 -0000 1.35 +++ sys/arch/amd64/include/cpufunc.h 16 Jul 2020 17:21:28 -0000 @@ -296,7 +296,9 @@ rdtsc_lfence(void) { uint32_t hi, lo; - __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo)); + __asm volatile("lfence"); + __asm volatile("rdtsc" : "=d" (hi), "=a" (lo)); + __asm volatile("lfence"); return (((uint64_t)hi << 32) | (uint64_t) lo); } Index: sys/arch/amd64/amd64/tsc.c =================================================================== RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v retrieving revision 1.19 diff -u -p -r1.19 tsc.c --- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19 +++ sys/arch/amd64/amd64/tsc.c 16 Jul 2020 17:21:28 -0000 @@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter * u_int tsc_get_timecount(struct timecounter *tc) { - return rdtsc() + curcpu()->ci_tsc_skew; + uint32_t lo; + + asm volatile("lfence"); + asm volatile("rdtsc" : "=a"(lo) : : "rdx"); + asm volatile("lfence"); + + return lo + curcpu()->ci_tsc_skew; } void