On Thu, Jul 16, 2020 at 02:14:45PM +0200, Christian Weisgerber wrote:
> Scott Cheloha:
> 
> > Can we add the missing LFENCE instructions to userspace and the
> > kernel?  And can we excise the upper 32 bits?
> 
> > +   uint32_t lo;
> > +
> > +   asm volatile("lfence");
> > +   asm volatile("rdtsc" : "=a"(lo));
> 
> That's wrong.  rtdsc will clobber %rdx, whether you use that value
> or not.  You need a corresponding constraint:
> 
>   asm volatile("rdtsc" : "=a"(lo) : : "rdx");

Whoops.  Thanks!  Updated below.

On Thu, Jul 16, 2020 at 02:26:56PM +0200, Mark Kettenis wrote:
> > Date: Wed, 15 Jul 2020 20:36:04 -0500
> > From: Scott Cheloha <scottchel...@gmail.com>
> > 
> > [...]
> > 
> > My understanding is that you need an "LFENCE sandwich" to prevent the
> > RDTSC from being reordered in the pipeline.
> 
> Hmm, it is my understanding that only the lfence before is needed,
> that is why rdtsc_lfence() looks the way it looks.  If that isn't the
> case, that function should probably be adapted.

My reasoning is based on the ISA reference:

https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.html

In particular (p. 1197; Vol. 2B 4-545):

> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
> 
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
> 
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.

Note the third sentence.

Given that, I reason that a serializing instruction before *and* after
the RDTSC should freeze it in place.

Whether or not there are diminishing returns on the 2nd LFENCE is
beyond what I can comment on.  However, if you want a guarantee that
it won't move around at all I think you need both LFENCEs.

Updated patch adds the second LFENCE to rdtsc_lfence() in cpufunc.h.

Index: lib/libc/arch/amd64/gen/usertc.c
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
retrieving revision 1.2
diff -u -p -r1.2 usertc.c
--- lib/libc/arch/amd64/gen/usertc.c    8 Jul 2020 09:17:48 -0000       1.2
+++ lib/libc/arch/amd64/gen/usertc.c    16 Jul 2020 17:21:28 -0000
@@ -21,9 +21,13 @@
 static inline u_int
 rdtsc(void)
 {
-       uint32_t hi, lo;
-       asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
-       return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+       uint32_t lo;
+
+       asm volatile("lfence");
+       asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+       asm volatile("lfence");
+
+       return lo;
 }
 
 static int
Index: sys/arch/amd64/include/cpufunc.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
retrieving revision 1.35
diff -u -p -r1.35 cpufunc.h
--- sys/arch/amd64/include/cpufunc.h    3 Jul 2020 17:54:27 -0000       1.35
+++ sys/arch/amd64/include/cpufunc.h    16 Jul 2020 17:21:28 -0000
@@ -296,7 +296,9 @@ rdtsc_lfence(void)
 {
        uint32_t hi, lo;
 
-       __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
+       __asm volatile("lfence");
+       __asm volatile("rdtsc" : "=d" (hi), "=a" (lo));
+       __asm volatile("lfence");
        return (((uint64_t)hi << 32) | (uint64_t) lo);
 }
 
Index: sys/arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.19
diff -u -p -r1.19 tsc.c
--- sys/arch/amd64/amd64/tsc.c  6 Jul 2020 13:33:06 -0000       1.19
+++ sys/arch/amd64/amd64/tsc.c  16 Jul 2020 17:21:28 -0000
@@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-       return rdtsc() + curcpu()->ci_tsc_skew;
+       uint32_t lo;
+
+       asm volatile("lfence");
+       asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+       asm volatile("lfence");
+
+       return lo + curcpu()->ci_tsc_skew;
 }
 
 void

Reply via email to