On Tue, Aug 08, 2023 at 08:00:47PM +0200, Mark Kettenis wrote:
> > From: Dale Rahn <dale.r...@gmail.com>
> > Date: Tue, 8 Aug 2023 12:36:45 -0400
> > 
> > Switching the computation of cycles/delaycnt to a proper 64 value math
> > instead of the '32 bit safe' complex math is likely a good idea.
> > However I am not completely convinced that switching to 'yield' (current
> > CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the wait
> > loop makes sense. In a hypervisor environment, this could cause a very
> > short wait between register writes to become very long, In a non-hypervisor
> > environment there is essentially no improvement because the yield doesn't
> > really have any benefits on non-hypervisor.
> 
> Dale, I think you're confused here.  There is no arcitectural way to
> trap a YIELD instruction.  The docs explicitly state that SMT and SMP
> systems.  I suspect that this instruction was primarily introduced for
> SMT systems that never materialized; I completely believe you that on
> current hardware it does nothing.
> 
> Even without a YIELD instruction a hypervisor might interrupt us and
> schedule a different vCPU onto the core.  And on real hardware
> external interrupts may happen.  So you really can't count on delay(9)
> being accurate.
> 
> Linux does use YIELD in it delay loop.

Okay good.

> > To my current understanding, there is no useful 'wait short period' on arm
> > cores.
> 
> There is WFET (and WFIT), but that is Armv8.7 material, so not
> availble on actual hardware that we run on.  Linux uses that
> instruction in its delay loop when available.

These look great.  Can't wait for them to be implemented outside of a
functional simulator.

> On machines with a working Generic Timer event stream, Linux uses WFE
> if the delay is long enough.

I have a prototype for this.  Obviously it's a separate patch.

Anyway, can I go ahead with the patch below to start?

- Use 64-bit arithmetic to simplify agtimer_delay().

- Use CPU_BUSY_CYCLE() ("yield" on arm64) in the loop to match other
  delay(9) implementations.

Index: agtimer.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v
retrieving revision 1.23
diff -u -p -r1.23 agtimer.c
--- agtimer.c   25 Jul 2023 18:16:19 -0000      1.23
+++ agtimer.c   10 Aug 2023 20:59:27 -0000
@@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void)
 void
 agtimer_delay(u_int usecs)
 {
-       uint64_t                clock, oclock, delta, delaycnt;
-       uint64_t                csec, usec;
-       volatile int            j;
+       uint64_t cycles, start;
 
-       if (usecs > (0x80000000 / agtimer_frequency)) {
-               csec = usecs / 10000;
-               usec = usecs % 10000;
-
-               delaycnt = (agtimer_frequency / 100) * csec +
-                   (agtimer_frequency / 100) * usec / 10000;
-       } else {
-               delaycnt = agtimer_frequency * usecs / 1000000;
-       }
-       if (delaycnt <= 1)
-               for (j = 100; j > 0; j--)
-                       ;
-
-       oclock = agtimer_readcnt64();
-       while (1) {
-               for (j = 100; j > 0; j--)
-                       ;
-               clock = agtimer_readcnt64();
-               delta = clock - oclock;
-               if (delta > delaycnt)
-                       break;
-       }
+       start = agtimer_readcnt64();
+       cycles = (uint64_t)usecs * agtimer_frequency / 1000000;
+       while (agtimer_readcnt64() - start < cycles)
+               CPU_BUSY_CYCLE();
 }
 
 void

Reply via email to