> From: Dale Rahn <dale.r...@gmail.com> > Date: Tue, 8 Aug 2023 12:36:45 -0400 > > Switching the computation of cycles/delaycnt to a proper 64 value math > instead of the '32 bit safe' complex math is likely a good idea. > However I am not completely convinced that switching to 'yield' (current > CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the wait > loop makes sense. In a hypervisor environment, this could cause a very > short wait between register writes to become very long, In a non-hypervisor > environment there is essentially no improvement because the yield doesn't > really have any benefits on non-hypervisor.
Dale, I think you're confused here. There is no arcitectural way to trap a YIELD instruction. The docs explicitly state that SMT and SMP systems. I suspect that this instruction was primarily introduced for SMT systems that never materialized; I completely believe you that on current hardware it does nothing. Even without a YIELD instruction a hypervisor might interrupt us and schedule a different vCPU onto the core. And on real hardware external interrupts may happen. So you really can't count on delay(9) being accurate. Linux does use YIELD in it delay loop. > To my current understanding, there is no useful 'wait short period' on arm > cores. There is WFET (and WFIT), but that is Armv8.7 material, so not availble on actual hardware that we run on. Linux uses that instruction in its delay loop when available. On machines with a working Generic Timer event stream, Linux uses WFE if the delay is long enough. > On Mon, Aug 7, 2023 at 10:32 PM Scott Cheloha <scottchel...@gmail.com> > wrote: > > > The agtimer(4/arm64) delay(9) implementation is quite complicated. > > This patch simplifies it. > > > > Am I missing something here? There's no reason to implement the > > busy-loop like this, right? > > > > ok? > > > > Index: agtimer.c > > =================================================================== > > RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v > > retrieving revision 1.23 > > diff -u -p -r1.23 agtimer.c > > --- agtimer.c 25 Jul 2023 18:16:19 -0000 1.23 > > +++ agtimer.c 8 Aug 2023 02:24:57 -0000 > > @@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void) > > void > > agtimer_delay(u_int usecs) > > { > > - uint64_t clock, oclock, delta, delaycnt; > > - uint64_t csec, usec; > > - volatile int j; > > + uint64_t cycles, start; > > > > - if (usecs > (0x80000000 / agtimer_frequency)) { > > - csec = usecs / 10000; > > - usec = usecs % 10000; > > - > > - delaycnt = (agtimer_frequency / 100) * csec + > > - (agtimer_frequency / 100) * usec / 10000; > > - } else { > > - delaycnt = agtimer_frequency * usecs / 1000000; > > - } > > - if (delaycnt <= 1) > > - for (j = 100; j > 0; j--) > > - ; > > - > > - oclock = agtimer_readcnt64(); > > - while (1) { > > - for (j = 100; j > 0; j--) > > - ; > > - clock = agtimer_readcnt64(); > > - delta = clock - oclock; > > - if (delta > delaycnt) > > - break; > > - } > > + start = agtimer_readcnt64(); > > + cycles = (uint64_t)usecs * agtimer_frequency / 1000000; > > + while (agtimer_readcnt64() - start < cycles) > > + CPU_BUSY_CYCLE(); > > } > > > > void > > > > >