On Tue, Aug 08, 2023 at 08:00:47PM +0200, Mark Kettenis wrote:
> > From: Dale Rahn <[email protected]>
> > Date: Tue, 8 Aug 2023 12:36:45 -0400
> >
> > Switching the computation of cycles/delaycnt to a proper 64 value math
> > instead of the '32 bit safe' complex math is likely a good idea.
> > However I am not completely convinced that switching to 'yield' (current
> > CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the wait
> > loop makes sense. In a hypervisor environment, this could cause a very
> > short wait between register writes to become very long, In a non-hypervisor
> > environment there is essentially no improvement because the yield doesn't
> > really have any benefits on non-hypervisor.
>
> Dale, I think you're confused here. There is no arcitectural way to
> trap a YIELD instruction. The docs explicitly state that SMT and SMP
> systems. I suspect that this instruction was primarily introduced for
> SMT systems that never materialized; I completely believe you that on
> current hardware it does nothing.
>
> Even without a YIELD instruction a hypervisor might interrupt us and
> schedule a different vCPU onto the core. And on real hardware
> external interrupts may happen. So you really can't count on delay(9)
> being accurate.
>
> Linux does use YIELD in it delay loop.
Okay good.
> > To my current understanding, there is no useful 'wait short period' on arm
> > cores.
>
> There is WFET (and WFIT), but that is Armv8.7 material, so not
> availble on actual hardware that we run on. Linux uses that
> instruction in its delay loop when available.
These look great. Can't wait for them to be implemented outside of a
functional simulator.
> On machines with a working Generic Timer event stream, Linux uses WFE
> if the delay is long enough.
I have a prototype for this. Obviously it's a separate patch.
Anyway, can I go ahead with the patch below to start?
- Use 64-bit arithmetic to simplify agtimer_delay().
- Use CPU_BUSY_CYCLE() ("yield" on arm64) in the loop to match other
delay(9) implementations.
Index: agtimer.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v
retrieving revision 1.23
diff -u -p -r1.23 agtimer.c
--- agtimer.c 25 Jul 2023 18:16:19 -0000 1.23
+++ agtimer.c 10 Aug 2023 20:59:27 -0000
@@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void)
void
agtimer_delay(u_int usecs)
{
- uint64_t clock, oclock, delta, delaycnt;
- uint64_t csec, usec;
- volatile int j;
+ uint64_t cycles, start;
- if (usecs > (0x80000000 / agtimer_frequency)) {
- csec = usecs / 10000;
- usec = usecs % 10000;
-
- delaycnt = (agtimer_frequency / 100) * csec +
- (agtimer_frequency / 100) * usec / 10000;
- } else {
- delaycnt = agtimer_frequency * usecs / 1000000;
- }
- if (delaycnt <= 1)
- for (j = 100; j > 0; j--)
- ;
-
- oclock = agtimer_readcnt64();
- while (1) {
- for (j = 100; j > 0; j--)
- ;
- clock = agtimer_readcnt64();
- delta = clock - oclock;
- if (delta > delaycnt)
- break;
- }
+ start = agtimer_readcnt64();
+ cycles = (uint64_t)usecs * agtimer_frequency / 1000000;
+ while (agtimer_readcnt64() - start < cycles)
+ CPU_BUSY_CYCLE();
}
void