Apr 12, 2021, 10:07 by mar...@martin.st:
> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the