Re: [FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

2021-04-12 Thread Lynne
Apr 12, 2021, 10:07 by mar...@martin.st:
> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction

[FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

2021-04-12 Thread Martin Storsjö
Move the loop counter decrement further from the branch instruction, this hides the latency of the decrement.

In loops that first load, then store (the horizontal prediction cases), do the decrement after the load (where the next instruction would stall a bit anyway, waiting for the result of the
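
The scheduling change described above can be illustrated with a minimal sketch. This is not the code from the patch: the register assignments (x0 = destination, x1 = stride, x2 = left-edge pointer, w3 = loop counter), the unroll factor, and the label numbers are all assumptions chosen for illustration.

```
// Hypothetical store-only loop, before: the subs that sets the flags
// sits directly in front of the branch that reads them.
1:      st1     {v0.16b}, [x0], x1      // store one 16-byte row, advance by stride
        st1     {v0.16b}, [x0], x1
        subs    w3,  w3,  #1            // decrement counter, set flags
        b.ne    1b                      // branch depends on the subs just issued

// Same loop, after: the decrement is issued first. The stores do not
// modify the flags, so the branch still sees the result of the subs.
2:      subs    w3,  w3,  #1
        st1     {v0.16b}, [x0], x1
        st1     {v0.16b}, [x0], x1
        b.ne    2b

// Hypothetical load-then-store loop (horizontal-style): the decrement is
// placed between the load and the store that consumes the loaded value.
3:      ld1r    {v0.16b}, [x2], x1      // load one left-edge pixel, replicate to all lanes
        subs    w3,  w3,  #1            // independent of the load result
        st1     {v0.16b}, [x0], x1      // needs v0 from the ld1r above
        b.ne    3b
```

In each reordered loop the instruction count is unchanged; only the distance between the `subs` and the `b.ne` grows. Whether this helps in practice depends on the core in question.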