On Sat, 27 Jun 2020 01:27:14 +0200
Christian Weisgerber wrote:
> That function simply copies as many (double)words plus a tail of
> bytes as the length argument specifies. Neither source nor destination
> are checked for alignment, so this will happily run a loop of
> unaligned accesses, which doesn't sound very optimal.
I made a benchmark and concluded that unaligned word copies are slower
than aligned word copies, but faster than byte copies. In most cases,
memmove.S is faster than memmove.c, but when source and destination
share the same misalignment, so that memmove.c can do aligned word
copies, memmove.c is faster.
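A minimal C sketch of the strategy attributed to memmove.c above, assuming 32-bit (4-byte) words. The name sketch_memmove_c is hypothetical and this is not the libc source. It byte-copies until src is word-aligned, uses word copies only when (src & 3) == (dst & 3), and otherwise falls back to byte copies. Word loads and stores go through a fixed-size memcpy into a uint32_t, the portable way to express a word access in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "C" strategy: aligned 4-byte word copies
   when src and dst share the same alignment, byte copies otherwise. */
void *sketch_memmove_c(void *dst0, const void *src0, size_t n)
{
    unsigned char *dst = dst0;
    const unsigned char *src = src0;
    const uintptr_t wmask = 3;          /* 4-byte words */
    size_t head;

    if (n == 0 || dst == src)
        return dst0;

    /* Word copies can be aligned only if (src & 3) == (dst & 3). */
    int coaligned = (((uintptr_t)src ^ (uintptr_t)dst) & wmask) == 0;

    if ((uintptr_t)dst < (uintptr_t)src) {
        /* Forward copy: byte-copy up to the first aligned address,
           or everything if the buffers can't both be aligned. */
        head = coaligned ? ((4 - ((uintptr_t)src & wmask)) & wmask) : n;
        if (head > n)
            head = n;
        n -= head;
        while (head--)
            *dst++ = *src++;
        for (; n >= 4; n -= 4, src += 4, dst += 4) {
            uint32_t w;
            memcpy(&w, src, 4);         /* aligned 4-byte load */
            memcpy(dst, &w, 4);         /* aligned 4-byte store */
        }
        while (n--)
            *dst++ = *src++;            /* tail bytes */
    } else {
        /* Backward copy (dst above src): start from the end. */
        src += n;
        dst += n;
        head = coaligned ? ((uintptr_t)src & wmask) : n;
        if (head > n)
            head = n;
        n -= head;
        while (head--)
            *--dst = *--src;
        for (; n >= 4; n -= 4) {
            uint32_t w;
            src -= 4;
            dst -= 4;
            memcpy(&w, src, 4);
            memcpy(dst, &w, 4);
        }
        while (n--)
            *--dst = *--src;            /* leading bytes */
    }
    return dst0;
}
```

It copies backward when dst is above src, so overlapping moves in either direction are safe.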
The benchmark ran on a 32-bit macppc G3:
cpu0 at mainbus0: 750 (Revision 0x202): 400 MHz: 512KB backside cache
The benchmark compares 4 implementations of memmove:
stbu => byte copy with lbzu,stbu loop
stbx => byte copy with lbzx,stbx,addi loop
C => aligned word copy or byte copy (libc/string/memmove.c)
asm => unaligned word copy (libc/arch/powerpc/string/memmove.S)
It shows times in timebase ticks, read with mftb (move from timebase);
lower is better.
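The "asm" entry can be pictured in C roughly as below. This is an assumption-laden sketch (hypothetical name sketch_memmove_words, 4-byte words), not a translation of memmove.S: it copies whole words with no alignment check and then the leftover bytes, so on an unaligned buffer every word access may be misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical words-plus-tail copy with no alignment checks.
   memcpy into a uint32_t expresses a possibly unaligned 4-byte
   load or store portably. */
void *sketch_memmove_words(void *dst0, const void *src0, size_t n)
{
    unsigned char *dst = dst0;
    const unsigned char *src = src0;
    size_t words = n / 4, tail = n % 4;
    uint32_t w;

    if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Forward: whole words first, then the tail bytes. */
        while (words--) {
            memcpy(&w, src, 4);
            memcpy(dst, &w, 4);
            src += 4;
            dst += 4;
        }
        while (tail--)
            *dst++ = *src++;
    } else {
        /* Backward: start at the end so overlapping source bytes are
           read before they are overwritten; leftovers are at the front. */
        src += n;
        dst += n;
        while (words--) {
            src -= 4;
            dst -= 4;
            memcpy(&w, src, 4);
            memcpy(dst, &w, 4);
        }
        while (tail--)
            *--dst = *--src;
    }
    return dst0;
}
```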
1st bench: move 10000 bytes up by 4 bytes, then down by 4 bytes, in an
aligned buffer (offset 0). asm wins:
$ ./bench 10000 4 0
stbu  stbx     C   asm
2639  2814   792   633
2502  2814   784   628
2501  2814   783   627
2501  2814   784   626
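The up-then-down pattern of these runs could be driven by something like the following. The bench program's source isn't shown, so bench_up_down and its argument order (length, shift, offset, mirroring ./bench 10000 4 0) are hypothetical, and it times with POSIX clock_gettime(CLOCK_MONOTONIC) in nanoseconds instead of mftb ticks.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*memmove_fn)(void *, const void *, size_t);

/* Hypothetical driver: move n bytes at buf + offset up by shift, then
   back down, and return the elapsed nanoseconds.
   buf must hold at least offset + shift + n bytes. */
long long bench_up_down(memmove_fn fn, unsigned char *buf,
                        size_t n, size_t shift, size_t offset)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(buf + offset + shift, buf + offset, n);   /* move up */
    fn(buf + offset, buf + offset + shift, n);   /* move down */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
}
```

For example, bench_up_down(memmove, buf, 10000, 4, 1) with a buffer of at least 10005 bytes mirrors the 2nd run. Because the second move undoes the first, the n bytes at buf + offset end up unchanged, which gives a cheap correctness check for each implementation.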
2nd bench: unaligned buffer (offset 1), but (src & 3) == (dst & 3), so
C does aligned word copies, while asm does misaligned ones. C wins:
$ ./bench 10000 4 1
stbu  stbx     C   asm
2638  3006   795   961
2502  2814   786   938
2501  2814   786   939
2501  2813   785   939
3rd bench: move up then down by 5 bytes in an aligned buffer (offset 0).
Now (src & 3) != (dst & 3), so word copies can't be aligned and C does
byte copies. asm wins:
$ ./bench 10000 5 0
stbu  stbx     C   asm
2675  2815  2514   809
2501  2813  2504   782
2502  2815  2504   782
2501  2814  2503   782
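Counting copies for the "down" half of each run shows why C drops to byte-copy speed in the 3rd run. This hypothetical helper assumes 4-byte words, a word-aligned buffer base, and addresses given as offsets from that base; it models only the forward (down) move of the C strategy described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct copy_plan {
    size_t words;   /* 4-byte word copies */
    size_t bytes;   /* single-byte copies */
};

/* Hypothetical: count the copies a forward "aligned words or bytes"
   memmove would do from address src to address dst. */
struct copy_plan plan_c_forward(uintptr_t src, uintptr_t dst, size_t n)
{
    struct copy_plan p;
    size_t head;

    if (((src ^ dst) & 3) != 0) {       /* can't align both: bytes only */
        p.words = 0;
        p.bytes = n;
        return p;
    }
    head = (4 - (src & 3)) & 3;         /* bytes until src and dst are aligned */
    if (head > n)
        head = n;
    p.words = (n - head) / 4;
    p.bytes = head + (n - head) % 4;    /* head bytes plus tail bytes */
    assert(p.words * 4 + p.bytes == n);
    return p;
}
```

For the down moves of the three runs, (src, dst) is (4, 0), (5, 1) and (5, 0), giving 2500 words, 2499 words plus 4 bytes, and 10000 bytes. The last matches the 3rd run, where C (about 2504) is no faster than the stbu byte loop (about 2501). With 4-byte words, a words-plus-tail copy does 2500 word copies in every run, aligned only in the 1st.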
I think that memmove.S is probably better than memmove.c on the G3.
I haven't run the bench on POWER9.