Re: powerpc64: 64-bit-ize memmove.S

George Koehler Sat, 27 Jun 2020 21:31:39 -0700

On Sat, 27 Jun 2020 01:27:14 +0200
Christian Weisgerber <na...@mips.inka.de> wrote:


> That function simply copies as many (double)words plus a tail of
> bytes as the length argument specifies.  Neither source nor destination
> are checked for alignment, so this will happily run a loop of
> unaligned accesses, which doesn't sound very optimal.

I made a benchmark and concluded that unaligned word copies are slower
than aligned word copies, but faster than byte copies.  In most cases,
memmove.S is faster than memmove.c, but when the buffers are unaligned
yet share the same misalignment, so that aligned word copies are
possible, memmove.c is faster.

The benchmark ran on a 32-bit macppc G3 with
cpu0 at mainbus0: 750 (Revision 0x202): 400 MHz: 512KB backside cache

The benchmark compares 4 implementations of memmove:
  stbu  =>  byte copy with lbzu,stbu loop
  stbx  =>  byte copy with lbzx,stbx,addi loop
  C     =>  aligned word copy or byte copy (libc/string/memmove.c)
  asm   =>  unaligned word copy (libc/arch/powerpc/string/memmove.S)
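
As a rough illustration of the "aligned word copy or byte copy"
strategy listed for C, here is a minimal sketch.  It is not the actual
libc/string/memmove.c source: the name copy_forward is hypothetical, it
only handles the forward direction (dst below src, or no overlap), and
it assumes 4-byte words as on the 32-bit G3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch, forward direction only: safe when dst is below
 * src or when the two regions do not overlap.
 */
static void copy_forward(unsigned char *dst, const unsigned char *src,
    size_t len)
{
	/* Word copies are tried only if both pointers share the low 2 bits. */
	if ((((uintptr_t)src ^ (uintptr_t)dst) & 3) == 0) {
		/* Copy bytes until src (and therefore dst) is word-aligned. */
		while (len > 0 && ((uintptr_t)src & 3) != 0) {
			*dst++ = *src++;
			len--;
		}
		/*
		 * Aligned 4-byte copies.  A fixed-size memcpy stands in for
		 * a word load and store without breaking aliasing rules.
		 */
		while (len >= 4) {
			uint32_t w;

			memcpy(&w, src, sizeof(w));
			memcpy(dst, &w, sizeof(w));
			src += 4;
			dst += 4;
			len -= 4;
		}
	}
	/* Tail bytes, or the whole buffer when the alignments differ. */
	while (len > 0) {
		*dst++ = *src++;
		len--;
	}
}
```

The asm variant, by contrast, skips the alignment test and issues word
copies whatever the low bits of src and dst are.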

Each column shows elapsed time as measured by mftb (move from
timebase); lower is faster.

1st bench: move 10000 bytes up by 4 bytes, then down by 4 bytes, in
aligned buffer (offset 0).  asm wins:

$ ./bench 10000 4 0
        stbu    stbx    C       asm
        2639    2814    792     633
        2502    2814    784     628
        2501    2814    783     627
        2501    2814    784     626

2nd bench: unaligned buffer (offset 1), but (src & 3) == (dst & 3), so
C does aligned word copies, while asm does misaligned.  C wins:

$ ./bench 10000 4 1
        stbu    stbx    C       asm
        2638    3006    795     961
        2502    2814    786     938
        2501    2814    786     939
        2501    2813    785     939

3rd bench: move up then down by 5 bytes, so (src & 3) != (dst & 3) and
word copies can't be aligned.  C does byte copies.  asm wins:

$ ./bench 10000 5 0
        stbu    stbx    C       asm
        2675    2815    2514    809
        2501    2813    2504    782
        2502    2815    2504    782
        2501    2814    2503    782
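
The three runs differ only in how the low two bits of src and dst
relate.  A small sketch of that classification follows.  The enum and
the function classify are hypothetical names, and the sketch assumes
the buffer base is 4-byte aligned, with src = base + offset and
dst = src + distance, matching the last two arguments to ./bench.

```c
enum copy_case {
	BOTH_ALIGNED,		/* 1st bench: aligned word copies */
	SAME_MISALIGNMENT,	/* 2nd bench: C can still align words */
	DIFFERENT_MISALIGNMENT	/* 3rd bench: C falls back to bytes */
};

/*
 * Hypothetical classifier.  Assumes a 4-byte aligned buffer base, so
 * only the low two bits of offset and offset + distance matter.
 */
static enum copy_case classify(unsigned long distance, unsigned long offset)
{
	unsigned long src = offset & 3;
	unsigned long dst = (offset + distance) & 3;

	if (src != dst)
		return DIFFERENT_MISALIGNMENT;
	return src == 0 ? BOTH_ALIGNED : SAME_MISALIGNMENT;
}
```

Under these assumptions, classify(4, 0), classify(4, 1) and
classify(5, 0) correspond to the 1st, 2nd and 3rd bench respectively.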

I think that memmove.S is probably better than memmove.c on G3.
I haven't run the bench on POWER9.
