On Sat, 27 Jun 2020 01:27:14 +0200
Christian Weisgerber <na...@mips.inka.de> wrote:

> I'm also intrigued by this aside in the PowerPC ISA documentation:
> | Moreover, Load with Update instructions may take longer to execute
> | in some implementations than the corresponding pair of a non-update
> | Load instruction and an Add instruction.
> What does clang generate?

clang likes load/store with update instructions.  For example, the
powerpc64 kernel has /sys/lib/libkern/memcpy.c, which copies bytes:

        while (n-- > 0)
                *t++ = *f++;

clang uses lbzu and stbu:

memcpy: cmpldi r5,0x0
memcpy+0x4:     beqlr
memcpy+0x8:     addi r4,r4,-1
memcpy+0xc:     addi r6,r3,-1
memcpy+0x10:    mtspr ctr,r5
memcpy+0x14:    lbzu r5,1(r4)
memcpy+0x18:    stbu r5,1(r6)
memcpy+0x1c:    bdnz 0x26cd0d4 (memcpy+0x14)
memcpy+0x20:    blr
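
To read the listing in C terms: the two addi instructions back both
pointers up by one byte, so each lbzu/stbu can add 1 to its pointer
and then access the byte at the new address.  A rough sketch of that
shape follows.  The name copy_update is mine and is not in libkern.
Forming a pointer one byte before the start of an object is not
strictly valid ISO C; the compiler may do it in machine code, but
treat this as illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not the libkern source: a byte copy
 * written the way the clang output above is shaped.
 */
static void *
copy_update(void *dst, const void *src, size_t n)
{
	if (n == 0)		/* cmpldi r5,0x0 ; beqlr */
		return dst;

	/* Bias both pointers back by one byte (the two addi). */
	const unsigned char *f = (const unsigned char *)src - 1;
	unsigned char *t = (unsigned char *)dst - 1;

	do {
		/*
		 * Pre-increment, then access: lbzu r5,1(r4) loads from
		 * r4+1 and writes r4+1 back to r4; stbu does the same
		 * for the store through r6.
		 */
		*++t = *++f;
	} while (--n != 0);	/* mtspr ctr,r5 before the loop ; bdnz */

	return dst;		/* r3 is left untouched */
}
```

The loop body is two instructions plus the branch because the pointer
bump is folded into each access.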

> I think we should consider dropping this "optimized" memmove.S on
> both powerpc and powerpc64.

I might want to benchmark memmove.S against memmove.c to check if
those unaligned accesses are too slow.  First I would have to write
a benchmark.
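
A first cut at such a benchmark might look like the sketch below.
Nothing here is from the tree: bench_memmove and its parameters are
made up for illustration, it only exercises whichever memmove() the
program is linked against, and it uses clock() for portability, where
a finer-grained timer would likely be better.  It assumes malloc()
returns memory aligned to at least 16 bytes, so the offsets control
the misalignment.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: time `iters` calls of memmove() copying `len`
 * bytes, with source and destination shifted by the given byte
 * offsets from the start of their buffers.  Returns elapsed CPU time
 * in seconds, or -1.0 if allocation fails.
 */
static double
bench_memmove(size_t len, size_t src_off, size_t dst_off,
    unsigned long iters)
{
	/* Call through a volatile pointer so the copies are not elided. */
	void *(*volatile mv)(void *, const void *, size_t) = memmove;
	char *src = calloc(1, len + src_off + 1);
	char *dst = malloc(len + dst_off + 1);
	clock_t c0, c1;
	unsigned long i;

	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1.0;
	}

	c0 = clock();
	for (i = 0; i < iters; i++)
		mv(dst + dst_off, src + src_off, len);
	c1 = clock();

	free(src);
	free(dst);
	return (double)(c1 - c0) / CLOCKS_PER_SEC;
}
```

Comparing, say, bench_memmove(4096, 0, 0, n) with
bench_memmove(4096, 1, 3, n), once linked against memmove.S and once
against memmove.c, would give a first idea of what misalignment costs
in each.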
