On Sat, 27 Jun 2020 01:27:14 +0200 Christian Weisgerber <na...@mips.inka.de> wrote:
> I'm also intrigued by this aside in the PowerPC ISA documentation:
> | Moreover, Load with Update instructions may take longer to execute
> | in some implementations than the corresponding pair of a non-update
> | Load instruction and an Add instruction.
> What does clang generate?

clang likes load/store with update instructions.  For example, the
powerpc64 kernel has /sys/lib/libkern/memcpy.c, which copies bytes:

	while (n-- > 0)
		*t++ = *f++;

clang uses lbzu and stbu:

memcpy:         cmpldi  r5,0x0
memcpy+0x4:     beqlr
memcpy+0x8:     addi    r4,r4,-1
memcpy+0xc:     addi    r6,r3,-1
memcpy+0x10:    mtspr   ctr,r5
memcpy+0x14:    lbzu    r5,1(r4)
memcpy+0x18:    stbu    r5,1(r6)
memcpy+0x1c:    bdnz    0x26cd0d4 (memcpy+0x14)
memcpy+0x20:    blr

> I think we should consider dropping this "optimized" memmove.S on
> both powerpc and powerpc64.

I might want to benchmark memmove.S against memmove.c to check if
those unaligned accesses are too slow.  First I would have to write a
benchmark.