Re: [PATCH] reduce inlined x86 memcpy by 2 bytes

2005-03-21 Denis Vlasenko
On Sunday 20 March 2005 15:17, Adrian Bunk wrote:
> Hi Denis,
>
> what do your benchmarks say about replacing the whole assembler code
> with a
>
> #define __memcpy __builtin_memcpy

It generates a call to out-of-line memcpy() if count is non-constant.

# cat t.c
extern char *a, *b;
extern int n
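A minimal sketch of the kind of experiment described here. It is a standalone file, not the original t.c (which is cut off above), and `copy16` and `copyn` are hypothetical names. Compiling it with `gcc -O2 -S` and reading the assembly should typically show the constant-size copy expanded inline and the variable-size copy turned into a `call memcpy`, although the exact code depends on the GCC version, the target, and options such as `-Os`.

```c
#include <stddef.h>

/* Adrian's proposal: drop the inline asm and let GCC choose the strategy. */
#define __memcpy __builtin_memcpy

/* Constant count: GCC normally expands this inline as a few moves. */
void copy16(char *to, const char *from)
{
	__memcpy(to, from, 16);
}

/*
 * Non-constant count: GCC normally cannot expand this inline and
 * emits a call to the out-of-line memcpy() instead.
 */
void copyn(char *to, const char *from, size_t n)
{
	__memcpy(to, from, n);
}
```

Both functions copy correctly either way. The point of the objection is the generated code: for every variable-length copy, the macro replaces an inline `rep ; movsl` sequence with a function call.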

Re: [PATCH] reduce inlined x86 memcpy by 2 bytes

2005-03-20 Adrian Bunk
Hi Denis,

what do your benchmarks say about replacing the whole assembler code
with a

#define __memcpy __builtin_memcpy

?

cu
Adrian

Re: [PATCH] reduce inlined x86 memcpy by 2 bytes

2005-03-18 Denis Vlasenko
On Friday 18 March 2005 11:21, Denis Vlasenko wrote:
> This memcpy() is 2 bytes shorter than the one currently in mainline
> and it has one branch fewer. It is also 3-4% faster in microbenchmarks
> on small blocks if the block size is a multiple of 4. Mainline is slower
> because it has to branch twice per memcpy

[PATCH] reduce inlined x86 memcpy by 2 bytes

2005-03-18 Denis Vlasenko
This memcpy() is 2 bytes shorter than the one currently in mainline
and it has one branch fewer. It is also 3-4% faster in microbenchmarks
on small blocks if the block size is a multiple of 4. Mainline is slower
because it has to branch twice per memcpy, both mispredicted (but branch prediction hides that in
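The patch body is cut off above, so the following is only a portable C model of the two tail-handling strategies the description implies. It is not the patch's inline assembly, and `copy_two_branches` and `copy_one_branch` are hypothetical names. The model assumes that mainline copies `n/4` dwords with `rep ; movsl` and then tests bits 1 and 0 of `n` separately (two conditional jumps per call). It also assumes that a single-branch variant instead computes `n & 3` and finishes with one byte copy. The C loops merely stand in for the `rep`-prefixed instructions, which need no explicit branch in assembly.

```c
#include <stddef.h>
#include <string.h>

/*
 * Two-branch tail: copy n/4 dwords, then test bit 1 (copy a word)
 * and bit 0 (copy a byte) separately. Both tests run on every call.
 */
void *copy_two_branches(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	size_t words = n / 4;

	while (words--) {		/* stands in for "rep ; movsl" */
		memcpy(d, s, 4);
		d += 4;
		s += 4;
	}
	if (n & 2) {			/* first branch: word tail ("movsw") */
		memcpy(d, s, 2);
		d += 2;
		s += 2;
	}
	if (n & 1)			/* second branch: byte tail ("movsb") */
		*d = *s;
	return to;
}

/*
 * One-branch tail: after the dword loop, take the remaining count
 * n & 3 and skip the byte copy only when it is zero.
 */
void *copy_one_branch(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	size_t words = n / 4;
	size_t tail = n & 3;

	while (words--) {		/* stands in for "rep ; movsl" */
		memcpy(d, s, 4);
		d += 4;
		s += 4;
	}
	if (tail) {			/* the only conditional jump */
		while (tail--)		/* stands in for "rep ; movsb" */
			*d++ = *s++;
	}
	return to;
}
```

Both functions produce identical results for every `n`. The difference under discussion is code size and the number of conditional jumps executed per call, which a C model like this cannot measure. For block sizes that are a multiple of 4, the one-branch version's only test always goes the same way.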