Re: neon logops

2013-04-26 Thread Torbjorn Granlund
Richard Henderson r...@twiddle.net writes:
Building on the copyi that tege committed the other day, use neon for the logical operations too.
I committed the 128-bit version to arm/neon, so that it is used on all Neon-capable processors. I put it there since it is a speedup for A9 as

neon logops

2013-03-08 Thread Richard Henderson
Building on the copyi that tege committed the other day, use neon for the logical operations too. I did both a 128-bit aligned version,
$ ./speed-128 -p 10 -C -s 10,50,100,500,1000,5000,1 mpn_and_n mpn_nand_n
clock_gettime is 1.000ns accurate
overhead 6.00 cycles, precision
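For readers unfamiliar with the routines being benchmarked: GMP's mpn_and_n computes the limb-wise AND of two n-limb operands, and mpn_nand_n computes the limb-wise NAND. The sketch below is a portable C reference for those semantics only, not Henderson's NEON code. It assumes 32-bit limbs (as on 32-bit ARM), and the names limb_t, ref_logop_n, ref_and_n and ref_nand_n are hypothetical. The main loop groups four limbs (128 bits, the width of one NEON Q register) per iteration to mirror the "128-bit" unit discussed in the thread, but it uses no vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type: 32 bits, as mp_limb_t typically is on 32-bit ARM. */
typedef uint32_t limb_t;

/* Shared loop: rp[i] = (s1p[i] & s2p[i]) ^ mask.
   mask == 0 gives AND; mask == all-ones gives NAND, since x ^ ~0 == ~x. */
static void ref_logop_n(limb_t *rp, const limb_t *s1p, const limb_t *s2p,
                        size_t n, limb_t mask)
{
    size_t i = 0;

    assert(n == 0 || (rp != NULL && s1p != NULL && s2p != NULL));

    /* Four 32-bit limbs = 128 bits per iteration, the width of one
       NEON Q register. Plain C here; no vector instructions are used. */
    for (; i + 4 <= n; i += 4) {
        rp[i]     = (s1p[i]     & s2p[i])     ^ mask;
        rp[i + 1] = (s1p[i + 1] & s2p[i + 1]) ^ mask;
        rp[i + 2] = (s1p[i + 2] & s2p[i + 2]) ^ mask;
        rp[i + 3] = (s1p[i + 3] & s2p[i + 3]) ^ mask;
    }

    /* Scalar tail for the remaining n % 4 limbs. */
    for (; i < n; i++)
        rp[i] = (s1p[i] & s2p[i]) ^ mask;
}

/* Hypothetical stand-in for the semantics of mpn_and_n. */
void ref_and_n(limb_t *rp, const limb_t *s1p, const limb_t *s2p, size_t n)
{
    ref_logop_n(rp, s1p, s2p, n, 0);
}

/* Hypothetical stand-in for the semantics of mpn_nand_n. */
void ref_nand_n(limb_t *rp, const limb_t *s1p, const limb_t *s2p, size_t n)
{
    ref_logop_n(rp, s1p, s2p, n, ~(limb_t)0);
}
```

With n = 5, for example, the grouped loop runs once and the scalar tail handles the last limb. Because each output limb is written only after both inputs at that index are read, the sketch also works in place (rp == s1p).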

Re: neon logops

2013-03-08 Thread Torbjorn Granlund
Richard Henderson r...@twiddle.net writes:
Building on the copyi that tege committed the other day, use neon for the logical operations too. I did both a 128-bit aligned version,
$ ./speed-128 -p 10 -C -s 10,50,100,500,1000,5000,1 mpn_and_n mpn_nand_n
clock_gettime

Re: neon logops

2013-03-08 Thread Richard Henderson
On 2013-03-08 03:46, Torbjorn Granlund wrote:
I assume you mean that the destination ptrs are naturally aligned, while the source ptrs are 32-bit aligned?
Yes.
My guess for the jagginess is that of two src ptrs, you rarely strike a case where they are 256-bit aligned, in particular not when
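A rough way to see why both sources are rarely 256-bit aligned: a 32-bit-aligned pointer can sit at any of 8 word positions inside a 32-byte (256-bit) block. If, as an assumption the thread does not state, the two source offsets are independent and each equally likely, only 1 of the 8 × 8 = 64 combinations has both sources aligned. The sketch below enumerates those placements; word_offset and count_both_aligned are hypothetical helpers, and the base addresses are arbitrary values chosen to be multiples of the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of a 32-bit-aligned addr, in 32-bit words, inside its enclosing
   align_bytes-sized block (0 means addr is align_bytes-aligned). */
static size_t word_offset(uintptr_t addr, size_t align_bytes)
{
    return (size_t)(addr % align_bytes) / 4;
}

/* Enumerate every placement of two 32-bit-aligned source addresses
   within one align_bytes block and count the placements where both
   sources are align_bytes-aligned. Stores the number of placements
   tried in *total and returns the number of hits. */
static size_t count_both_aligned(size_t align_bytes, size_t *total)
{
    size_t words, hits = 0;

    /* Power-of-two alignment of at least 4 bytes and at most 64 KiB,
       so the 0x10000 / 0x20000 bases below are themselves aligned. */
    assert(align_bytes >= 4 && align_bytes <= 0x10000);
    assert((align_bytes & (align_bytes - 1)) == 0);

    words = align_bytes / 4;
    *total = 0;

    for (size_t a = 0; a < words; a++) {
        for (size_t b = 0; b < words; b++) {
            uintptr_t s1 = (uintptr_t)0x10000 + 4 * a;  /* first source  */
            uintptr_t s2 = (uintptr_t)0x20000 + 4 * b;  /* second source */
            (*total)++;
            if (word_offset(s1, align_bytes) == 0 &&
                word_offset(s2, align_bytes) == 0)
                hits++;
        }
    }
    return hits;
}
```

For 256-bit (32-byte) alignment this counts 1 hit out of 64 placements; for 128-bit (16-byte) alignment it is 1 out of 16, which is consistent with the guess that a 256-bit alignment requirement is met far less often.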