[PATCH] T3/T4 sparc shifts, plus more timings

2013-03-25 Thread David Miller
These give a modest speedup compared to the T1 routines. I also added missing T3 timings to existing code. Also, I worked on a copyi/copyd for T3/T4 that uses cache-initializing stores (basically, if you're going to write a full aligned 64-byte cache line, you tell the chip by using a special ASI
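For readers unfamiliar with the operation, here is a minimal C sketch of a multi-limb left shift. It assumes the "shifts" in the subject are multi-limb shift routines working on arrays of 64-bit limbs. The name `lshift_limbs` and its signature are hypothetical, and the sketch says nothing about how the SPARC assembly in the patch is scheduled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not the patch's code.
 * Shifts the n-limb number src[0..n-1] (least significant limb first)
 * left by cnt bits, stores the low n limbs of the result in dst, and
 * returns the bits shifted out of the top limb.
 * Assumes n >= 1 and 1 <= cnt <= 63. */
static uint64_t lshift_limbs(uint64_t *dst, const uint64_t *src,
                             size_t n, unsigned cnt)
{
    unsigned tnc = 64 - cnt;          /* complementary shift count */
    uint64_t high = src[n - 1];
    uint64_t out = high >> tnc;       /* bits that fall off the top */

    /* Walk from the most significant limb down, so that dst may
     * overlap src when dst >= src. */
    for (size_t i = n - 1; i > 0; i--) {
        uint64_t low = src[i - 1];
        dst[i] = (high << cnt) | (low >> tnc);
        high = low;
    }
    dst[0] = high << cnt;
    return out;
}
```

An assembly version computes the same result; any speedup would come from how the loads, shifts, and stores are arranged for a given pipeline.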

Re: [PATCH] 64-bit Popcount/Hweight for T3 and later

2013-03-25 Thread Torbjorn Granlund
David Miller writes: Technically we could use this on some chips we don't distinguish on a fine enough granularity yet. For example we can assume popc is available on T2 as well as UltraSPARC-IV. But for now, just T3 and later. I suppose we should mention this as a comment in the c
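As a point of comparison, a portable 64-bit population count can be sketched in C as below. This is the kind of software fallback that a hardware `popc` instruction would replace. The function name `popcount64` is hypothetical and the code is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical portable fallback, not the patch's code.
 * Counts the set bits in x by summing adjacent bit fields in parallel. */
static unsigned popcount64(uint64_t x)
{
    /* Each 2-bit field now holds the count of its two bits. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Each 4-bit field now holds the count of its four bits. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Each byte now holds the count of its eight bits. */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* The multiply adds all eight byte counts into the top byte. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```

On a chip where `popc` is known to be available, the whole function collapses to a single instruction, which is why the choice of which CPUs to enable it for matters.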

Re: T3/T3 mul_2 and addmul_2

2013-03-25 Thread David Miller
From: Torbjorn Granlund
Date: Mon, 25 Mar 2013 19:45:27 +0100
> I cannot recall which edits I made between these variants. If only the checked-in code has fluctuations, then it should be no problem finding an edit which avoids them. If both variants have fluctuations, then it will be hard

Re: T3/T3 mul_2 and addmul_2

2013-03-25 Thread Torbjorn Granlund
> If you want to play with this, please start with the checked in code (you'll need to fresh configure.ac to allow the aormul_2 'multifunc' name). The first thing to try is its speed compared to the code you timed above. I'm getting wildly different performance characteristics f