Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-25 Thread Siarhei Siamashka
On Mon, Jun 25, 2012 at 7:45 PM, Matt Turner matts...@gmail.com wrote: On Mon, Jun 25, 2012 at 1:00 AM, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: OK, I got 7-bit variant of SSE2 bilinear scaling working. It shows quite a good speed boost thanks to PMADDWD instruction, which can be

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-24 Thread Siarhei Siamashka
On Mon, Jun 18, 2012 at 9:09 PM, Søren Sandmann sandm...@cs.au.dk wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: This is also a very useful test, but it effectively requires having an alternative double-precision implementation for all the pixman functionality to be verified.
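
A minimal sketch of the kind of check discussed in this thread, not pixman's actual test code: a fixed-point bilinear blend with 7-bit weights is compared against a double-precision reference and accepted if it stays within a maximum error. The names `bilinear_7bit`, `bilinear_ref`, and `within_tolerance` are hypothetical, the code handles a single 8-bit channel only, and it uses an absolute rather than a relative error bound for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a pixman function): bilinear blend of one 8-bit
 * channel from four neighbours, using 7-bit weights wx, wy in [0, 127]. */
static uint8_t bilinear_7bit(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                             int wx, int wy)
{
    assert(wx >= 0 && wx < 128 && wy >= 0 && wy < 128);

    int top = tl * (128 - wx) + tr * wx;       /* horizontal pass, 7 fraction bits */
    int bot = bl * (128 - wx) + br * wx;
    int v   = top * (128 - wy) + bot * wy;     /* vertical pass, 14 fraction bits */

    return (uint8_t)((v + (1 << 13)) >> 14);   /* round to nearest */
}

/* Hypothetical double-precision reference; fx, fy are fractions in [0, 1). */
static double bilinear_ref(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                           double fx, double fy)
{
    double top = tl * (1.0 - fx) + tr * fx;
    double bot = bl * (1.0 - fx) + br * fx;
    return top * (1.0 - fy) + bot * fy;
}

/* Accept a result if it differs from the reference by at most max_err. */
static int within_tolerance(uint8_t got, double expected, double max_err)
{
    double diff = (double)got - expected;
    if (diff < 0.0)
        diff = -diff;
    return diff <= max_err;
}
```

A test built this way would call `within_tolerance(bilinear_7bit(...), bilinear_ref(...), max_err)` for each sampled pixel instead of requiring bit-exact output. As the quoted message notes, the cost is that every code path under test needs its own reference implementation.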

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-18 Thread Søren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes: This is also a very useful test, but it effectively requires having an alternative double-precision implementation for all the pixman functionality to be verified. For bilinear scaling it means that at least various types of repeats need

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-17 Thread Siarhei Siamashka
On Sun, Jun 17, 2012 at 8:27 AM, Bill Spitzak spit...@gmail.com wrote: On 06/16/2012 07:08 AM, Siarhei Siamashka wrote: An alternative idea is instead of changing the algorithm across the board, we could stop requiring bit exact results. The main piece of work here is to change the test suite

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-16 Thread Siarhei Siamashka
On Fri, Jun 15, 2012 at 10:51 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: Also, are we planning to change the bilinear scaling algorithm for 0.28 so that we can use pmaddubsw? I wouldn't object to a patch that dropped precision to 7 bits for all

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-16 Thread Bill Spitzak
On 06/16/2012 07:08 AM, Siarhei Siamashka wrote: An alternative idea is instead of changing the algorithm across the board, we could stop requiring bit exact results. The main piece of work here is to change the test suite so that it will accept pixels up to some maximum relative error. There

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-15 Thread Søren Sandmann
Matt Turner matts...@gmail.com writes: The registers -- yes. The 8-byte aligned loads and stores I'm not sure. Can you do 8-byte aligned loads and stores to/from SSE registers? I believe movq can use SSE registers. Indeed, runtime generation would be great. Something like LLVM or orc would

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-06-14 Thread Matt Turner
Sorry it's taken so long to get back to this. On Wed, May 9, 2012 at 12:57 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: I still think MMX has no use on modern systems. The SSE2 implementation used to have such MMX loops, but they were removed in

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-05-09 Thread Søren Sandmann
Matt Turner matts...@gmail.com writes: I started porting my src__0565 MMX function to SSE2, and in the process started thinking about using SSE3+. The useful instructions added post SSE2 that I see are SSE3: lddqu - for unaligned loads across cache lines I don't really understand

Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-05-09 Thread Jeff Muizelaar
On 2012-05-09, at 12:57 PM, Søren Sandmann wrote: Matt Turner matts...@gmail.com writes: I started porting my src__0565 MMX function to SSE2, and in the process started thinking about using SSE3+. The useful instructions added post SSE2 that I see are SSE3: lddqu - for

[Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-05-02 Thread Matt Turner
I started porting my src__0565 MMX function to SSE2, and in the process started thinking about using SSE3+. The useful instructions added post SSE2 that I see are SSE3: lddqu - for unaligned loads across cache lines SSSE3: palignr - for unaligned loads (but requires software