Re: [Pixman] [PATCH] sse2: faster bilinear scaling (pack 4 pixels to write with MOVDQA)

2013-09-05 Thread Siarhei Siamashka
On Thu, 05 Sep 2013 04:42:08 +0200 sandm...@cs.au.dk (Søren Sandmann) wrote:
> Siarhei Siamashka siarhei.siamas...@gmail.com writes:
> > The loops are already unrolled, so it was just a matter of packing 4 pixels into a single XMM register and doing aligned 128-bit writes to memory via MOVDQA

Re: [Pixman] [PATCH] sse2: faster bilinear scaling (pack 4 pixels to write with MOVDQA)

2013-09-04 Thread Søren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
> The loops are already unrolled, so it was just a matter of packing 4 pixels into a single XMM register and doing aligned 128-bit writes to memory via MOVDQA instructions for the SRC compositing operator fast path. For the other fast paths,

[Pixman] [PATCH] sse2: faster bilinear scaling (pack 4 pixels to write with MOVDQA)

2013-09-02 Thread Siarhei Siamashka
The loops are already unrolled, so it was just a matter of packing 4 pixels into a single XMM register and doing aligned 128-bit writes to memory via MOVDQA instructions for the SRC compositing operator fast path. For the other fast paths, this XMM register is also directly routed to further
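
For readers following the thread, here is a minimal sketch in C with SSE2 intrinsics of the store pattern described above: four 32-bit pixels are packed into one XMM register and written with a single aligned 128-bit store. This is not the code from the patch. The helper name store_4_pixels_aligned and the input layout (two registers, each holding two pixels as 16-bit channels already in the 0..255 range) are assumptions made for illustration only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/*
 * Hypothetical helper, not taken from pixman.
 *
 * px01 holds pixels 0 and 1, px23 holds pixels 2 and 3. Each pixel is
 * assumed to occupy four 16-bit lanes (one lane per 8-bit channel).
 * dst must be 16-byte aligned, otherwise the aligned store faults.
 */
static void
store_4_pixels_aligned (uint32_t *dst, __m128i px01, __m128i px23)
{
    /* Narrow 8 + 8 16-bit lanes to 16 bytes with unsigned saturation,
     * giving four packed 32-bit pixels in one XMM register. */
    __m128i packed = _mm_packus_epi16 (px01, px23);

    /* Aligned 128-bit store; compilers normally emit MOVDQA for this. */
    _mm_store_si128 ((__m128i *) dst, packed);
}
```

A caller would still need to handle leading pixels one at a time until dst reaches a 16-byte boundary, and any trailing pixels that do not fill a group of four.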