On Mon, Jun 25, 2012 at 7:45 PM, Matt Turner matts...@gmail.com wrote:
On Mon, Jun 25, 2012 at 1:00 AM, Siarhei Siamashka
siarhei.siamas...@gmail.com wrote:
OK, I got 7-bit variant of SSE2 bilinear scaling working. It shows
quite a good speed boost thanks to PMADDWD instruction, which can be
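
Editorial sketch of the arithmetic behind the 7-bit variant. The message is cut off, so the link to PMADDWD is an assumption: PMADDWD multiplies signed 16-bit words and adds adjacent products, and with 7-bit weights the vertically interpolated value is at most 255 * 128 = 32640, which still fits in a signed 16-bit word. The function and macro names are hypothetical and this is scalar C, not the SSE2 code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BILINEAR_BITS 7                     /* hypothetical name: weight precision */
#define BILINEAR_ONE  (1 << BILINEAR_BITS)  /* 128 */

/* Interpolate one 8-bit channel from a 2x2 neighbourhood.
 * wx and wy are the horizontal and vertical weights in [0, 127]. */
static uint8_t bilinear_channel_7bit(uint8_t tl, uint8_t tr,
                                     uint8_t bl, uint8_t br,
                                     int wx, int wy)
{
    assert(wx >= 0 && wx < BILINEAR_ONE);
    assert(wy >= 0 && wy < BILINEAR_ONE);

    /* Vertical pass: each result is at most 255 * 128 = 32640. */
    int32_t left  = tl * (BILINEAR_ONE - wy) + bl * wy;
    int32_t right = tr * (BILINEAR_ONE - wy) + br * wy;
    assert(left <= INT16_MAX && right <= INT16_MAX);

    /* Horizontal pass: a sum of two 16-bit by 16-bit products,
     * which is the shape of operation PMADDWD performs per lane. */
    int32_t sum = left * (BILINEAR_ONE - wx) + right * wx;

    /* Two 7-bit weights were applied, so drop 14 fractional bits. */
    return (uint8_t)(sum >> (2 * BILINEAR_BITS));
}
```

With 8-bit weights the vertical pass could reach 255 * 256 = 65280, which no longer fits in a signed 16-bit word.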
On Mon, Jun 18, 2012 at 9:09 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
This is also a very useful test, but it effectively requires to have
an alternative double precision implementation for all the pixman
functionality to be verified.
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
This is also a very useful test, but it effectively requires to have
an alternative double precision implementation for all the pixman
functionality to be verified. For bilinear scaling it means that at
least various types of repeats need
On Sun, Jun 17, 2012 at 8:27 AM, Bill Spitzak spit...@gmail.com wrote:
On 06/16/2012 07:08 AM, Siarhei Siamashka wrote:
An alternative idea is instead of changing the algorithm across the
board, we could stop requiring bit exact results. The main piece of work
here is to change the test suite
On Fri, Jun 15, 2012 at 10:51 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
Also, are we planning to change the bilinear scaling algorithm for
0.28 so that we can use pmaddubsw?
I wouldn't object to a patch that dropped precision to 7 bits for all
On 06/16/2012 07:08 AM, Siarhei Siamashka wrote:
An alternative idea is instead of changing the algorithm across the
board, we could stop requiring bit exact results. The main piece of work
here is to change the test suite so that it will accept pixels up to
some maximum relative error. There
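
Editorial sketch of the kind of tolerant comparison described above. The message does not define the error metric, so this uses a simpler stand-in: a maximum absolute difference per 8-bit channel of a 32-bit pixel. The function name and the channel layout are assumptions, and this is not code from the pixman test suite.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 1 if every 8-bit channel of the two 32-bit pixels differs
 * by at most max_diff, and 0 otherwise. (Hypothetical helper.) */
static int pixels_within_tolerance(uint32_t a, uint32_t b, int max_diff)
{
    assert(max_diff >= 0);

    for (int shift = 0; shift < 32; shift += 8) {
        int ca = (int)((a >> shift) & 0xff);
        int cb = (int)((b >> shift) & 0xff);

        if (abs(ca - cb) > max_diff)
            return 0;
    }
    return 1;
}
```

A test could then compare a reference result and an optimized result with `pixels_within_tolerance(ref, out, 1)` instead of requiring `ref == out`.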
Matt Turner matts...@gmail.com writes:
The registers -- yes. The 8-byte aligned loads and stores I'm not
sure. Can you do 8-byte aligned loads and stores to/from SSE
registers?
I believe movq can use SSE registers.
Indeed, runtime generation would be great. Something like LLVM or orc
would
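
Editorial sketch for the movq question above. With SSE2 intrinsics, `_mm_loadl_epi64` and `_mm_storel_epi64` move 8 bytes between memory and the low half of an SSE register, and compilers typically emit movq for them. The function name is hypothetical, and the `memcpy` branch exists only so the example builds on targets without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy 8 bytes through the low 64 bits of an SSE register.
 * (Hypothetical helper, not a pixman function.) */
static void copy8_via_sse(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    /* 64-bit load into the low half; the upper half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);
    /* 64-bit store of the low half back to memory. */
    _mm_storel_epi64((__m128i *)dst, v);
#else
    /* Fallback for targets without SSE2. */
    memcpy(dst, src, 8);
#endif
}
```

Neither intrinsic requires 16-byte alignment, so 8-byte aligned addresses are acceptable.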
Sorry it's taken so long to get back to this.
On Wed, May 9, 2012 at 12:57 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
I still think MMX has no use on modern systems. The SSE2 implementation
used to have such MMX loops, but they were removed in
Matt Turner matts...@gmail.com writes:
I started porting my src__0565 MMX function to SSE2, and in the
process started thinking about using SSE3+. The useful instructions
added post SSE2 that I see are
SSE3: lddqu - for unaligned loads across cache lines
I don't really understand
On 2012-05-09, at 12:57 PM, Søren Sandmann wrote:
Matt Turner matts...@gmail.com writes:
I started porting my src__0565 MMX function to SSE2, and in the
process started thinking about using SSE3+. The useful instructions
added post SSE2 that I see are
SSE3: lddqu - for
I started porting my src__0565 MMX function to SSE2, and in the
process started thinking about using SSE3+. The useful instructions
added post SSE2 that I see are
SSE3: lddqu - for unaligned loads across cache lines
SSSE3: palignr - for unaligned loads (but requires software