Re: [Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers

2013-08-28 Thread Søren Sandmann
Siarhei Siamashka writes:

>> With this new alignment assumption, such an optimization becomes even more
>> impossible,
>
> Implementing this optimization does not seem to be too difficult in
> principle. I tried to hack a bit and here is the result:
>
> http://lists.freedesktop.org/archives/

Re: [Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers

2013-08-28 Thread Siarhei Siamashka
On Thu, 29 Aug 2013 01:27:08 +0200
sandm...@cs.au.dk (Søren Sandmann) wrote:

> Siarhei Siamashka writes:
>
> > On Wed, 28 Aug 2013 16:01:27 -0400
> > Søren Sandmann wrote:
> >
> >> From: Søren Sandmann Pedersen
> >>
> >> Now that the general implementation guarantees that the iter buffers
> >

[Pixman] [PATCH 2/2] Support direct fetch to destination for r5g6b5 fetchers

2013-08-28 Thread Siarhei Siamashka
Because the redundant memcpy step is avoided, overall performance is improved.

Running lowlevel-blt-bench on Intel Core-i7 860 @2.8GHz:

before: src_0565_ = L1:  931.54  L2:  888.93  M: 638.34
after:  src_0565_ = L1: 1031.66  L2: 1003.42  M: 871.54
---
 pixman/pixman-fast-path.c | 6 -

[Pixman] [PATCH 1/2] Shortcut for the source iterator to fetch directly to destination

2013-08-28 Thread Siarhei Siamashka
If the combine step is going to be a simple memcpy from the temporary buffer to the destination (SRC operator, no mask, x8r8g8b8 or a8r8g8b8 destination format), just route the source iterator-based fetch operation to the destination buffer. Earlier the source iterator was getting a const
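The shortcut described in this patch can be illustrated with a minimal sketch. This is not the pixman code: the names `fetch_fn`, `fetch_x888`, `composite_src_via_temp` and `composite_src_direct` are hypothetical, and the iterator machinery is reduced to a single function pointer. It only shows why the temporary buffer and the memcpy become redundant when the combine step is a plain copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a source iterator's scanline fetcher. */
typedef void (*fetch_fn) (uint32_t *out, const uint32_t *src, int width);

/* Example fetcher: x8r8g8b8 -> a8r8g8b8 by forcing alpha to 0xff. */
static void
fetch_x888 (uint32_t *out, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++)
        out[i] = src[i] | 0xff000000;
}

/* Old path: fetch into a temporary buffer, then "combine" with memcpy. */
static void
composite_src_via_temp (uint32_t *dest, const uint32_t *src, int width,
                        uint32_t *tmp, fetch_fn fetch)
{
    fetch (tmp, src, width);
    memcpy (dest, tmp, (size_t)width * sizeof (uint32_t));
}

/* Shortcut: SRC operator, no mask, 32 bpp destination, so the fetcher
 * can write straight into the destination scanline. */
static void
composite_src_direct (uint32_t *dest, const uint32_t *src, int width,
                      fetch_fn fetch)
{
    fetch (dest, src, width);
}
```

Both paths produce identical destination pixels; the direct path simply skips one pass over the scanline.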

Re: [Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers

2013-08-28 Thread Søren Sandmann
Siarhei Siamashka writes:

> On Wed, 28 Aug 2013 16:01:27 -0400
> Søren Sandmann wrote:
>
>> From: Søren Sandmann Pedersen
>>
>> Now that the general implementation guarantees that the iter buffers
>> are aligned to 16 bytes, there is no longer any reason for the initial
>> loop to bring the de

Re: [Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers

2013-08-28 Thread Siarhei Siamashka
On Wed, 28 Aug 2013 16:01:27 -0400
Søren Sandmann wrote:

> From: Søren Sandmann Pedersen
>
> Now that the general implementation guarantees that the iter buffers
> are aligned to 16 bytes, there is no longer any reason for the initial
> loop to bring the destination buffer up to an aligned posi

[Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers

2013-08-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

Now that the general implementation guarantees that the iter buffers are aligned to 16 bytes, there is no longer any reason for the initial loop to bring the destination buffer up to an aligned position.
---
 pixman/pixman-mmx.c | 20
 pixman/p
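As a rough illustration of what removing the initial loop means, here is a scalar sketch of a fetcher before and after. It is not the code from the patch: the function names are hypothetical, and the four-pixel inner loop only stands in for one aligned 128-bit SIMD store, which in a real SSE2 fetcher would require the 16-byte alignment that the iter buffers are now said to guarantee.

```c
#include <stdint.h>

/* Before: a head loop advances dst pixel by pixel until it reaches a
 * 16-byte boundary, so the main loop may use aligned stores. */
static void
fetch_x888_with_head_loop (uint32_t *dst, const uint32_t *src, int w)
{
    while (w && ((uintptr_t)dst & 15))
    {
        *dst++ = *src++ | 0xff000000;
        w--;
    }
    while (w >= 4)
    {
        /* Stand-in for one aligned 128-bit store of four pixels. */
        dst[0] = src[0] | 0xff000000;
        dst[1] = src[1] | 0xff000000;
        dst[2] = src[2] | 0xff000000;
        dst[3] = src[3] | 0xff000000;
        dst += 4; src += 4; w -= 4;
    }
    while (w)
    {
        *dst++ = *src++ | 0xff000000;
        w--;
    }
}

/* After: the caller guarantees a 16-byte aligned dst, so the head loop
 * is dropped and only the main and tail loops remain. */
static void
fetch_x888_aligned_dst (uint32_t *dst, const uint32_t *src, int w)
{
    while (w >= 4)
    {
        dst[0] = src[0] | 0xff000000;
        dst[1] = src[1] | 0xff000000;
        dst[2] = src[2] | 0xff000000;
        dst[3] = src[3] | 0xff000000;
        dst += 4; src += 4; w -= 4;
    }
    while (w)
    {
        *dst++ = *src++ | 0xff000000;
        w--;
    }
}
```

The two functions write the same pixels; the second is only valid with real aligned SIMD stores if its caller upholds the alignment guarantee.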

[Pixman] [PATCH 1/2] general: Ensure that iter buffers are aligned to 16 bytes

2013-08-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

At the moment iter buffers are only guaranteed to be aligned to an 8-byte boundary. It is useful for SIMD implementations to be able to assume that these buffers are aligned to 16 bytes, so ensure this.
---
 pixman/pixman-general.c | 22 +++---
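One common way to obtain such a guarantee is to over-allocate by 15 bytes and round the start pointer up to the next multiple of 16. The sketch below shows that technique in isolation; `align_up_16` is a hypothetical helper, and nothing here is claimed to match what the patch does in pixman-general.c.

```c
#include <stdint.h>

/* Hypothetical helper: round a pointer up to the next 16-byte boundary.
 * The caller must have reserved at least 15 spare bytes after the data. */
static uint8_t *
align_up_16 (uint8_t *p)
{
    return (uint8_t *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}
```

For example, a scanline buffer needing `n` bytes could be carved out of a raw block of `n + 15` bytes by passing the block's start to `align_up_16`; the result moves forward by at most 15 bytes.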