[Pixman] [PATCH 5/5] Add SSE2 fetcher for 0565

2011-01-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

Before:
add_0565_0565 = L1: 61.08 L2: 61.03 M: 60.57 ( 10.95%) HT: 46.85 VT: 45.25 R: 39.99 RT: 20.41 ( 233Kops/s)

After:
add_0565_0565 = L1: 77.84 L2: 76.25 M: 75.38 ( 13.71%) HT: 55.99 VT: 54.56 R: 45.41 RT: 21.95 ( 255Kops/s)

---
pixman/pixma

[Pixman] [PATCH 4/5] Improve performance of sse2_combine_over_u()

2011-01-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

Split this function into two, one that has a mask, and one that doesn't. This is a fairly substantial speed-up in many cases.

New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 63.76 L2: 62.75 M: 59.37 ( 21.55%) HT: 45.89 VT: 43.55 R
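The split described in this patch can be illustrated with a scalar sketch. This is not the code from pixman-sse2.c, which uses SSE2 intrinsics; every name below (mul_un8, scale_pixel, over_pixel, combine_over_no_mask, combine_over_mask, combine_over) is hypothetical. It assumes premultiplied a8r8g8b8 pixels and that only the alpha byte of each mask pixel is used. The point is that the mask test happens once per call, so the unmasked loop never pays for mask handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply an 8-bit channel by an 8-bit factor: round(x * a / 255). */
static uint8_t mul_un8(uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t)x * a + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scale all four channels of a pixel by the factor a. */
static uint32_t scale_pixel(uint32_t p, uint8_t a)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8)
        out |= (uint32_t)mul_un8((uint8_t)((p >> shift) & 0xff), a) << shift;
    return out;
}

/* OVER for premultiplied pixels: s + d * (255 - alpha(s)) / 255. */
static uint32_t over_pixel(uint32_t s, uint32_t d)
{
    uint8_t ia = (uint8_t)(255 - (s >> 24));
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((s >> shift) & 0xff)
                     + mul_un8((uint8_t)((d >> shift) & 0xff), ia);
        if (sum > 255)
            sum = 255;              /* saturate, in case input is not premultiplied */
        out |= sum << shift;
    }
    return out;
}

/* Variant without a mask: no per-pixel mask work at all. */
static void combine_over_no_mask(uint32_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = over_pixel(src[i], dst[i]);
}

/* Variant with a mask: the source is scaled by the mask alpha first. */
static void combine_over_mask(uint32_t *dst, const uint32_t *src,
                              const uint32_t *mask, int width)
{
    for (int i = 0; i < width; i++) {
        uint8_t ma = (uint8_t)(mask[i] >> 24);
        dst[i] = over_pixel(scale_pixel(src[i], ma), dst[i]);
    }
}

/* Entry point: decide once per call, not once per pixel. */
static void combine_over(uint32_t *dst, const uint32_t *src,
                         const uint32_t *mask, int width)
{
    if (mask)
        combine_over_mask(dst, src, mask, width);
    else
        combine_over_no_mask(dst, src, width);
}
```

Calling combine_over(dst, src, NULL, width) takes the unmasked path; passing a mask pointer takes the masked one.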

[Pixman] [PATCH 3/5] Add SSE2 fetcher for a8

2011-01-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 57.85 L2: 56.80 M: 54.14 ( 19.50%) HT: 42.64 VT: 40.56 R: 32.67 RT: 16.22 ( 195Kops/s)

---
pixman/pixman-sse2.c | 49 -
1 files cha

[Pixman] [PATCH 2/5] Add SSE2 fetcher for x8r8g8b8

2011-01-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 55.68 L2: 55.11 M: 52.83 ( 19.04%) HT: 39.62 VT: 37.70 R: 30.88 RT: 14.62 ( 174Kops/s)

The fetcher is looked up in a table, so that other fetchers can easily be added.

---
pixman/p
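A table-driven fetcher lookup of the kind this patch mentions might look roughly like the sketch below. The patch's actual table is not visible in this snippet, so everything here is an assumption: the format_t codes, the fetch_fn_t signature, and the function names are hypothetical and are not pixman's real types. The fetchers are plain scalar loops rather than SSE2 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format codes; NOT pixman's real pixman_format_code_t values. */
typedef enum { FMT_X8R8G8B8, FMT_A8, FMT_R5G6B5 } format_t;

/* Hypothetical fetcher signature: expand `width` source pixels to a8r8g8b8. */
typedef void (*fetch_fn_t)(uint32_t *dst, const void *src, int width);

static void fetch_x8r8g8b8(uint32_t *dst, const void *src, int width)
{
    const uint32_t *s = src;
    for (int i = 0; i < width; i++)
        dst[i] = s[i] | 0xff000000u;    /* x888 has no alpha: force opaque */
}

static void fetch_a8(uint32_t *dst, const void *src, int width)
{
    const uint8_t *s = src;
    for (int i = 0; i < width; i++)
        dst[i] = (uint32_t)s[i] << 24;  /* alpha only, colour channels zero */
}

typedef struct {
    format_t   format;
    fetch_fn_t fetch;
} fetcher_entry_t;

/* Supporting another format means adding one row here. */
static const fetcher_entry_t fetchers[] = {
    { FMT_X8R8G8B8, fetch_x8r8g8b8 },
    { FMT_A8,       fetch_a8       },
};

/* Return the fetcher for `format`, or NULL so the caller can fall back
 * to a generic path. */
static fetch_fn_t lookup_fetcher(format_t format)
{
    for (size_t i = 0; i < sizeof fetchers / sizeof fetchers[0]; i++)
        if (fetchers[i].format == format)
            return fetchers[i].fetch;
    return NULL;
}
```

In this sketch a format with no table entry (FMT_R5G6B5 here) yields NULL, which keeps the lookup code unchanged when new rows are added.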

[Pixman] [PATCH 1/5] Add a test for over_x888_8_0565 in lowlevel_blt_bench().

2011-01-28 Thread Søren Sandmann
From: Søren Sandmann Pedersen

The next few commits will speed this up quite a bit.

Current output:

---
reference memcpy speed = 2217.5MB/s (554.4MP/s for 32bpp fills)
---
over_x888_8_0565 = L1: 54.67 L2: 54.01 M: 52.33 ( 18.88%) HT: 37.19 VT: 35.54 R: 29.40 RT: 13.63 ( 162Kops/s)
---