[Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Juha-Pekka Heikkila
Signed-off-by: Juha-Pekka Heikkila --- src/mesa/Makefile.am | 8 +++ src/mesa/main/x86/sse2_clamping.c | 103 ++ src/mesa/main/x86/sse2_clamping.h | 49 ++ 3 files changed, 160 insertions(+) create mode 100644 src/mesa/main/x86/

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Siavash Eliasi
Hello. I'd get rid of "_mm_set1_ps" inside "_mesa_clamp_float_rgba" by passing the __m128 versions of min/max directly, so that "_mm_set1_ps" is moved out of the for loop. I'd also unroll the "_mesa_streaming_clamp_float_rgba" loop to minimize the loop overhead (and utilize out of order execution as
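The two suggestions above (broadcast min/max once outside the loop, and unroll the loop) can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function name clamp_rgba_hoisted, the flat float-pointer layout, the unroll factor of two and the use of unaligned loads are all assumptions made for the example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper (not from the patch): clamp n RGBA pixels, stored as
 * 4 consecutive floats each, to the range [min, max]. */
void
clamp_rgba_hoisted(size_t n, const float *src, float *dst,
                   float min, float max)
{
   /* Broadcast the bounds once, outside the loop, instead of calling
    * _mm_set1_ps for every pixel. */
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);
   size_t i = 0;

   assert(src != NULL && dst != NULL);

   /* Unrolled by two: the two pixels are independent, so the CPU is free
    * to overlap their loads, min/max operations and stores. */
   for (; i + 2 <= n; i += 2) {
      __m128 p0 = _mm_loadu_ps(src + 4 * i);
      __m128 p1 = _mm_loadu_ps(src + 4 * (i + 1));

      p0 = _mm_min_ps(_mm_max_ps(p0, vmin), vmax);
      p1 = _mm_min_ps(_mm_max_ps(p1, vmin), vmax);

      _mm_storeu_ps(dst + 4 * i, p0);
      _mm_storeu_ps(dst + 4 * (i + 1), p1);
   }

   /* Handle the leftover pixel when n is odd. */
   for (; i < n; i++) {
      const __m128 p = _mm_loadu_ps(src + 4 * i);
      _mm_storeu_ps(dst + 4 * i, _mm_min_ps(_mm_max_ps(p, vmin), vmax));
   }
}
```

Whether hand unrolling helps in practice depends on the compiler and optimization level, since an optimizing compiler may already hoist the broadcast and unroll the loop on its own.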

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Patrick Baggett
On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila < juhapekka.heikk...@gmail.com> wrote: > Signed-off-by: Juha-Pekka Heikkila > --- > src/mesa/Makefile.am | 8 +++ > src/mesa/main/x86/sse2_clamping.c | 103 > ++ > src/mesa/main/x86/sse2_clampi

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Roland Scheidegger
On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote: > Signed-off-by: Juha-Pekka Heikkila > --- > src/mesa/Makefile.am | 8 +++ > src/mesa/main/x86/sse2_clamping.c | 103 > ++ > src/mesa/main/x86/sse2_clamping.h | 49 ++ > 3 file

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
Hi, I relied on gcc's optimization pass to move things around for me. What _mesa_streaming_clamp_float_rgba really looks like when I compile it is this: Dump of assembler code for function _mesa_streaming_clamp_float_rgba: 0x7401a0a0 <+0>: test %edi,%edi 0x7401a0a2 <+2

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
On 04.11.2014 21:46, Patrick Baggett wrote: > > > On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila > <juhapekka.heikk...@gmail.com> wrote: > > Signed-off-by: Juha-Pekka Heikkila > > --- > src/mesa/Makefile.am | 8 +++

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
On 04.11.2014 23:24, Roland Scheidegger wrote: > On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote: >> Signed-off-by: Juha-Pekka Heikkila >> --- >> src/mesa/Makefile.am | 8 +++ >> src/mesa/main/x86/sse2_clamping.c | 103 >> ++ >> src/mesa/main

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Roland Scheidegger
On 05.11.2014 at 10:13, Juha-Pekka Heikkila wrote: > On 04.11.2014 23:24, Roland Scheidegger wrote: >> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote: >>> Signed-off-by: Juha-Pekka Heikkila >>> --- >>> src/mesa/Makefile.am | 8 +++ >>> src/mesa/main/x86/sse2_clamping.c | 103

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Ian Romanick
On 11/04/2014 01:24 PM, Roland Scheidegger wrote: > On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote: >> + for(i = 0; i < n; i++) { >> + _mesa_clamp_float_rgba(rgba_src[i], temp, min, max); >> + >> + *operand = _mm_mul_ps(multiplier, *operand); >> + truncated_integers = _mm_cvttp
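For readers following the quoted loop body, the per-pixel sequence (clamp, multiply, convert to integers) can be sketched on its own. This is a guess at the shape of the operation, not the patch itself: the quoted call is cut off at "_mm_cvttp", and the sketch assumes it is a truncating float-to-int conversion such as _mm_cvttps_epi32. The function name clamp_scale_truncate and its parameters are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): clamp one RGBA pixel to
 * [min, max], scale it, and convert each channel to a 32-bit integer,
 * truncating toward zero. */
void
clamp_scale_truncate(const float *rgba, int32_t *out,
                     float min, float max, float scale)
{
   __m128 v;
   __m128i truncated;

   assert(rgba != NULL && out != NULL);

   /* Clamp all four channels at once. */
   v = _mm_loadu_ps(rgba);
   v = _mm_min_ps(_mm_max_ps(v, _mm_set1_ps(min)), _mm_set1_ps(max));

   /* Scale, e.g. by 255.0f to map [0, 1] onto the 8-bit range. */
   v = _mm_mul_ps(_mm_set1_ps(scale), v);

   /* Assumed conversion: _mm_cvttps_epi32 truncates toward zero. */
   truncated = _mm_cvttps_epi32(v);
   _mm_storeu_si128((__m128i *)out, truncated);
}
```

Note that truncation maps 127.5 to 127. If rounding to nearest were wanted instead, _mm_cvtps_epi32 (which follows the current MXCSR rounding mode) would be the alternative.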

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-06 Thread Juha-Pekka Heikkila
On 05.11.2014 21:21, Ian Romanick wrote: > On 11/04/2014 01:24 PM, Roland Scheidegger wrote: >> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote: >>> + for(i = 0; i < n; i++) { >>> + _mesa_clamp_float_rgba(rgba_src[i], temp, min, max); >>> + >>> + *operand = _mm_mul_ps(multiplier, *op