Re: [Libav-user] a little performance/optimisation headbreaker :)

2013-02-15 Thread René J.V. Bertin
On Feb 15, 2013, at 17:08, Claudio Freire wrote:
>
> SSE intrinsics (and automatic vectorization in fact) perform a lot
> more poorly if you don't use __builtin_assume_aligned[0]
>
> [0] http://gcc.gnu.org/projects/tree-ssa/vectorization.html#assume-aligned

Ah, of course ... but OS X memory
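
For readers who have not used the builtin mentioned above, here is a minimal sketch of how `__builtin_assume_aligned` is typically applied with GCC (4.7 or later) or Clang. The helper names `scale_aligned` and `alloc_floats_aligned16` are hypothetical and do not come from the thread or from Perian; the 16-byte figure is an assumption matching SSE register width. The builtin is only a promise to the compiler, so the sketch checks it with an assertion and gets its memory from `posix_memalign`.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: multiply n floats in place by gain.
 * The caller must pass a 16-byte aligned pointer. */
static void scale_aligned(float *data, size_t n, float gain)
{
    /* The builtin does not verify anything, so check the promise
     * ourselves in debug builds. */
    assert(((uintptr_t)data % 16) == 0);

    /* Tell the optimizer that p is 16-byte aligned, so the vectorizer
     * is free to use aligned SSE loads and stores. */
    float *p = __builtin_assume_aligned(data, 16);

    for (size_t i = 0; i < n; i++)
        p[i] *= gain;
}

/* Hypothetical helper: allocate n floats on a 16-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
static float *alloc_floats_aligned16(size_t n)
{
    void *mem = NULL;
    if (posix_memalign(&mem, 16, n * sizeof(float)) != 0)
        return NULL;
    return mem;
}
```

Whether this makes a measurable difference depends on the compiler version and flags (for example `-O3` or `-ftree-vectorize`), so it is worth benchmarking both variants.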

Re: [Libav-user] a little performance/optimisation headbreaker :)

2013-02-15 Thread Claudio Freire
On Fri, Feb 15, 2013 at 12:48 PM, "René J.V. Bertin" wrote:
> On Feb 15, 2013, at 16:33, Claudio Freire wrote:
>
>> gcc, which tends to inhibit many of its other optimizations. Why don't
>> you try gcc's vector primitives instead?
>
> Which ones?

Well, you specify "memory" so it will inhibit all
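
To illustrate what "gcc's vector primitives" can look like, here is a small sketch using GCC's `vector_size` attribute (also accepted by Clang). It is one possible reading of the suggestion, not code from the thread: the type name `v4sf` and the function `madd4` are made up for the example. Because the arithmetic is written in plain C operators rather than inline assembly, presumably with a "memory" clobber as discussed above, the compiler stays free to schedule and optimise around it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats; vector_size is given in bytes.
 * The type name is illustrative. */
typedef float v4sf __attribute__((vector_size(16)));
static_assert(sizeof(v4sf) == 4 * sizeof(float), "expected 4 floats per vector");

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in [0, n). */
static void madd4(float *out, const float *a, const float *b,
                  const float *c, size_t n)
{
    size_t i = 0;

    /* Main loop: four elements per iteration. memcpy makes no alignment
     * assumption; compilers normally turn it into an unaligned vector load. */
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vc, r;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        r = va * vb + vc;            /* element-wise on all four lanes */
        memcpy(out + i, &r, sizeof r);
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

On x86 with SSE enabled the loop body should compile to packed multiply and add instructions, but the same source also builds for targets without SSE, where the compiler falls back to scalar code.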

Re: [Libav-user] a little performance/optimisation headbreaker :)

2013-02-15 Thread René J.V. Bertin
Thanks, Claudio!

On Feb 15, 2013, at 16:33, Claudio Freire wrote:
> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing.

I guess compiler flags do matter too. I haven't compiled with -ftree-vectorize (rather, I tried with and without, made no differe

Re: [Libav-user] a little performance/optimisation headbreaker :)

2013-02-15 Thread Claudio Freire
On Fri, Feb 15, 2013 at 6:37 AM, "René J.V. Bertin" wrote:
> On my 2.7GHz dual-core i7 MBP, I get about 1Hz for the SSE version, and
> roughly half that for the generic, scalar function, using gcc-4.2 as well as
> using MSVC 2010 Express running under WinXP in VirtualBox. The factor 2 speed

[Libav-user] a little performance/optimisation headbreaker :)

2013-02-15 Thread René J.V. Bertin
'Morning! I guess there are a number of people on here who are experts at writing optimised code exploiting every bit of a processor's instruction set. The code I recently isolated from the Perian project also attempts this, and I just came across something that flabbergasted me. Perian is