On Feb 15, 2013, at 17:08, Claudio Freire wrote:
>
> SSE intrinsics (and automatic vectorization in fact) perform a lot
> worse if you don't use __builtin_assume_aligned[0]
>
>
> [0] http://gcc.gnu.org/projects/tree-ssa/vectorization.html#assume-aligned
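
For readers unfamiliar with the builtin mentioned in the quote, here is a minimal sketch of how it is typically used. The function name scale_aligned16 and the 16-byte alignment are assumptions chosen for illustration (16 bytes matches an SSE register); nothing here is taken from the code discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = k * src[i] for n floats.
   The caller must guarantee that both buffers are 16-byte aligned. */
void scale_aligned16(float *restrict dst, const float *restrict src,
                     float k, size_t n)
{
    /* Debug-build check of the promise made to the compiler below. */
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);

    /* Tell GCC that the pointers are 16-byte aligned, so the vectorizer
       may emit aligned loads and stores. Passing a misaligned pointer
       here is undefined behaviour. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (size_t i = 0; i < n; i++)
        d[i] = k * s[i];
}
```

Whether the loop is actually vectorized still depends on the compiler version and flags (for example -O3 or -ftree-vectorize with GCC); the builtin only removes the alignment uncertainty.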
Ah, of course ... but OS X memory
On Fri, Feb 15, 2013 at 12:48 PM, "René J.V. Bertin"
wrote:
> On Feb 15, 2013, at 16:33, Claudio Freire wrote:
>
>> gcc, which tends to inhibit many of its other optimizations. Why don't
>> you try gcc's vector primitives instead?
>
> Which ones?
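
As an illustration of the question above: "vector primitives" presumably refers to GCC's generic vector extensions (the vector_size attribute). That reading is an assumption, and the type name v4sf and the function madd4 below are hypothetical. A minimal sketch:

```c
#include <assert.h>

/* Four packed floats, 16 bytes wide, i.e. the size of one SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

static_assert(sizeof(v4sf) == 16, "v4sf should be 16 bytes wide");

/* Element-wise a * b + c. The compiler picks the instructions
   (SSE on x86) instead of the programmer spelling out intrinsics. */
v4sf madd4(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}
```

Because the operators are ordinary C expressions to the compiler, it stays free to schedule and combine them, which is not the case for inline assembly with a "memory" clobber.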
Well, you specify "memory" so it will inhibit all
Thanks, Claudio!
On Feb 15, 2013, at 16:33, Claudio Freire wrote:
> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing. I guess compiler flags do matter too.
I haven't compiled with -ftree-vectorize (rather, I tried with and without,
made no differe
On Fri, Feb 15, 2013 at 6:37 AM, "René J.V. Bertin" wrote:
> On my 2.7 GHz dual-core i7 MBP, I get about 1Hz for the SSE version, and
> roughly half that for the generic, scalar function, using gcc-4.2 as well as
> using MSVC 2010 Express running under WinXP in VirtualBox. The factor 2 speed
'Morning!
I guess there are a number of people on here who are experts at writing
optimised code exploiting every bit of a processor's instruction set. The code
I recently isolated from the Perian project also attempts this, and I just came
across something that flabbergasted me. Perian is