On Mon, Apr 15, 2013 at 10:24:28AM +0200, Rico Schöller wrote:
> You are right. I'm not really sure why the code should be slower though.
> The #defines shouldn't have an impact on the performance, but it might
> be because it is translated to:
>
> ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8];
> tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8];
> out[1] = 0.0f + ta * b[1] + tb * a[1];
> t = a[1] * b[1];
> out[0] = out[0] + 0.28209479f * t;
> out[6] = 0.0f + -0.12615662f * t;
> out[8] = 0.0f + -0.21850968f * t;
>
> instead of:
>
> ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
> tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
> out[1] = ta * b[1] + tb * a[1];
> t = a[1] * b[1];
> out[0] += 0.28209479f * t;
> out[6] = -0.12615662f * t;
> out[8] = -0.21850968f * t;
If everything is 'float' (no doubles anywhere) then I can't see why the
above would compile to different code at any sane optimisation level -
best to look at the object code.

Unless the compiler knows that out[] can't overlap a[] or b[], the
generated code is likely to be better if 't' is evaluated before the
write to out[1].

	David

--
David Laight: da...@l8s.co.uk
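[Editorial illustration of the aliasing point above: C99 `restrict` is one way to tell the compiler that out[] cannot overlap a[] or b[], so it is free to reorder the load of a[1] and b[1] relative to the store into out[1]. This is a minimal sketch of the order-1 step quoted in the thread, not the actual Wine d3dx9 code; the function name is hypothetical.]

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of one step of the SH triple-product code under
 * discussion.  "restrict" promises that out[] does not alias a[] or
 * b[], so the compiler may compute t = a[1] * b[1] without reloading
 * a[1] and b[1] from memory after the store to out[1]. */
static void sh_step1(float *restrict out,
                     const float *restrict a,
                     const float *restrict b)
{
    float ta, tb, t;

    ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
    tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
    out[1] = ta * b[1] + tb * a[1];
    t = a[1] * b[1];
    out[0] += 0.28209479f * t;
    out[6] = -0.12615662f * t;
    out[8] = -0.21850968f * t;
}
```

Without `restrict` (or whole-program knowledge), the store to out[1] could legally modify a[1] or b[1], which is why evaluating 't' before that store helps the generated code.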