On Mon, Apr 15, 2013 at 10:24:28AM +0200, Rico Schöller wrote:
> You are right. I'm not really sure why the code should be slower though.
> The #defines shouldn't have an impact on the performance, but it might
> be because it is translated to:
>
> ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8];
> tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8];
> out[1] = 0.0f + ta * b[1] + tb * a[1];
> t = a[1] * b[1];
> out[0] = out[0] + 0.28209479f * t;
> out[6] = 0.0f + -0.12615662f * t;
> out[8] = 0.0f + -0.21850968f * t;
>
> instead of:
>
> ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
> tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
> out[1] = ta * b[1] + tb * a[1];
> t = a[1] * b[1];
> out[0] += 0.28209479f * t;
> out[6] = -0.12615662f * t;
> out[8] = -0.21850968f * t;
If everything is 'float' (no doubles anywhere) then I can't see why the
above would compile to different code at any sane optimisation level -
best to look at the object code.

Unless the compiler knows that out[] can't overlap a[] or b[], the
generated code is likely to be better if 't' is evaluated before the
write to out[1].

	David

--
David Laight: da...@l8s.co.uk
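[Editorial illustration of the aliasing point above: C99 `restrict` is one way to tell the compiler that out[] cannot overlap a[] or b[], so it is free to reorder the load of a[1] and b[1] relative to the store into out[1]. This is a minimal sketch of the order-1 step quoted in the thread, not the actual Wine d3dx9 code; the function name is hypothetical.]

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of one step of the SH triple-product code under
 * discussion.  "restrict" promises that out[] does not alias a[] or
 * b[], so the compiler may compute t = a[1] * b[1] without reloading
 * a[1] and b[1] from memory after the store to out[1]. */
static void sh_step1(float *restrict out,
                     const float *restrict a,
                     const float *restrict b)
{
    float ta, tb, t;

    ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
    tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
    out[1] = ta * b[1] + tb * a[1];
    t = a[1] * b[1];
    out[0] += 0.28209479f * t;
    out[6] = -0.12615662f * t;
    out[8] = -0.21850968f * t;
}
```

Without `restrict` (or whole-program knowledge), the store to out[1] could legally modify a[1] or b[1], which is why evaluating 't' before that store helps the generated code.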