These bunch of define are pretty ugly anyway. I prefer a pretty duplicated code rather than these tons of defines. Define should be avoided as much as possible.
________________________________ On 15.04.2013 02:10, Nozomi Kodama wrote: > Hello > > thanks for the review. > I don't think that calling defines is the way to go. Indeed, I tested my > patch and yours. Yours is about 12% slower than mine in my computer. > And now, we try to take care of efficiency of this dll. So, it is not the > time to increase latency. > You are right. I'm not really sure why the code should be slower though. The #defines shouldn't have an impact on the performance, but it might be because it is translated to: ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8]; tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8]; out[1] = 0.0f + ta * b[1] + tb * a[1]; t = a[1] * b[1]; out[0] = out[0] + 0.28209479f * t; out[6] = 0.0f + -0.12615662f * t; out[8] = 0.0f + -0.21850968f * t; instead of: ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8]; tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8]; out[1] = ta * b[1] + tb * a[1]; t = a[1] * b[1]; out[0] += 0.28209479f * t; out[6] = -0.12615662f * t; out[8] = -0.21850968f * t; May be due to "out[8] = -0.21850968f * t;" vs "out[8] = 0.0f + -0.21850968f * t;": 1. the extra 0.0f - Shouldn't the 0.0f get optimized away? Imho the macro could be fixed (e.g. use an if / separate macro / don't use a macro for this cases). 2. "+ -0.21850968f * t;" should be as fast as "-0.21850968f * t" isn't it? Does anyone else have an objection about this? I just feel that the code size and the always reused constants could be written a little bit nicer. I'm not sure if the macro usage is really better, it was just a quick idea to avoid that much code duplication. Also changes like the suggested below are really error prone, because they change a lot of code mostly using copy and paste. Though I vote for the same style and precision usage in all cases. Hope this helps a bit Rico > > I used 10 digits since there are a lot of computation, I want to avoid as > much as possible big rounding errors. If we want to uniformize, then we > should uniformize d3dxshmultiply 2,3,4 with 10 digits. > But that is for an another patch. > > Nozomi. > >