Hi Will, Hi Kristian,
today I spent several hours investigating the numerical differences we saw when running the test suite on different CPUs. So far I have only an intermediate result: I have identified one "culprit", but I do not know to what extent it explains all the differences we saw; moreover, it is not yet clear whether we want to fix this at all. Anyway, here is what I found out...

As explained in one of my previous messages, last weekend I rather accidentally discovered that the differences do not show up on debug builds. So the first thing I investigated was the impact of strong optimisation (-O3).

The setup for my investigation today was:
- developer build in my project setup, to allow easy debugging with the IDE
- but change the optimisation flag (see BuildOptionsDebug in the CMake build)
- thus make a build with debug info, once with -O0 and once with -O3
- then rig the testsuite to launch gdbserver with this build as inferior
- this way I can debug both variants simultaneously and compare the computed numbers

The attached image shows the residual and the baseline wave in Audacity, both normalised to 0dB FS -- the residual is thus enormously magnified; the difference reported by the testsuite was Δ -116.045dB(RMS). The highlighted area is the 2nd chunk of 128 samples, and I did my investigation on the first sample in this second chunk.

What I found out on this first sample thus far:
- all the Oscillator computations are 100% the same on both builds
- but the precomputed Volume for the part differs in the last two digits

Thus...
    outl[i] += tmpwavel[i] * NoteVoicePar[nvoice].Volume * pangainL;
will "taint" all computed samples:
    .Volume = 0.230252668  (with -O0)
    .Volume = 0.230252683  (with -O3)

This immediately leads us to the "culprit" in ADnote.cpp, ADnote::computeNoteParameters(), Line 1278ff
    NoteVoicePar[nvoice].Volume =
        powf(0.1f, 3.0f * (1.0f - adpars->VoicePar[nvoice].PVolume / 127.0f))
        * velF(velocity, adpars->VoicePar[nvoice].PAmpVelocityScaleFunction);
The velocity function yields 1.0, but the powf() call is apparently set up quite differently in the generated assembly code; it looks like the optimised version uses SSE instructions (or uses them in a different way). Naturally, I then did a check with a minimal standalone C++ program and could indeed reproduce those different numeric results.

As said, I haven't looked beyond this first identified problem, which means I do not know yet whether it explains all the differences we saw. Anyhow, it is unfortunate to have that "number dust" in the volume, since it interacts with the limited float precision and is amplified as the computed function moves up and down.

Incidentally, it is rather questionable to use a generic power function with a fixed base (as here with 0.1f), since this can trivially be rewritten as an exponential function:
    a^b = e^(ln(a)*b)
I just did a quick test, and this workaround yields the same results with -O0 and -O3 (at least in a standalone C++ program). I haven't tested this on a different CPU/platform yet.

Of course we have to be careful here. In theory the exponential function should be faster, since it has to do fewer checks than a generic power function; on the other hand, the precomputed log(0.1) introduces yet another small change of the same order of magnitude.

I'll investigate that further over the next few days...

Cheers,
Hermann
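P.S. To make the pow-versus-exp comparison concrete, here is a minimal sketch of such a standalone check. This is not the exact test program and not code from the Yoshimi sources: the helper names volumeViaPow, volumeViaExp and printVolumeTable are made up for illustration, the velocity factor is assumed to be 1.0, and PVolume is assumed to range over 0..127. By my arithmetic a setting of 100 gives a volume of roughly 0.23025, which is in the region of the values quoted above.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper (not from the Yoshimi sources): the voice volume
// formula from ADnote, with the velocity factor assumed to be 1.0.
float volumeViaPow(int pvolume)
{
    return powf(0.1f, 3.0f * (1.0f - pvolume / 127.0f));
}

// Same formula rewritten as a^b = e^(ln(a)*b), with ln(0.1) computed once.
float volumeViaExp(int pvolume)
{
    static const float LN_0_1 = logf(0.1f);
    return expf(LN_0_1 * 3.0f * (1.0f - pvolume / 127.0f));
}

// Print both variants with 9 significant digits for every assumed
// PVolume setting, so the output of two builds can be diffed.
void printVolumeTable()
{
    for (int p = 0; p <= 127; ++p)
        printf("%3d  pow=%.9g  exp=%.9g\n", p, volumeViaPow(p), volumeViaExp(p));
}
```

Calling printVolumeTable() from main(), building once with -O0 and once with -O3, and diffing the two outputs should show whether either variant depends on the optimisation level. One caveat: with optimisation enabled the compiler may evaluate math calls with constant arguments at compile time, so the argument should come from a runtime value (as in the loop here) if the library call itself is to be tested.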
