Hi Will,
Hi Kristian,

Today I spent several hours investigating the numerical differences
we saw when running the test suite on different CPUs.

Right now I have only an intermediate result -- I have identified one
"culprit", but I do not know to what extent it explains all the differences
we saw; moreover, it is not yet clear whether we want to fix this at all.
Anyway, here is what I found out...


As explained in one of my previous messages, last weekend I discovered,
rather by accident, that the differences do not show up in debug builds.
So the first thing I investigated was the impact of strong optimisation (-O3).

So the setup for my investigation today was...
- developer build in my project setup, to allow easy debugging with the IDE
- but change the optimisation flag (see BuildOptionsDebug in the CMake build)
- thus make a build with debugging info, once with -O0 and once with -O3
- then rig the testsuite to launch gdbserver with this build as inferior
- this way I can debug both variants simultaneously and compare the computed numbers

The attached image shows the residual and the baseline wave in Audacity, both
normalised to 0dB FS -- the residual is thus hugely magnified; the difference
reported by the testsuite was Δ -116.045dB(RMS).
The highlighted area is the 2nd chunk of 128 samples, and I did my investigation
on the first sample in this second chunk.

What I found out on this first sample thus far...
- all the Oscillator computations are identical on both builds
- but the precomputed Volume for the part differs in the last 2 digits

Thus...
outl[i] += tmpwavel[i] * NoteVoicePar[nvoice].Volume * pangainL;

will "taint" all computed samples

.Volume = 0.230252668 (with -O0 )
.Volume = 0.230252683 (with -O3 )
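
To put a number on how far apart these two values are, one can count the
representable floats between them. The following is only a small sketch,
assuming 32-bit IEEE-754 floats; the helper names bitsOf and ulpDistance are
made up for illustration and are not part of the code base.

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of a float (assumes a 32-bit IEEE-754 float).
static std::uint32_t bitsOf(float v)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof u);   // well-defined way to read the bits
    return u;
}

// Number of float steps between a and b.
// Only meaningful for finite values with the same sign.
std::uint32_t ulpDistance(float a, float b)
{
    std::uint32_t ua = bitsOf(a), ub = bitsOf(b);
    return ua > ub ? ua - ub : ub - ua;
}
```

ulpDistance(0.230252668f, 0.230252683f) yields 1, i.e. the two volumes are
directly adjacent floats. In this range one step is 2^-26 (about 1.5e-8),
which is a relative difference of roughly 6.5e-8.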

This immediately leads us to the "culprit" in ADnote.cpp,
ADnote::computeNoteParameters(), Line 1278ff
NoteVoicePar[nvoice].Volume =
  powf(0.1f, 3.0f * (1.0f - adpars->VoicePar[nvoice].PVolume / 127.0f))
  * velF(velocity, adpars->VoicePar[nvoice].PAmpVelocityScaleFunction);

The velocity function yields 1.0, but the powf() call is apparently implemented
quite differently in the generated assembly code; it looks like the optimised
version uses SSE instructions (or uses them in a different way).


Naturally, I then checked this with a minimal standalone C++ program
and could indeed reproduce the differing numeric results...
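
For reference, a check along these lines could be sketched as follows. This
is an illustration, not the exact program I ran: the helper names voiceVolume
and show are hypothetical, the velocity factor is taken as 1.0, and
PVolume = 100 is an assumption (it is simply the setting that gives a value
of about 0.2302527 in this formula).

```cpp
#include <math.h>
#include <cstdio>
#include <string>

// Same expression as in ADnote::computeNoteParameters(), velocity factor = 1.0.
// The volatile local is there so the compiler cannot fold the whole
// expression into a constant at compile time.
float voiceVolume(unsigned char pvolume)
{
    volatile float p = pvolume;
    return powf(0.1f, 3.0f * (1.0f - p / 127.0f));
}

// Print with 9 significant digits, enough to tell any two floats apart.
std::string show(float v)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.9g", v);
    return buf;
}
```

Building this once with -O0 and once with -O3 and comparing
show(voiceVolume(100)) from both binaries is the kind of comparison meant
here; whether the last digits differ will depend on compiler, flags and libm.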

As I said, I haven't looked beyond this first identified problem, which
means I do not know yet whether it explains all the differences we saw.
In any case, it is unfortunate to have that "number dust" in the volume,
since it interacts with the limited float precision and is amplified
as the computed function moves up and down.

Incidentally, it is rather questionable to use a generic power function
with a fixed base (as here with 0.1f), since this can trivially be rewritten
as an exponential function:  a^b = e^(ln(a)*b)
I just did a quick test, and this workaround yields the same results with
-O0 and -O3 (at least in a standalone C++ program). I haven't tested this
on a different CPU/platform yet.
Of course we have to be careful here, but in theory the exponential function
should be faster, since it has to do fewer checks than a generic power function;
on the other hand, the precomputed log(0.1) introduces yet another small change
of the same order of magnitude.
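
The rewrite would look roughly like this. It is a sketch only; the function
names volumeViaPow and volumeViaExp and the constant LOG_BASE are invented
for illustration and are not a proposal for the actual naming.

```cpp
#include <math.h>

// Current form: generic power function with a fixed base.
float volumeViaPow(float x)
{
    return powf(0.1f, x);
}

// Rewritten form: a^b = e^(ln(a)*b), with ln(0.1) computed only once.
float volumeViaExp(float x)
{
    static const float LOG_BASE = logf(0.1f);
    return expf(LOG_BASE * x);
}
```

Both variants agree to within float rounding, but not necessarily bit for
bit, because LOG_BASE is itself a rounded value. That is the "yet another
small change" mentioned above.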

I'll investigate this further over the next few days...

Cheers,
Hermann


_______________________________________________
Yoshimi-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/yoshimi-devel
