[pulseaudio-discuss] Resampler quality evaluation: now on music files

Alexander E. Patrakov Sat, 04 Oct 2014 22:48:56 -0700

[tl;dr: speex-float-1 is adequate for 44100 -> 48000 Hz resampling, and so is ffmpeg; speex-float-0 isn't]

Previously, I posted some quality-evaluation results for resamplers that can be used by PulseAudio:


http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html

The main objections were:

1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People don't listen at such levels. At lower levels, the distortions also have lower sound pressure, and may become unnoticeable.

2. Absolutely quiet room (except for this sound and resampler distortions). In a noisy room, noise can mask ("outvoice") the distortion.

3. Perfect speakers or headphones that don't distort sounds at all by themselves. Maybe headphone distortions can mask resampler distortions?

4. Sine wave (and not music or speech) as a test sound to be distorted by a resampler. Maybe other frequency components can mask resampler distortions?
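Objections (1) and (2) both come down to absolute levels, i.e. to mapping digital sample values to sound pressure. A minimal Python 3 sketch of that mapping, assuming the usual convention that a full-scale sine has an RMS of 1/sqrt(2) and that the dB SPL value of a full-scale sine is known from calibration. The A-weighting helper uses the standard analytic A-weighting curve (what a noise meter's "A" setting applies). All function names are hypothetical and are not taken from psy-eval.

```python
import math

# RMS of a full-scale sine when samples are floats in [-1, 1].
FULL_SCALE_SINE_RMS = 1.0 / math.sqrt(2.0)


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spl_of_recording(samples, full_scale_spl):
    """dB SPL of a recording, given the dB SPL that a full-scale sine maps to."""
    return full_scale_spl + 20.0 * math.log10(rms(samples) / FULL_SCALE_SINE_RMS)


def a_weighting_db(f):
    """Standard analytic A-weighting curve in dB (f > 0 Hz); about 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0


def a_weighted_level_db(bands):
    """Energy sum of (frequency_hz, level_db) band levels after A-weighting."""
    total = sum(10.0 ** ((level + a_weighting_db(f)) / 10.0) for f, level in bands)
    return 10.0 * math.log10(total)


def gain_for_target(current_db, target_db):
    """Linear gain that moves a level from current_db to target_db."""
    return 10.0 ** ((target_db - current_db) / 20.0)
```

For instance, with a calibration of 84 dB SPL for a full-scale sine, a noise recording whose RMS is 40 dB below that of a full-scale sine sits at about 44 dB SPL, and gain_for_target() gives the factor needed to bring a recording to a chosen level. Lowering the playback level, as in objection (1), simply subtracts the same number of dB from every component.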

As it turns out, (4) is a very valid point, the most valid of all four. In fact, in the vast majority of music files, the extra components of the signal are strong enough to mask the distortions of speex-float-1 even without taking the other points into account. Still, I have a script that takes (1), (2) and (4) into account, and you can run it on your own music files. As I already explained in the previous email, there is no plan to account for (3).


git clone git://gitorious.org/psy-eval/psy-eval.git

You will need python2.7, numpy, scipy, matplotlib, and also ffmpeg (or possibly libav).

You also need a wav file with the resampler response and, optionally, a recording of room noise, also as a 16-bit uncompressed wav file. See http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html for how to obtain these wav files, or use pre-generated ones:

https://yadi.sk/d/RzV7JGAxbfUve (the same archive as used in the previous email)
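The resampler response file is the resampler's output for a linear sine sweep (see --resampler-response below). For readers who want to see what such a test signal looks like, here is a minimal Python 3 sketch that generates a linear sweep and encodes it as a 16-bit mono wav file. The sweep parameters are assumptions for illustration; the previous email describes the actual procedure, and the function names are hypothetical.

```python
import io
import math
import struct
import wave


def linear_sweep(rate, duration, f0, f1, amplitude=1.0):
    """Samples of a linear sine sweep from f0 to f1 Hz lasting `duration` seconds."""
    n = int(round(rate * duration))
    k = (f1 - f0) / duration  # sweep speed, Hz per second
    samples = []
    for i in range(n):
        t = i / rate
        # The phase is the integral of the instantaneous frequency f0 + k*t.
        phase = 2.0 * math.pi * (f0 * t + 0.5 * k * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


def sweep_frequency_at(t, duration, f0, f1):
    """Instantaneous frequency (Hz) of the sweep at time t, at any sample rate."""
    return f0 + (f1 - f0) * t / duration


def wav16_bytes(rate, samples):
    """Encode mono float samples in [-1, 1] as a 16-bit PCM wav file in memory."""
    pcm = b"".join(
        struct.pack("<h", max(-32768, min(32767, int(round(s * 32767)))))
        for s in samples
    )
    buf = io.BytesIO()
    w = wave.open(buf, "wb")
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(pcm)
    w.close()
    return buf.getvalue()
```

Because the sweep frequency depends only on time, an FFT frame centered at time t in the 48000 Hz response corresponds to input frequency sweep_frequency_at(t, ...), provided any leading junk has been removed (which is what --skip is for). That is what makes a linear sweep convenient for reading off a resampler's behavior at every frequency.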

So, the new script is ./music_distortions.py, and it takes the following arguments:

--resampler-response: the wav file with the resampler response to a linear frequency sweep. You can use "speex-float-1.wav".

--rate-from: the sample rate that the sine sweep was resampled from. For files in my archive, that's 44100.

--skip: if the resampler response contains junk at the beginning, use this to skip the specified number of samples.

--fftsize: the FFT size, at the target sample rate. Useful values are 1024 to 8192.


--noise-file: wav file with room noise. Optional.

--noise-full-scale: if you recorded room noise with a calibrated microphone and sound card, then you know the dB SPL value corresponding to a full-scale sine wave. Put it here. The default is 92, but you need 84 in order to use the noise file from the archive.

--noise-dba: if you have a noise meter instead, put its reading (with the "A" setting) here. If you have neither a calibrated microphone nor a noise meter, but want to use your own noise file, put 35 here.

--signal-full-scale: if you know the sound pressure level corresponding to the full-scale sine wave at your sound card output, put it here, in dB. The default is 92.

--use-eq: use this switch to ignore the fact that the resampler attenuates high frequencies. Without it, the attenuation counts as a distortion, on the grounds that a human can notice it if he/she knows that those frequencies should be there.

--save: if you want to save the plot, put a prefix of its name here; "_audibility_vs_time.png" will be appended. If you don't specify this, the plot will be shown instead.

--report-only: don't plot anything, just report the average distortion, the maximum distortion, and where it happens.

Finally, specify the music file name. That file can be in any format supported by ffmpeg, and should have the sample rate given by --rate-from. Only the front left channel is taken into account.


E.g.:

./music_distortions.py --signal-full-scale 72 --fftsize 1024 --resampler-response speex-float-0.wav --rate-from 44100 --skip 65536 --save Prelude Prelude.wav


produces (together with some warnings):

"Prelude.wav", average distortion = -8.8 dB, maximum = -2.2 dB, at 4:33

and the attached plot. If the curve is below 0 dB, an average human cannot notice the distortions. If it is above, then the distortions can be noticed, provided that the listener knows how the file should sound with an ideal resampler.
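Since --report-only prints a one-line summary, evaluating a whole music collection can be scripted. Below is a hypothetical Python 3 wrapper, assuming that the report line appears on standard output in exactly the format shown above and that the script is run from its own directory. It uses only the flags documented above, and options left unset are simply not passed, so the script's own defaults apply.

```python
import re
import subprocess

# Matches the report line shown above, e.g.
# "Prelude.wav", average distortion = -8.8 dB, maximum = -2.2 dB, at 4:33
REPORT_RE = re.compile(
    r'^"(?P<name>.*)", average distortion = (?P<avg>-?\d+(?:\.\d+)?) dB, '
    r'maximum = (?P<max>-?\d+(?:\.\d+)?) dB, at (?P<at>\d+(?::\d{2})+)$'
)


def parse_report(line):
    """Return a dict for a report line, or None for any other line (e.g. warnings)."""
    m = REPORT_RE.match(line.strip())
    if m is None:
        return None
    maximum = float(m.group("max"))
    return {
        "name": m.group("name"),
        "average_db": float(m.group("avg")),
        "maximum_db": maximum,
        "at": m.group("at"),
        # Above 0 dB, a listener who knows the ideal result could notice it.
        "noticeable": maximum > 0.0,
    }


def build_command(music_file, response_wav, rate_from=44100, skip=None,
                  fftsize=None, signal_full_scale=None):
    """Command line for one --report-only run; unset options keep the script's defaults."""
    cmd = ["./music_distortions.py",
           "--resampler-response", response_wav,
           "--rate-from", str(rate_from)]
    for flag, value in (("--skip", skip), ("--fftsize", fftsize),
                        ("--signal-full-scale", signal_full_scale)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd + ["--report-only", music_file]


def evaluate(music_file, response_wav, **options):
    """Run the script on one file and return its parsed report (None if not found)."""
    proc = subprocess.run(build_command(music_file, response_wav, **options),
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True, check=True)
    for line in proc.stdout.splitlines():
        report = parse_report(line)
        if report is not None:
            return report
    return None
```

Sorting the resulting reports by "maximum_db" is a quick way to find the few files worth listening to closely.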

I do have some music files where the script at its default settings finds speex-float-1 only marginally adequate (i.e., the maximum audibility of distortions is close to 0 dB), or even inadequate with a non-default FFT size (2048 or 4096) [*]. In all such (rare) cases, --signal-full-scale 72 removes the complaint. Probably that's because the complaint is really about some nearly-ultrasonic frequency component that got rejected by the resampler: at the default level its absence is flagged, but when the volume is reduced by 20 dB it sinks below the absolute threshold of hearing.
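The explanation above can be illustrated with Terhardt's approximation of the absolute threshold of hearing, a widely used model in perceptual audio coding. Whether psy-eval uses this exact curve is an assumption, and the component below (16 kHz, 20 dB below a full-scale sine) is a hypothetical example chosen only to show the effect: lowering the full-scale level from 92 to 72 dB SPL moves the same digital component from above the threshold to below it.

```python
import math


def spl(level_dbfs, full_scale_spl):
    """dB SPL of a component given in dB relative to a full-scale sine."""
    return full_scale_spl + level_dbfs


def threshold_in_quiet(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def audible_in_quiet(level_dbfs, f_hz, full_scale_spl):
    """True if the component exceeds the threshold in quiet (no masking considered)."""
    return spl(level_dbfs, full_scale_spl) > threshold_in_quiet(f_hz)


for full_scale in (92, 72):
    print(full_scale, audible_in_quiet(-20.0, 16000.0, full_scale))
# 92 True
# 72 False
```

The threshold from this model rises steeply above roughly 15 kHz (about 66 dB SPL at 16 kHz), which is why high-frequency differences are so sensitive to the assumed playback level.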


For those who want to test, here are the affected New Age albums:

Ryan Farish - Everlasting
Australis - The Gates of Reality
Daveed - Songs From Beyond

Interestingly, the "average" figure is worse on speech material (such as foreign language courses) than on music.

[*] The FFT size dependency is, strictly speaking, a bug. It is probably related to the use of a narrow (low-noise) window without sufficient overlap, so that the bad fragment just slips through the gap between two neighboring positions of the 1024-sample window. Still, the average figure is stable when changing the FFT size.
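To see how a fragment can slip between window positions, this small sketch sums the window weight that each sample receives from all analysis frames. It uses a periodic Hann window for simplicity; psy-eval's actual window and hop size are not stated in this email, so both are assumptions here. With no overlap, samples at frame boundaries get zero weight, so a short event there barely registers; with 50% overlap every sample gets the same total weight. Windows that are narrower than Hann need more than 50% overlap for such even coverage.

```python
import math


def periodic_hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def window_coverage(n_fft, hop):
    """(min, max) of the summed window weight seen by each sample in steady state.

    Frame k covers samples [k*hop, k*hop + n_fft). Positions are checked over one
    hop period starting at n_fft, where every frame that can reach a sample exists.
    """
    w = periodic_hann(n_fft)
    sums = []
    for t in range(n_fft, n_fft + hop):
        total = 0.0
        for k in range(t // hop + 1):
            offset = t - k * hop
            if offset < n_fft:
                total += w[offset]
        sums.append(total)
    return min(sums), max(sums)


for hop in (1024, 512):
    lo, hi = window_coverage(1024, hop)
    print("hop=%d: min weight %.3f, max weight %.3f" % (hop, lo, hi))
# hop=1024: min weight 0.000, max weight 1.000
# hop=512: min weight 1.000, max weight 1.000
```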

P.S. Tomorrow I have a flight to France (for XDC 2014), so I won't be able to answer your questions quickly.


--
Alexander E. Patrakov

_______________________________________________
pulseaudio-discuss mailing list
pulseaudio-discuss@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
