[tl;dr: speex-float-1 is adequate for 44100 -> 48000 Hz resampling, ffmpeg also is, speex-float-0 isn't]

I previously posted some quality-evaluation results for resamplers that can be used by PulseAudio:

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html

The main objections were:

1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People don't listen at such levels. At lower levels, the distortions also have lower sound pressure, and may become unnoticeable.

2. Absolutely quiet room (except for this sound and resampler distortions). In a noisy room, noise can mask ("outvoice") the distortion.

3. Perfect speakers or headphones that don't distort sounds at all by themselves. Maybe headphone distortions can mask resampler distortions?

4. Sine wave (and not music or speech) as a test sound to be distorted by a resampler. Maybe other frequency components can mask resampler distortions?

As it turns out, (4) is a very valid point. The most valid point of all four. In fact, in the vast majority of music files, the extra components of the signal are strong enough to mask the distortions of speex-float-1 even without taking other points into account. Still, I have a script that takes (1), (2) and (4) into account, and you can run it on your own music files. As I already explained in the previous email, there is no plan to account for (3).

git clone git://gitorious.org/psy-eval/psy-eval.git

You will need python2.7, numpy, scipy, matplotlib, and also ffmpeg (or possibly libav).

You also need a wav file with the resampler response and, optionally, a recording of room noise, also as a 16-bit uncompressed wav file. See http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html for how to obtain these wav files, or use pre-generated ones:

https://yadi.sk/d/RzV7JGAxbfUve (the same archive as used in the previous email)

So, the new script is ./music_distortions.py , and it takes the following arguments:

--resampler-response: the wav file with resampler response to a linear frequency sweep. You can use "speex-float-1.wav".

--rate-from: the sample rate that the sine sweep was resampled from. For files in my archive, that's 44100.

--skip: if the resampler response contains junk in the beginning, use this to skip a specified number of samples.

--fftsize: the FFT size, at the target sample rate. Useful values are 1024 - 8192.

--noise-file: wav file with room noise. Optional.

--noise-full-scale: if you recorded room noise with a calibrated microphone and sound card, then you know the dB SPL value corresponding to a full-scale sine wave. Put it here. The default is 92, but you need 84 in order to use the noise file from the archive.

--noise-dba: if you have a noise meter instead, put its reading (with the "A" setting) here. If you have neither a calibrated microphone nor a noise meter, but want to use your own noise file, put 35 here.

--signal-full-scale: If you know the sound pressure level corresponding to the full-scale sine wave at your soundcard output, put it here, in dB. The default is 92.

--use-eq: use this switch to ignore the fact that the resampler attenuates high frequencies (with the implication that a human can notice this distortion if he/she knows that those frequencies should be there).

--save: if you want to save the plot, put a prefix of its name here. _audibility_vs_time.png will be appended. If you don't specify this, the plot will be shown instead.

--report-only: don't plot anything, just report the average distortion, the maximum distortion, and where it happens.

Finally, specify the music file name. That file should be in any format supported by ffmpeg, and should have the sample rate given by --rate-from. Only the front left channel will be taken into account.
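If you want to look at exactly the samples that get analyzed, the following is a minimal sketch of decoding only the front left channel with ffmpeg. It is not taken from music_distortions.py: the helper names are hypothetical, it assumes Python 3 with numpy (the script itself needs python2.7), and it assumes an ffmpeg build that provides the "pan" audio filter. It deliberately does not ask ffmpeg to change the sample rate, so a file with the wrong rate stays wrong instead of being silently resampled.

```python
import subprocess
import numpy as np

def ffmpeg_front_left_cmd(path):
    """Build an ffmpeg command line that writes the front left channel of
    `path` to stdout as raw signed 16-bit little-endian mono PCM."""
    return [
        "ffmpeg", "-v", "error",
        "-i", path,
        "-af", "pan=mono|c0=FL",      # keep only the front left channel
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-",                          # write to stdout
    ]

def pcm16_to_float(raw):
    """Convert raw s16le bytes to floats in [-1.0, 1.0)."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0

def load_front_left(path):
    """Decode `path` with ffmpeg and return the front left channel.
    Hypothetical helper; requires ffmpeg to be on PATH."""
    raw = subprocess.check_output(ffmpeg_front_left_cmd(path))
    return pcm16_to_float(raw)
```

For example, load_front_left("Prelude.wav") would return a one-dimensional numpy array of samples, provided ffmpeg can read the file.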

E.g.:

./music_distortions.py --signal-full-scale 72 --fftsize 1024 --resampler-response speex-float-0.wav --rate-from 44100 --skip 65536 --save Prelude Prelude.wav

produces (together with some warnings):

"Prelude.wav", average distortion = -8.8 dB, maximum = -2.2 dB, at 4:33

and the attached plot. If the curve is below 0 dB, an average human cannot notice the distortions. If it is above, then the distortion can be noticed, provided that the subject knows how the file should sound with the ideal resampler.
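To make the reading of that report line concrete, here is a small sketch of how such a summary could be derived from a per-frame audibility curve. This is an illustration only: the function name, the input layout (one dB value per analysis frame) and the use of a plain arithmetic mean of the dB values are my assumptions here, not a description of what music_distortions.py does internally. The numbers in the usage note are made up.

```python
import numpy as np

def summarize_audibility(audibility_db, seconds_per_frame):
    """Summarize a per-frame audibility curve (hypothetical layout:
    one dB value per frame, frames equally spaced in time)."""
    curve = np.asarray(audibility_db, dtype=float)
    worst = int(np.argmax(curve))                 # frame with the worst distortion
    minutes, seconds = divmod(int(worst * seconds_per_frame), 60)
    return {
        "average_db": float(curve.mean()),        # assumption: plain mean of dB values
        "maximum_db": float(curve[worst]),
        "at": "%d:%02d" % (minutes, seconds),
        # Above 0 dB, an average listener who knows the reference can notice it.
        "noticeable": bool(curve[worst] > 0.0),
    }
```

Calling summarize_audibility([-12.0, -9.0, -2.2, -12.0], 91.0) gives a maximum of -2.2 dB in the third frame, and "noticeable" is False because the whole curve stays below 0 dB.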

I do have some music files where the script at its default settings finds speex-float-1 marginally adequate (i.e. maximum audibility of distortions is close to 0 dB), or even not adequate with non-default FFT size (2048 or 4096) [*]. In all such (rare) cases, --signal-full-scale 72 removes the complaint. Probably that's because the complaint is really about some nearly-ultrasonic frequency component that got rejected by the resampler in the first case and sank below the absolute threshold of hearing when the volume was reduced in the second case.

For those who want to test, here are the affected New Age albums:

Ryan Farish - Everlasting
Australis - The Gates of Reality
Daveed - Songs From Beyond
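The guess above about nearly-ultrasonic components can be checked with a back-of-the-envelope calculation. The sketch below uses Terhardt's well-known approximation of the absolute threshold of hearing in quiet; I am not claiming that the script uses this exact formula, and the 16 kHz component at -20 dB relative to full scale is a made-up example. Masking by other components is ignored entirely.

```python
import math

def threshold_in_quiet_db_spl(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing,
    in dB SPL, for a pure tone at freq_hz."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def component_is_audible(freq_hz, level_dbfs, full_scale_db_spl):
    """True if a tone at level_dbfs (dB relative to a full-scale sine)
    exceeds the threshold in quiet, given the playback calibration."""
    spl = full_scale_db_spl + level_dbfs
    return spl > threshold_in_quiet_db_spl(freq_hz)

# Hypothetical 16 kHz component, 20 dB below full scale:
print(component_is_audible(16000, -20, 92))  # default --signal-full-scale
print(component_is_audible(16000, -20, 72))  # --signal-full-scale 72
```

With this approximation the threshold at 16 kHz is in the mid-60s dB SPL, so the example component sits above it at a 92 dB full scale and below it at 72 dB. That matches the pattern where lowering --signal-full-scale removes the complaint.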

Interestingly, the "average" figure is worse on speech material (such as foreign language courses) than on music.

[*] The FFT size dependency is, strictly speaking, a bug. This is probably related to the use of a narrow (low-noise) window without sufficient overlap, so the bad fragment just slips through the gap between the two neighboring positions of the 1024-sample window. Still, the average figure is stable when changing the FFT size.
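The "slips through the gap" effect can be illustrated numerically. The sketch below assumes a Hann window purely for illustration (I have not stated which window the script uses, so treat that as a stand-in) and computes, for the worst-placed sample, the largest weight that any analysis frame gives it. The function name is hypothetical.

```python
import numpy as np

def worst_case_window_weight(fftsize, hop):
    """For frames of length `fftsize` placed every `hop` samples, return
    the smallest value, over all sample positions, of the best window
    weight that any frame assigns to that position (Hann window assumed)."""
    if fftsize % hop != 0:
        raise ValueError("this sketch assumes hop divides fftsize")
    window = np.hanning(fftsize)
    offsets = np.arange(hop)          # position of a sample within one hop
    best = np.zeros(hop)
    for shift in range(0, fftsize, hop):
        # The same sample is seen at index offset + shift by an earlier frame.
        best = np.maximum(best, window[offsets + shift])
    return float(best.min())

for hop in (1024, 512, 256):
    print(hop, worst_case_window_weight(1024, hop))
```

Without overlap (hop equal to the FFT size), a short event on a frame boundary gets a weight of zero and is effectively invisible. With 50% overlap the worst case is a weight of about one half, and with 75% overlap it is higher still, which is why more overlap should make the maximum figure less sensitive to the FFT size.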

P.S. Tomorrow I have a flight to France (due to XDC 2014), so I won't be able to answer your questions quickly.

--
Alexander E. Patrakov