On 2017-06-25, Fons Adriaensen wrote:

Could it be that you're just talking about different perceptual weightings? I mean, if we talk about noise, we shouldn't ever go with A-weighting, or even C-weighting, but with the ITU-R 468 curve.

Electrical noise from active mics and mic preamps is normally white and Gaussian, except at low frequencies where you will typically find some 1/f noise. If that is not the case there is something seriously wrong (but see remark about A/B conversion below).

Correct, but then soundfield type mics are compound beasts, with things like differencing and capsule matching going on as well. Those can lead to issues foreign to the natural noise characteristic of any monolithic mic: a highpass component on the noise floor of the first stage amplifier, and (guessing on theoretical grounds only, since I'm no practitioner like you) perhaps most crucially high frequency ripple in the noise floor too, because of capsule spacing and phasing imperfections.

The reason I brought up the ITU curve is that, simply by reading from the response curve, such anomalies would be downplayed by A-weighting with its rapid HF rolloff, while being accentuated by the brutal peak of the ITU curve, smack in the center of the speech intelligibility band and reaching up from it to the 6 kHz range where matching issues first start to show. At least with lower end soundfield-like designs, with smaller diaphragms and more detailed physical geometry between the capsules, as opposed to the more symmetrical, large diaphragm design of the likes of the original SoundFields.

(I'm grasping a bit here, but wasn't it so that the MkIV and MkV only started to exhibit mismatch above 10kHz, where the ITU curve already rapidly rolls off?)

Plus of course the ITU curve was designed with a noise measurement goal in mind, unlike the A-weighting and C-weighting ones (which are both linear weightings derived from equal loudness contours of the Fletcher-Munson and later Robinson-Dadson kind, their main difference being the loudness level of the contour they come from). I believe it really is the better choice wrt what Enda started with, at the top of this thread.

But of course its derivation, or at least its most common intended application, also leaves a lot to be desired. ITU-R BS.468-4 as it stands doesn't really tell much about what is being measured or how, except that it's noise at electrical signal levels, and that we measure it using a particular analogue-implementable circuit consisting of a passive filter plus a quasi-peak detector. That is, 1) the filter definition itself is a remnant of an analogue era gone by, and perhaps most saliently, 2a) the reasoning behind the precise processing engendered by the filter and 2b) its proper ambit of application are *thoroughly* obscured. Most importantly, we don't know whether the standard was meant to quantify noise in an idle channel, by itself, or noise relative to a utility signal present at the same time -- a huge difference even when you apply the idea to analogue noise reduction.

In practice this sort of weighting has mostly been applied to environmental noise pollution, where the noise is the utility signal under study, but also to the quantification of background noise in the presence of a utility signal, such as in noise reduction. Those two scenarios aren't interchangeable by a long shot, even if the curve itself seems to perform rather better than its primitive elders in both roles; the extra nonlinear processing applied to the signal in the base standard is another matter again, which I'll come back to below.


If the noise is white then it doesn't matter much which weighting filter is used, as long as it is specified.

Doesn't it though, when you want to translate a purely objective measurement into perceptual noisiness? I thought that was the very essence of why the weighting curves were conceived in the first place, and the ITU one in particular (notwithstanding the rest of the processing which goes along with the curve). I mean, essentially any weighting curve is a frequency-wise decision of what matters and what does not: where the curve peaks, you have the frequencies which most contribute to the aggregate noise figure, and where you have the most attenuation, you're essentially downgrading the importance of noise over that band.
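
To make that concrete, here's a quick Python sketch that just evaluates the two curves at a few spot frequencies. The constants are the commonly quoted rational-function approximations of the IEC 61672 A-weighting and of the ITU-R BS.468-4 curve, written down from memory rather than rederived, and the function names are mine -- so treat it as a sketch and double-check against the standards before trusting the last decimal.

import math

def a_weight_db(f):
    # IEC 61672 A-weighting, normalised to 0 dB at 1 kHz
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2))
    return 20.0 * math.log10(ra) + 2.00

def itu468_weight_db(f):
    # ITU-R BS.468 weighting, normalised to 0 dB at 1 kHz
    h1 = (-4.737338981378384e-24 * f**6
          + 2.043828333606125e-15 * f**4
          - 1.363894795463638e-7 * f**2 + 1.0)
    h2 = (1.306612257412824e-19 * f**5
          - 2.118150887518656e-11 * f**3
          + 5.559488023498642e-4 * f)
    return 20.0 * math.log10(1.246332637532143e-4 * f / math.hypot(h1, h2)) + 18.2

# A rolls off gently above a few kHz; 468 peaks around +12 dB near 6.3 kHz
# and then dives, which is exactly the "importance map" difference above.
for f in (100.0, 1000.0, 2000.0, 4000.0, 6300.0, 10000.0, 16000.0):
    print(f"{f:7.0f} Hz   A: {a_weight_db(f):+6.1f} dB   468: {itu468_weight_db(f):+6.1f} dB")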

Given my argument from compoundness of spatial mic designs above, I'd argue a weighting which peaks around the frequencies which lead to soundfield typical aberrations is more sensitive to the perceptual shortcomings of this class of mics.

Granted, I can't really argue that such a weighting would be better except by half-coincidence. The ITU-468 curve really was designed for the measurement of confounding noise in the (implicit?) presence of a utility signal, instead of the audibility of signals as such like the A, C and whatnot weightings, and so is probably the better fit for the role we're currently discussing in any case. On the other hand, the perfect weighting, and whatever extra nonlinear machinery we might deem necessary in order to appraise directional mic accuracy, probably *would* differ broadly from everything we have in our measurement toolkit right now. So my invoking the ITU curve is just a half measure...but I think a relevant half measure still.

Compared to a flat 20 kHz bandwidth, the A-filter will typically show around 2 dB less, and ITU-468 will give around 11 dB more.

I'll take the mean as given. What I'd be more interested in, however, is the relative variance, and in particular how it behaves across weightings and across the binary condition of a mic being of soundfield type versus a high grade monophonic measurement mic (thus as flat as can be, on-axis insofar as the mic has one). And of course in the end it'd be nice to see such figures correlated with rigorous perceptual work, MUSHRAs and all.
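
For anyone who wants to sanity-check the quoted means, here's a rough numerical sketch, again using the commonly quoted curve formulas and plain RMS detection only. Note that the quasi-peak detector BS.468-4 actually specifies is *not* modelled here, and for noise-like signals it pushes the 468 reading up further, towards the figure Fons quotes; the band limits and function names are my own choices.

import numpy as np

f = np.linspace(10.0, 20000.0, 200000)

def a_gain(f):
    f2 = f * f
    ra = (12194.0**2 * f2**2) / ((f2 + 20.6**2)
         * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2)) * (f2 + 12194.0**2))
    return ra * 10.0**(2.0 / 20.0)       # normalise to unity at 1 kHz

def itu468_gain(f):
    h1 = (-4.737338981378384e-24 * f**6 + 2.043828333606125e-15 * f**4
          - 1.363894795463638e-7 * f**2 + 1.0)
    h2 = (1.306612257412824e-19 * f**5 - 2.118150887518656e-11 * f**3
          + 5.559488023498642e-4 * f)
    return 1.246332637532143e-4 * f / np.hypot(h1, h2) * 10.0**(18.2 / 20.0)

def rms_db_re_flat(gain):
    # weighted white-noise power relative to the same noise, unweighted, over the band
    return 10.0 * np.log10(np.trapz(gain(f)**2, f) / (f[-1] - f[0]))

print(f"A-weighted RMS  : {rms_db_re_flat(a_gain):+5.1f} dB re flat band")
print(f"468-weighted RMS: {rms_db_re_flat(itu468_gain):+5.1f} dB re flat band (filter only, no quasi-peak)")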

I know something like that is a tall order, and asking for something which *definitely* isn't available in the literature or the marketing brochures right now. Also, I probably should have made my reasoning more explicit to begin with, and noted that it might -- as usual -- go a bit broadside with regard to the original discussion. So, my apologies, as usual.

But it'd still be rather interesting to see what might come out of what I mentioned above; I really don't think even the basic psychoacoustical machinery of coincident mic design is too well developed to date. Perhaps that idea of relative variance over a two-by-two field of discrete choices (A-weighting vs. ITU weighting, monophonic vs. periphonic), correlated to a MUSHRA score, might serve as a starting point for one more thesis work? ;)

Most manufacturers provide A-weighted measurements, these are more or less the de-facto standard. Very few will give ITU-468 figures.

I'd argue, sadly so.

The one which peaks as fuck between 2-6kHz, and explains how things like Dolby B and C sliding band companders work so well; the one which also fails to explain the loudness of impulsive, nonstationary, nonlinear noise, yet.

Not sure what you mean by the latter. The ITU-468 method is quite sensitive to impulsive noise - it was designed to be.

Correct you are, once again. I often manage to write precisely the opposite of what I was thinking about. I really should take more care with my output and flights of thought (fancy?) -- as you've in fact pointed out a couple of times in the past already.

I also should point out that, as far as the implicit noise figures of a coincident, spatial mic go, as here, I should have made it clearer that I was referring only to the linear pre-equalization curve of the ITU standard, absent the further nonlinear detection machinery. As you say below, that part of the standard is a mess in its own right.

That is mainly not the result of the filter but of the very peculiar pseudo-peak detector specified by the standard.

Yes. That step is highly suspect in the literature, in itself, wrt all of the applications of the standard.

I think everybody will agree that something *like* it probably will need to be in the measurement signal chain, if we want to account for the particularities of impulsive noise. It's just that the ITU pseudo-peak machinery is old as fuck, constrained by primitive analogue circuitry, and quite possibly not too well thought out to begin with. Certainly there's been as much critical commentary in the literature towards it as there possibly could be, considering how peripheral to general audio practice the field of noise metrology has over time proven to be.
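
Just to illustrate the kind of nonlinearity we're talking about, here's a bare-bones fast-attack/slow-release follower in Python. To be clear, the time constants below are placeholders of my own, *not* the ballistics BS.468-4 specifies (as far as I remember the standard pins those down via tone-burst response requirements), so this only shows why such a detector reads impulsive noise hotter than an RMS meter would.

import numpy as np

def quasi_peak(x, fs, attack_ms=1.0, release_ms=600.0):
    # Rectify, then follow the envelope with asymmetric one-pole smoothing:
    # fast when the level rises, slow when it falls.
    a_att = np.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = np.exp(-1.0 / (fs * release_ms * 1e-3))
    y = np.zeros_like(x)
    state = 0.0
    for i, v in enumerate(np.abs(x)):
        coef = a_att if v > state else a_rel
        state = coef * state + (1.0 - coef) * v
        y[i] = state
    return y

# A short burst reads much higher on this kind of detector than its
# contribution to the overall RMS would suggest.
fs = 48000
noise = np.random.default_rng(0).standard_normal(fs // 2) * 0.1
noise[10000:10480] *= 10.0           # a 10 ms burst, 20 dB hotter
print("follower peak:", quasi_peak(noise, fs).max())
print("overall RMS  :", np.sqrt(np.mean(noise**2)))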

Had I to guess, I'd also think we have a kind of anti-novelty bias at work here: since few people know about the existence of the ITU standard, as opposed to the better known A-weighting and its ilk, we tend to think the ITU thingy is somehow much newer. We then expect it to represent a quantum leap over what came before, so that if it doesn't, it must somehow be deficient.

In reality the ITU standard descends from 1970, which considering the huge progress in audio theory and technology in recent decades seems positively *ancient*. Sure, A-weighting was set in the thirties, so it's much older, set against the human lifespan and the inevitable generational shifts caused by it. But set against the background of accelerating and at the same time *highly* uneven technological progress...

In audio reproduction, the seventies were in that sense *much* akin to the thirties, especially on the practical side. Everybody already knew their continuous time wave equations and what could be derived from them, all of the analogue electronics equations were well known, and so on. So compared to today, only one discrete revolution and one continuous evolution have really made a difference: digital technology and the explosion of computing power to back it up as the revolutionary kind, and then materials plus modelling technology -- itself sped up by the computer aided, digital side of things -- as the slow, difficult to comprehend evolution. The latter is what finally lets us have speakers, mics, rooms, mixing consoles, whatever acoustical and electro-acoustical implements, approximating both the theoretical predictions of pure acoustics and the limits of psychoacoustical research at the same time.

So why would I rant as long as I have? Well, I think it's just idiotic that we have to rely on even the ITU curve-or-standard when we talk about noise measures and their perceptual import. Even the cutting edge experimentalists and doers-and-shakers like you on the academic side, and Enda perhaps slightly on the more practical one, still rely on stuff coming from the 30's and at most the 70's. That's just insane, because considering what we can now do with noise measurements -- after the empirical work which went first into the development of analogue noise reduction, and after that into perceptual noise shaping in A/D/A converters and lossy, digital, perceptual codecs -- we have an *abundance* of new, highly refined psychoacoustical theory at our disposal. A veritable treasure trove of "stuff", applicable beyond the wildest dreams of even the 80's engineer's imagination, to the very problems we've always faced in trying to achieve verisimilitude of reproduction.

So let's at least draw from that history as best we can. First go with the ITU weighting where we want to measure noise. But of course then also acknowledge that it's a piece of *shit* compared to what we could now do, and as such task a doctoral student or two to derive something much better. ;)

In this case the issue is complicated by the EQ which is part of the A/B conversion. The W signal normally requires some boost in the high frequency range, how much depends on capsule directivity and the array radius.

It does, but as I said above, matching issues higher up the frequency range could, through the differencing, lead to a high pass noise characteristic. Which our ears might pick up even in the W channel, worse than you might first think, and especially in the XYZ+ channels higher up.

In particular because *nobody*, at least to my knowledge, has applied directional unmasking theory to soundfield type mics or signal chains. It is applied even now to Parametric Stereo in MPEG perceptual audio coding work, but none of that hard, measured psychoacoustical theory seems to be translating back into the basic mic or other physical-electrical work as of now. It's all computational.

My original post was triggered by one of the various "this can perhaps be explained by" remarks in the web article - none of them make much sense IMHO.

Agreed. The proposed explanations pretty much sounded like undisciplined guessing or grasping at straws to me too. The test setup seemed a bit unconventional as well, with speakers above the mid plane and whatnot.

However, Enda's work otherwise seems to me to be a bit more in the classical vein of experiential research. Not what you could call quackery or snake-oilmanship by any measure, but a genuine striving for better sound, via disciplined empiricism. Informed by modern acoustical theory, of course, but maybe a bit less constrained by its central dogma than is usual.

I like it. I also believe that sort of approach is necessary, as an adjunct to the more theoretically minded research you and many others on-list and off do. If nothing else, then because of what we've been talking about wrt the A/468 distinction above: we all know and agree that there are tradeoffs here, and we'd all like to understand them fully. We think we do, but then there are still surprises along the way -- like the way neither of us can say too much, with too much certainty, about what the hell the quasi-peak machinery of 468 actually does or why, or about what we could or should put in its place.

In that regard the qualitative, experiential research of Enda's kind is, at least to me, of the first importance. It serves as the first signal in the human science which psychoacoustics is, the one which leads to closer scrutiny and eventually to a better physical-psychological correspondence in measurement. Especially since Enda does *not* just speculate willy-nilly, but quite evidently grounds his speculative musings in proper acoustical theory, and as such gives rise to testable hypotheses, to be tried out by those of you in the hard, physical, empirical, measurement business.

Another thing which triggered my scepticism neurons is this 'timbre' evaluation of the various mics. Small differences of the 'dull vs bright' and 'thin vs full' kind can usually be corrected by some gentle EQ, so I really doubt if any of this is relevant in practice.

Exactly so. If I remember correctly, attempts at quantifying what people hear as different timbres, and at quantifying the transparency of various transmission channels, *always* arrive at the same result after factor analysis/PCA: the principal component in the spectral domain is a nigh-linear spectral tilt, after compensation for some near-Weber-Fechner law. Integrated over the whole of the human frequency passband, sensitivity to such average tilt is just ridiculously high, so that it for example tends to dominate loudspeaker and headphone preference to something like the 1.1-1.3 sigma level.

(Sorry once again, I never, ever remember where I got my info; I'm clinically unable to remember any references, or faces, or numbers, or sometimes even my own name. So, take it with a grain of salt; it shouldn't be too difficult to find the relevant studies, given you prolly have access to all of the best periodicals already.)
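
To spell out what that tilt means operationally, here's a toy sketch: fit the dB difference between two magnitude responses against log2 frequency, and call the slope the tilt in dB/octave. This is purely my own illustration of the concept, not the factor-analysis machinery of the studies I'm (mis)remembering, and all the names and test data are made up.

import numpy as np

def spectral_tilt(freqs_hz, mag_a_db, mag_b_db):
    # Return (tilt in dB/octave, residual RMS in dB) of response A minus B.
    diff = np.asarray(mag_a_db) - np.asarray(mag_b_db)
    x = np.log2(np.asarray(freqs_hz))
    slope, intercept = np.polyfit(x, diff, 1)
    residual = diff - (slope * x + intercept)
    return slope, float(np.sqrt(np.mean(residual**2)))

# toy example: B is A with a gentle downward tilt plus a small bump near 3 kHz
f = np.geomspace(100.0, 10000.0, 64)
a = np.zeros_like(f)
b = -1.0 * np.log2(f / 1000.0) + 0.5 * np.exp(-(np.log2(f / 3000.0))**2)
tilt, resid = spectral_tilt(f, a, b)
print(f"tilt {tilt:+.2f} dB/oct, residual {resid:.2f} dB RMS")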

The only real, attested deviations from that idea/ideal of just the spectral tilt governing all are 1) speech-formant-like characteristics, i.e. waveguide-like resonances excited by near-periodic waveforms with some nonlinearity, so that they don't *just* "light up" the resonance with a single harmonic series but spread the excitation out a bit, as happens in human speech; 2) the ridiculous sensitivity peak at 2-6 kHz, as attested to by the empirical ITU-R BS.468 transfer function -- believe it or not, even to date it pretty much defies reduction to any basic psychoacoustical theory; and 3) the unreasonable efficiency of the human hearing system in reacting to wideband, binaural/dichotic onsets, and discerning them even against high static noise backgrounds.

If you doubt me here, just read through the perceptual audio coding theory as a whole. All of the above has been explicitly taken advantage of there. Fully? I dunno. Probably the last, time-domain thingy is at least a topic of contention. Especially since it has been cited as an explanation for why wide bandwidths in digital audio, of over 25kHz (cf. ARA), could perhaps lead to better spatial resolution/spaciousness.

(BTW, Peter Craven, as one of the Ambisonic masterminds, seemed to provisionally buy into the argument too. He once put out an AES paper about the provisional benefits of minimum phase D/A reconstruction filters. I don't really buy into that theory *per se*, but just like Craven, given that we have extremely high sampling rates, arbitrary order digital filters and reasonable lossless compression algorithms readily available nowadays, I'd too advocate for wide bandwidths, slow rolloffs and perhaps even minimum phase reconstruction filters.

Because, what would you really lose? Nothing in time or frequency at least, because of the *extreme* rates and filtering accuracies we currently have. And what might we gain? Well, unconditional freedom from preringing. Which really *can*, at least in theory, be translated into something nonlinearly hearable, even via your common speaker or headphone. Thus, just to be sure...)
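
For anyone who wants to see the preringing point in numbers, here's a quick scipy sketch comparing a stock linear-phase lowpass with its minimum-phase counterpart. It is not Craven's actual filter design, just an illustration of where the energy ahead of the main peak goes; filter length, cutoff and the helper name are arbitrary choices of mine, and the homomorphic conversion roughly halves the filter order, so the magnitude responses won't match exactly.

import numpy as np
from scipy.signal import firwin, minimum_phase

fs = 96000.0
lin = firwin(255, 22000.0, fs=fs)                 # garden-variety linear-phase lowpass
minph = minimum_phase(lin, method="homomorphic")  # minimum-phase counterpart

def pre_peak_energy_db(h):
    # Energy arriving before the main peak, relative to total energy, in dB.
    peak = int(np.argmax(np.abs(h)))
    pre = float(np.sum(h[:peak] ** 2))
    return 10.0 * np.log10(pre / float(np.sum(h ** 2)) + 1e-30)

print("linear phase : pre-peak energy", f"{pre_peak_energy_db(lin):.1f} dB")
print("minimum phase: pre-peak energy", f"{pre_peak_energy_db(minph):.1f} dB")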

More after I've read the AES papers.

I'd really like to see your interpretation of them.

Not to mention, they really should go into the Motherlode. Somehow, someone pirating them at their own peril, for communal benefit. I'm not the one to say *you* should be the one to betray your licence with your relevant publisher...except that I kind of am... ;)
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2