On 2017-06-25, Fons Adriaensen wrote:
Could it be that you're just talking about different perceptual
weightings? I mean, if we're talking about noise, we shouldn't ever
go with A-weighting, or even C-weighting, but with the ITU 468 curve.
Electrical noise from active mics and mic preamps is normally white
and Gaussian, except at low frequencies where you will typically find
some 1/f noise. If that is not the case there is something seriously
wrong (but see remark about A/B conversion below).
Correct, but then soundfield type mics are compound beasts with things
like differencing and capsule matching going on as well. Those can lead
to issues foreign to the natural noise characteristic of any monolithic
mic, like a highpass component on the noise floor of the first stage
amplifier, and (guessing on theoretical grounds only since I'm no
practitioner like you) perhaps most crucially high-frequency ripple in
the noise floor too, because of capsule spacing and phasing
imperfections.
The reason I brought up the ITU curve is that simply by reading from the
response curve, such anomalies would be belittled by A-weighting with
its rapid HF rolloff, while being accentuated by the brutal peak of the
ITU curve, smack in the center of the speech intelligibility band and
reaching up from it to the 6kHz range where matching issues first start
to show. At least with lower end soundfield kinda designs, with smaller
diaphragms and more detailed physical geometry between the capsules,
instead of the more symmetrical, large diaphragm design of the likes of
the original SoundFields.
(I'm grasping a bit here, but wasn't it so that the MkIV and MkV only
started to exhibit mismatch above 10kHz, where the ITU curve already
rapidly rolls off?)
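Just to put numbers on that rolloff argument, here's a minimal sketch. It assumes the standard IEC 61672 closed form for A-weighting, plus a handful of tabulated BS.468 weighting values (quoted from the standard's table as best I recall them; treat them as approximate). It shows how the 2-6 kHz region, where matching issues first show up, is boosted by the 468 curve while A-weighting is already flat-to-falling there:

```python
import math

def a_weight_db(f):
    """IEC 61672 A-weighting gain in dB at frequency f (Hz)."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.0  # normalised to ~0 dB at 1 kHz

# A few published ITU-R BS.468 weighting values (dB), for comparison;
# note the +12.2 dB peak at 6.3 kHz, smack in the matching-trouble band.
itu468_db = {1000: 0.0, 2000: 5.6, 6300: 12.2, 10000: 8.1, 20000: -22.2}

for f in sorted(itu468_db):
    print(f"{f:>6} Hz   A: {a_weight_db(f):+6.1f} dB   468: {itu468_db[f]:+6.1f} dB")
```

At 6.3 kHz the two curves disagree by over 12 dB, which is exactly why a noise-floor anomaly there would be belittled by one and accentuated by the other.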
Plus of course the ITU curve was designed with a noise measurement goal
in mind, unlike the A-weighting and C-weighting ones (their only
difference being in the reference SPL, both being linear weightings
derived from equal loudness contours of the Fletcher-Munson and later
Robinson-Dadson kind). I believe it really is better with regard to what
Enda started with, at the top of this thread.
But of course its derivation or at least its most common intended
application also leaves a lot to be desired. ITU-R BS.468-4 as it stands
doesn't really tell much about what is being measured or how, except
that it's noise at electrical signal levels, and that we measure it
using a particular analogue-implementable circuit consisting of a
passive filter plus a quasi-peak detector. I.e. 1) the filter definition
itself is a remnant of an analogue era gone by, and perhaps most saliently,
2a) the reasoning behind the precise processing engendered by the filter
and 2b) its proper ambit of application are *thoroughly* obscured. Most
importantly we don't know whether the standard was meant to quantify
noise in an idle channel, by itself, or noise relative to a utility
signal present at the same time -- a huge difference even when you apply
the idea to analogue noise reduction. The weighting has mostly been
applied to environmental noise pollution, where the noise is the utility
signal under study, but also to the quantification of background noise
in the presence of a utility signal, such as in noise reduction. Those
two scenarios aren't interchangeable by a long shot, even if the curve
itself seems to perform rather better than its primitive elders in both
roles. (And that's before we even get to the extra nonlinear processing
applied to the signal in the base standard.)
If the noise is white then it doesn't matter much which weighting
filter is used, as long as it is specified.
Doesn't it though, when you want to translate a purely objective
measurement into perceptual noisiness? I thought that was the very
essence of why all of the weighting curves were conceived in general, in
the first place, and the ITU one in particular. (Notwithstanding the
rest of the processing which goes along with the curve.) I mean,
essentially any weighting curve is a frequency-wise decision of what
matters and what does not; where the curve peaks, you'll have the
frequencies which most contribute to the aggregate noise figure, and
where you have the most attenuation, you're essentially downgrading the
importance of noise over that band.
Given my argument from compoundness of spatial mic designs above, I'd
argue a weighting which peaks around the frequencies which lead to
soundfield typical aberrations is more sensitive to the perceptual
shortcomings of this class of mics.
Granted, I can't really argue that such a weighting would be better
except by half-coincidence: the ITU-468 curve really was designed for
the measurement of confounding noise in the (implicit?) presence of a
utility signal, rather than for the audibility of signals as such like
the A, C and whatnot weightings, and so is probably better in the role
we're currently discussing in any case. On the other hand, the perfect
weighting and whatever extra nonlinear machinery we might deem necessary
in order to appraise directional mic accuracy probably *would* differ
broadly from everything we have in our measurement toolkit right now.
So, my invoking the ITU curve is just a half measure...but I think a
relevant half-measure still.
Compared to a flat 20 kHz bandwidth, the A-filter will typically show
around 2 dB less, and ITU-468 will give around 11 dB more.
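The ~2 dB figure for the A side is easy to sanity-check numerically. Here's a sketch using only the IEC 61672 A-weighting formula, integrating the weighted versus unweighted power of white noise over 20 Hz-20 kHz; the 468 figure would additionally need the full curve and its quasi-peak detector, so it isn't checked here:

```python
import math

def a_weight_lin(f):
    """Linear amplitude response of IEC 61672 A-weighting, unity at 1 kHz."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return ra * 10 ** (2.0 / 20.0)

# White noise, flat from 20 Hz to 20 kHz: compare unweighted vs A-weighted power
# by crude numerical integration on a 1 Hz grid.
freqs = [20.0 + i for i in range(20000 - 20)]
flat_power = float(len(freqs))
a_power = sum(a_weight_lin(f) ** 2 for f in freqs)
diff_db = 10 * math.log10(a_power / flat_power)
print(f"A-weighted white noise reads {diff_db:+.1f} dB relative to flat")
```

The result lands in the neighbourhood of -2 dB, consistent with the figure quoted above.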
I'll take the mean as given. What I'd however be more interested in is
the relative variance, and in particular how it behaves over weighting
and the binary condition of a mic being of soundfield type versus being
a high grade (thus flat as can be, on-axis as far as the mic has one)
monophonic measurement mic of any kind. And of course in the end it'd be
nice to see such figures being correlated with rigorous perceptual work,
MUSHRA's and all.
I know something like that is a tall order, and asking for something
which *definitely* isn't available in the literature or the marketing
brochures right now. Also, I probably should have made my reasoning more
explicit to begin with, and noted that it might -- as usual -- go a bit
broadside with regard to the original discussion. So, my apologies, as
usual.
But it'd still be rather interesting to see what might come out of what
I mentioned above; I really don't think even the basic psychoacoustical
machinery of coincident mic design is too well developed as of date.
Perhaps that idea of relative variance over a four-field of discrete
choices, A-weighting/ITU-weighting, monophonic/periphonic, as correlated
to a MUSHRA score, might serve as a starting point for one more thesis
work? ;)
Most manufacturers provide A-weighted measurements, these are more or
less the de-facto standard. Very few will give ITU-468 figures.
I'd argue, sadly so.
The one which peaks as fuck between 2-6kHz, and explains how things
like Dolby B and C sliding band companders work so well; the one
which also fails to explain the loudness of impulsive, nonstationary,
nonlinear noise, though.
Not sure what you mean by the latter. The ITU-468 method is quite
sensitive to impulsive noise - it was designed to be.
Correct you are, once again. I often manage to write precisely the
opposite of what I was thinking about. I really should take more care
with my output and flights of thought (fancy?) -- as you've in fact
pointed out a couple of times in the past already.
I also should point out that as far as implicit noise figures of a
coincident, spatial mic go, as here, I should have made it clearer that
I was referring only to the linear pre-equalization curve of the ITU
standard. Absent the further nonlinear detection machinery. As you say
below, that part of the standard is a mess in its own right.
That is mainly not the result of the filter but of the very peculiar
pseudo-peak detector specified by the standard.
Yes. That step is highly suspect in the literature, in itself, wrt all
of the applications of the standard.
I think everybody will agree that something *like* it probably will need
to be in the measurement signal chain, if we want to account for the
particularities of impulsive noise. It's just that the ITU pseudo-peak
machinery is old as fuck, constrained by primitive analogue circuitry,
and quite possibly not too well thought out to begin with. Certainly
there's been as much critical commentary in the literature towards it as
there possibly could be, considering how peripheral to general audio
practice the field of noise metrology has over time proven to be.
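For illustration, here's a toy quasi-peak detector of the asymmetric attack/decay kind. The time constants are arbitrary round numbers, *not* the standardised BS.468 ballistics, but the mechanism is the same: impulsive noise with the same RMS as a steady tone reads far higher on the meter:

```python
import math
import random

def quasi_peak(samples, fs, t_attack=0.001, t_decay=0.6):
    """Asymmetric envelope follower: fast attack, slow decay.

    Illustrative only -- the real BS.468 detector uses a specific
    standardised network; these time constants are made up.
    """
    a_att = math.exp(-1.0 / (t_attack * fs))
    a_dec = math.exp(-1.0 / (t_decay * fs))
    env = 0.0
    reading = 0.0
    for x in samples:
        r = abs(x)
        coeff = a_att if r > env else a_dec
        env = coeff * env + (1.0 - coeff) * r
        reading = max(reading, env)
    return reading

fs = 8000
n = fs  # one second
# steady 1 kHz tone vs a sparse click train, scaled to the same RMS
tone = [0.1 * math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
clicks = [0.0] * n
for i in range(0, n, 800):
    clicks[i] = 1.0
rms = lambda s: math.sqrt(sum(x * x for x in s) / len(s))
scale = rms(tone) / rms(clicks)
clicks = [x * scale for x in clicks]
print("tone  QP:", quasi_peak(tone, fs))
print("click QP:", quasi_peak(clicks, fs))
```

Equal-RMS signals, wildly different quasi-peak readings: that asymmetry is the whole point of the detector, and also the part of the standard hardest to reason about analytically.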
Had I to guess, I'd also think we have a kind of anti-novelty bias at
work here: since few people know about the existence of the ITU
standard, as opposed to the more well known A-weighting and its ilk, we
tend to think the ITU thingy is somehow much newer. We think it must
somehow represent a quantum leap over what came before, so that if it
doesn't, it must somehow be deficient.
In reality the ITU standard dates from 1970, which considering the
huge progress in audio theory and technology in recent decades seems
positively *ancient*. Sure, A-weighting was set in the thirties, so it's
much older, set against the human lifespan and the inevitable
generational shifts caused by it. But set against the background of
accelerating and at the same time *highly* uneven technological
progress...
In audio reproduction, the seventies were still *much* akin to the
thirties, especially on the practical side. Everybody already knew their
continuous-time wave equations and what could be derived from them, all
of the analogue electronics equations were well known, and so on. So
compared to today, only one discrete revolution and one continuous
evolution really have made a difference: digital technology/the
explosion of computing power to back it up as the revolutionary kind,
and then materials plus modelling technology -- also speeded up by the
computer aided, digital side of things -- as the slow, difficult to
comprehend evolution. That which finally lets us have speakers, mics,
rooms, mixing consoles, whatever acoustical and electro-acoustical
implements, *finally* approximating both the theoretical predictions
of pure acoustics, and the limits of psychoacoustical research, at the
same time.
So why would I rant as long as I have? Well, I think it's just idiotic
that we have to rely on even the ITU curve-or-standard when we talk
about noise measures and their perceptual import. Even the cutting edge
experimentalists and doers-shakers like you on the academic side and
Enda perhaps slightly on the more practical one, still rely on stuff
coming from the 30's and at most 70's. That's just insane, because
*especially* considering what we can now do with noise measurements,
after the empirical work which went into the development of first
analogue noise reduction plus especially after that into perceptual
noise shaping in A/D/A converters and lossy, digital, perceptual codecs,
we have an *abundance* of new, highly refined psychoacoustical theory at
our disposal. A veritable treasure trove of "stuff", applicable beyond
the wildest dreams of even the 80's engineer's imagination, to the very
problems we've always faced in trying to achieve verisimilitude of
reproduction.
So let's at least draw from that history as best we can. First go with
the ITU weighting where we want to measure noise. But of course then
also acknowledge that it's a piece of *shit* compared to what we could
now do, and as such task a doctoral student or two to derive something
much better. ;)
In this case the issue is complicated by the EQ which is part of the
A/B conversion. The W signal normally requires some boost in the high
frequency range, how much depends on capsule directivity and the array
radius.
It does, but as I said above, matching issues higher up the frequency
range, via differencing, could lead to a highpass noise characteristic.
That could be picked up by our ears even worse in the W channel than you
might first think, and especially in the XYZ+ channels higher up.
In particular because *nobody* has at least to my knowing applied
directional unmasking theory to soundfield type mics or signal chains.
They do that even now wrt Parametric Stereo, in MPEG perceptual audio
coding work, but none of that hard, psychoacoustical, measured theory
seems to really be translating back into the basic mic or other
physical-electrical work as of now. It's all computational.
My original post was triggered by one of the various "this can perhaps
be explained by" remarks in the web article - none of them make much
sense IMHO.
Agreed. The objections pretty much sounded like undisciplined guessing
or grasping at straws to me as well. The test setup seemed a bit
unconventional too, with speakers above the mid plane and whatnot.
However, Enda's work otherwise seems to me to be a bit more in the
classical vein of experiential research. Not what you could call
whackery or snakeoilmanship by any measure, but genuine striving for
better sound, via disciplined empiricism. Informed by modern acoustical
theory, of course, but maybe a bit less constrained by its central dogma
than is usual.
I like it. I also believe that sort of approach is necessary, as an
adjunct to the more theoretical minded research you and many others
on-list and off do. If nothing else, then because of what we've been
talking about wrt the A/468-distinction above: we all know and agree
that there are tradeoffs here, and we'd all like to understand them
fully. We think we do, but then there are still surprises on the way;
like the way neither of us can say much, with much certainty, about
what the hell the quasi-peak machinery of 468 actually does or why, or
about what we could or should put in its place.
In that regard the qualitative, experiential research of Enda's kind is
at least to me of the first importance. It serves as the first signal in
the human science which psychoacoustics is, which leads to closer
scrutiny, and eventually to a better physical-psychological
correspondence in measurement. Especially since Enda does *not* just
speculate willy-nilly, but quite evidently grounds his speculative
musings in proper acoustical theory, and as such gives rise to testable
hypotheses, to be tried out by those of you in the hard, physical,
empirical, measurement business.
Another thing which triggered my scepticism neurons is this 'timbre'
evaluation of the various mics. Small differences of the 'dull vs
bright' and 'thin vs full' kind can usually be corrected by some
gentle EQ, so I really doubt if any of this is relevant in practice.
Exactly so. If I remember correctly, attempts at quantifying what people
hear as different timbres, and tests of the transparency of various
transmission channels, *always* arrive at the same result after factor
analysis/PCA: the principal component in the spectral domain consists of
a nigh-linear spectral tilt, after compensation for some
near-Weber-Fechner law. Integrated over the whole of the human
frequency passband, sensitivity to such average tilt is just
ridiculously high, so that it for example tends to dominate loudspeaker
and headphone preference to something like 1.1-1.3 sigma level.
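The tilt-dominates-the-first-component claim is easy to demo on synthetic data. A sketch, with all the numbers (band count, tilt spread, fine-detail level) made up purely for illustration: generate log-spectra that differ mainly by a random linear tilt plus small uncorrelated detail, then run PCA via SVD and see that the first component is essentially the tilt:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, n_bands = 200, 40
band_axis = np.linspace(-1.0, 1.0, n_bands)   # log-frequency axis, arbitrary units

# Synthetic log-spectra: a random linear tilt per stimulus plus small detail.
tilts = rng.normal(0.0, 3.0, n_stimuli)        # dB per axis unit
detail = rng.normal(0.0, 0.5, (n_stimuli, n_bands))
spectra = tilts[:, None] * band_axis[None, :] + detail

# PCA via SVD of the mean-removed data matrix.
centred = spectra - spectra.mean(axis=0)
_, s, vt = np.linalg.svd(centred, full_matrices=False)
pc1 = vt[0]
var_share = s[0] ** 2 / np.sum(s ** 2)
corr = abs(np.corrcoef(pc1, band_axis)[0, 1])
print(f"PC1 explains {var_share:.0%} of variance; |corr with linear tilt| = {corr:.3f}")
```

When tilt variance dominates, the first principal component comes out as a near-perfect linear ramp over the band axis, which is the structure the timbre studies keep finding.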
(Sorry once again, I never, ever remember where I got my info; I'm
clinically unable to remember any references, or faces, or numbers, or
sometimes even my own name. So, take it with a grain of salt; it
shouldn't be too difficult to find the relevant studies, given you
prolly have access to all of the best periodicals already.)
The only real, attested-to deviations from that idea/ideal of spectral
tilt governing all are 1) speech-formant-like characteristics, i.e.
waveguide-like resonances excited by near-periodic waveforms with some
nonlinearity, so as to not *just* "light up" the resonance with a single
harmonic series but have the excitation be a bit more spread out, as it
is in human speech, 2) the ridiculous sensitivity peak at 2-6kHz as
attested to by the empirical ITU BS.468 transfer function; believe it or
not, even to date it pretty much defies reduction to any basic
psychoacoustical theory, and 3) the
unreasonable efficiency of the human hearing system to react to
wideband, binaural/dichotic onsets, and discern them beyond even high
static noise backgrounds.
If you doubt me here, just read through the perceptual audio coding
theory as a whole. All of the above has been explicitly taken advantage
of, there. Fully? I dunno. Probably the last, time-domain thingy is at
least a topic of contention. Especially since it has been cited as an
explanation for why wide bandwidths in digital audio of over 25kHz (cf.
ARA) could perhaps lead to better spatial resolution/spaciousness.
(BTW, Peter Craven seemed to provisionally buy into the argument, too.
As one of the Ambisonic masterminds. He once put out an AES paper about
the provisional benefits of minimum phase D/A reconstruction filters. I
don't really buy into that theory *per se*, but just as Craven, given
that we have extremely high sampling rates, arbitrary order digital
filters and reasonable lossless compression algorithms readily available
nowadays, I'd too advocate for wide bandwidths, slow rolloffs and
perhaps even for minimum phase reconstruction filters.)
Because, what would you really lose? Nothing in time or frequency at
least, because of the *extreme* rates and filtering accuracies we
currently have. What might we gain? Well, unconditional freedom from
preringing. Which really *can*, at least in theory, be translated into
something nonlinearly hearable, even via your common speaker or
headphone. Thus, just to be sure...
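The preringing point is easy to make concrete. A sketch, assuming scipy is available: take a linear-phase lowpass FIR, derive its minimum-phase counterpart, and compare how much impulse-response energy arrives before the main peak in each:

```python
import numpy as np
from scipy.signal import firwin, minimum_phase

# Linear-phase lowpass prototype (odd length, cutoff as fraction of Nyquist)
# and its homomorphic minimum-phase version.
h_lin = firwin(127, 0.45)
h_min = minimum_phase(h_lin)

def pre_peak_energy(h):
    """Fraction of total energy arriving before the impulse-response peak."""
    k = int(np.argmax(np.abs(h)))
    return float(np.sum(h[:k] ** 2) / np.sum(h ** 2))

pre_lin = pre_peak_energy(h_lin)
pre_min = pre_peak_energy(h_min)
print(f"linear phase : {pre_lin:.3f} of energy before the peak")
print(f"minimum phase: {pre_min:.3f} of energy before the peak")
```

The linear-phase filter puts a sizeable chunk of its energy *before* the main tap (the preringing), while the minimum-phase one pushes essentially all of it after, at the cost of nonlinear phase. Whether that preringing is actually audible is of course the contended part.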
More after I've read the AES papers.
I'd really like to see your interpretation of them.
Not to mention, they really should go into the Motherlode. Somehow,
someone pirating them at their own peril, for communal benefit. I'm not
the one to say *you* should be the one to betray your licence with your
relevant publisher...except that I kind of am... ;)
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound