On 2017-06-25, Fons Adriaensen wrote:
Could it be that you're just talking about different perceptual
weightings? I mean, if we're talking about noise, we shouldn't ever
go with A-weighting, or even C-weighting, but with the ITU 468 curve.
Electrical noise from active mics and mic preamps is normally white
and Gaussian, except at low frequencies where you will typically find
some 1/f noise. If that is not the case there is something seriously
wrong (but see remark about A/B conversion below).
Correct, but then soundfield type mics are compound beasts with things
like differencing and capsule matching going on as well. Those can lead
to issues foreign to the natural noise characteristic of any monolithic
mic, like a highpass component on the noise floor of the first stage
amplifier, and (guessing on theoretical grounds only since I'm no
practitioner like you) perhaps most crucially high-frequency ripple in
the noise floor too, because of capsule spacing and phasing
imperfections.
The reason I brought up the ITU curve is that simply by reading from the
response curve, such anomalies would be belittled by A-weighting with
its rapid HF rolloff, while being accentuated by the brutal peak of the
ITU curve, smack in the center of the speech intelligibility band and
reaching up from it to the 6kHz range where matching issues first start
to show. At least with lower end soundfield kinda designs, with smaller
diaphragms and more detailed physical geometry between the capsules,
instead of the more symmetrical, large diaphragm design of the likes of
the original SoundFields.
(I'm grasping a bit here, but wasn't it so that the MkIV and MkV only
started to exhibit mismatch above 10kHz, where the ITU curve already
rapidly rolls off?)
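Just to put numbers on that rolloff argument, here's a minimal sketch. It assumes the standard IEC 61672 closed form for A-weighting, plus a handful of tabulated BS.468 weighting values (quoted from the standard's table as best I recall them; treat them as approximate). It shows how the 2-6 kHz region, where matching issues first show up, is boosted by the 468 curve while A-weighting is already flat-to-falling there:

```python
import math

def a_weight_db(f):
    """IEC 61672 A-weighting gain in dB at frequency f (Hz)."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.0  # normalised to ~0 dB at 1 kHz

# A few published ITU-R BS.468 weighting values (dB), for comparison;
# note the +12.2 dB peak at 6.3 kHz, smack in the matching-trouble band.
itu468_db = {1000: 0.0, 2000: 5.6, 6300: 12.2, 10000: 8.1, 20000: -22.2}

for f in sorted(itu468_db):
    print(f"{f:>6} Hz   A: {a_weight_db(f):+6.1f} dB   468: {itu468_db[f]:+6.1f} dB")
```

At 6.3 kHz the two curves disagree by over 12 dB, which is exactly why a noise-floor anomaly there would be belittled by one and accentuated by the other.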
Plus of course the ITU curve was designed with a noise measurement goal
in mind, unlike the A-weighting and C-weighting ones (their only
difference being in the reference SPL, both being linear weightings
derived from equal loudness contours of the Fletcher-Munson and later
Robinson-Dadson kind). I believe it really is better with regard to what
Enda started with, at the top of this thread.
But of course its derivation or at least its most common intended
application also leaves a lot to be desired. ITU-R BS.468-4 as it stands
doesn't really tell much about what is being measured or how, except
that it's noise at electrical signal levels, and that we measure it
using a particular analogue-implementable circuit consisting of a
passive filter plus a quasi-peak detector. I.e. 1) the filter definition
itself is a remnant of an analogue era gone by, and perhaps most saliently,
2a) the reasoning behind the precise processing engendered by the filter
and 2b) its proper ambit of application are *thoroughly* obscured. Most
importantly we don't know whether the standard was meant to quantify
noise in an idle channel, by itself, or noise relative to a utility
signal present at the same time -- a huge difference even when you apply
the idea to analogue noise reduction. The weighting has mostly been
applied to environmental noise pollution, where the noise is the utility
signal under study, but also to the quantification of background noise
in the presence of a utility signal, such as in noise reduction. Those
two scenarios aren't interchangeable by a long shot, even if the curve
itself seems to perform rather better than its primitive elders in both
roles. (And that's before we even get to the extra nonlinear processing
applied to the signal in the base standard.)
If the noise is white then it doesn't matter much which weighting
filter is used, as long as it is specified.
Doesn't it though, when you want to translate a purely objective
measurement into perceptual noisiness? I thought that was the very
essence of why all of the weighting curves were conceived in general, in
the first place, and the ITU one in particular. (Notwithstanding the
rest of the processing which goes along with the curve.) I mean,
essentially any weighting curve is a frequency-wise decision of what
matters and what does not; where the curve peaks, you'll have the
frequencies which most contribute to the aggregate noise figure, and
where you have the most attenuation, you're essentially downgrading the
importance of noise over that band.
Given my argument from compoundness of spatial mic designs above, I'd
argue a weighting which peaks around the frequencies which lead to
soundfield typical aberrations is more sensitive to the perceptual
shortcomings of this class of mics.
Granted, I can't really argue that such a weighting would be better
except by half-coincidence: the ITU-468 curve really was designed for
the measurement of confounding noise in the (implicit?) presence of a
utility signal, rather than for the audibility of signals as such like
the A, C and whatnot weightings, and so is probably better in the role
we're currently discussing in any case. On the other hand, the perfect
weighting and whatever extra nonlinear machinery we might deem necessary
in order to appraise directional mic accuracy probably *would* differ
broadly from everything we have in our measurement toolkit right now.
So, my invoking the ITU curve is just a half measure...but I think a
relevant half-measure still.
Compared to a flat 20 kHz bandwidth, the A-filter will typically show
around 2 dB less, and ITU-468 will give around 11 dB more.
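The ~2 dB figure for the A side is easy to sanity-check numerically. Here's a sketch using only the IEC 61672 A-weighting formula, integrating the weighted versus unweighted power of white noise over 20 Hz-20 kHz; the 468 figure would additionally need the full curve and its quasi-peak detector, so it isn't checked here:

```python
import math

def a_weight_lin(f):
    """Linear amplitude response of IEC 61672 A-weighting, unity at 1 kHz."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return ra * 10 ** (2.0 / 20.0)

# White noise, flat from 20 Hz to 20 kHz: compare unweighted vs A-weighted power
# by crude numerical integration on a 1 Hz grid.
freqs = [20.0 + i for i in range(20000 - 20)]
flat_power = float(len(freqs))
a_power = sum(a_weight_lin(f) ** 2 for f in freqs)
diff_db = 10 * math.log10(a_power / flat_power)
print(f"A-weighted white noise reads {diff_db:+.1f} dB relative to flat")
```

The result lands in the neighbourhood of -2 dB, consistent with the figure quoted above.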
I'll take the mean as given. What I'd however be more interested in is
the relative variance, and in particular how it behaves over weighting
and the binary condition of a mic being of soundfield type versus being
a high grade (thus flat as can be, on-axis as far as the mic has one)
monophonic measurement mic of any kind. And of course in the end it'd be
nice to see such figures being correlated with rigorous perceptual work,
MUSHRA's and all.
I know something like that is a tall order, and asking for something
which *definitely* isn't available in the literature or the marketing
brochures right now. Also, I probably should have made my reasoning more
explicit to begin with, and noted that it might -- as usual -- go a bit
broadside with regard to the original discussion. So, my apologies, as
usual.
But it'd still be rather interesting to see what might come out of what
I mentioned above; I really don't think even the basic psychoacoustical
machinery of coincident mic design is too well developed as of date.
Perhaps that idea of relative variance over a four-field of discrete
choices, A-weighting/ITU-weighting, monophonic/periphonic, as correlated
to a MUSHRA score, might serve as a starting point for one more thesis
work? ;)
Most manufacturers provide A-weighted measurements, these are more or
less the de-facto standard. Very few will give ITU-468 figures.
I'd argue, sadly so.
The one which peaks as fuck between 2-6kHz, and explains how things
like Dolby B and C sliding band companders work so well; the one
which also fails to explain the loudness of impulsive, nonstationary,
nonlinear noise, though.
Not sure what you mean by the latter. The ITU-468 method is quite
sensitive to impulsive noise - it was designed to be.
Correct you are, once again. I often manage to write precisely the
opposite of what I was thinking about. I really should take more care
with my output and flights of thought (fancy?) -- as you've in fact
pointed out a couple of times in the past already.
I also should point out that as far as implicit noise figures of a
coincident, spatial mic go, as here, I should have made it clearer that
I was referring only to the linear pre-equalization curve of the ITU
standard. Absent the further nonlinear detection machinery. As you say
below, that part of the standard is a mess in its own right.
That is mainly not the result of the filter but of the very peculiar
pseudo-peak detector specified by the standard.
Yes. That step is highly suspect in the literature, in itself, wrt all
of the applications of the standard.
I think everybody will agree that something *like* it probably will need
to be in the measurement signal chain, if we want to account for the
particularities of impulsive noise. It's just that the ITU pseudo-peak
machinery is old as fuck, constrained by primitive analogue circuitry,
and quite possibly not too well thought out to begin with. Certainly
there's been as much critical commentary in the literature towards it as
there possibly could be, considering how peripheral to general audio
practice the field of noise metrology has over time proven to be.
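For illustration, here's a toy quasi-peak detector of the asymmetric attack/decay kind. The time constants are arbitrary round numbers, *not* the standardised BS.468 ballistics, but the mechanism is the same: impulsive noise with the same RMS as a steady tone reads far higher on the meter:

```python
import math
import random

def quasi_peak(samples, fs, t_attack=0.001, t_decay=0.6):
    """Asymmetric envelope follower: fast attack, slow decay.

    Illustrative only -- the real BS.468 detector uses a specific
    standardised network; these time constants are made up.
    """
    a_att = math.exp(-1.0 / (t_attack * fs))
    a_dec = math.exp(-1.0 / (t_decay * fs))
    env = 0.0
    reading = 0.0
    for x in samples:
        r = abs(x)
        coeff = a_att if r > env else a_dec
        env = coeff * env + (1.0 - coeff) * r
        reading = max(reading, env)
    return reading

fs = 8000
n = fs  # one second
# steady 1 kHz tone vs a sparse click train, scaled to the same RMS
tone = [0.1 * math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
clicks = [0.0] * n
for i in range(0, n, 800):
    clicks[i] = 1.0
rms = lambda s: math.sqrt(sum(x * x for x in s) / len(s))
scale = rms(tone) / rms(clicks)
clicks = [x * scale for x in clicks]
print("tone  QP:", quasi_peak(tone, fs))
print("click QP:", quasi_peak(clicks, fs))
```

Equal-RMS signals, wildly different quasi-peak readings: that asymmetry is the whole point of the detector, and also the part of the standard hardest to reason about analytically.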
Had I to guess, I'd also think we have a kind of anti-novelty bias at
work here: since few people know about the existence of the ITU
standard, as opposed to the more well known A-weighting and its ilk, we
tend to think the ITU thingy is somehow much newer. We think it must
somehow represent a quantum leap over what came before, so that if it
doesn't, it must somehow be deficient.
In reality the ITU standard dates from 1970, which considering the
huge progress in audio theory and technology in recent decades seems
positively *ancient*. Sure, A-weighting was set in the thirties, so it's
much older, set against the human lifespan and the inevitable
generational shifts caused by it. But set against the background of
accelerating and at the same time *highly* uneven technological
progress...
In audio reproduction, the seventies were still *much* akin to the
thirties, especially on the practical side. Everybody already knew their
continuous-time wave equations and what could be derived from them, all
of the analogue electronics equations were well known, and so on. So
compared to today, only one discrete revolution and one continuous
evolution really have made a difference: digital technology/the
explosion of computing power to back it up as the revolutionary kind,
and then materials plus modelling technology -- also speeded up by the
computer aided, digital side of things -- as the slow, difficult to
comprehend evolution. That which finally lets us have speakers, mics,
rooms, mixing consoles, whatever acoustical and electro-acoustical
implements, *finally* approximating both the theoretical predictions
of pure acoustics, and the limits of psychoacoustical research, at the
same time.
So why would I rant as long as I have? Well, I think it's just idiotic
that we have to rely on even the ITU curve-or-standard when we talk
about noise measures and their perceptual import. Even the cutting edge
experimentalists and doers-shakers like you on the academic side and
Enda perhaps slightly on the more practical one, still rely on stuff
coming from the 30's and at most 70's. That's just insane, because
*especially* considering what we can now do with noise measurements,
after the empirical work which went into the development of first
analogue noise reduction plus especially after that into perceptual
noise shaping in A/D/A converters and lossy, digital, perceptual codecs,
we have an *abundance* of new, highly refined psychoacoustical theory at
our disposal. A veritable treasure trove of "stuff", applicable beyond
the wildest dreams of even the 80's engineer's imagination, to the very
problems we've always faced in trying to achieve verisimilitude of
reproduction.
So let's at least draw from that history as best we can. First go with
the ITU weighting where we want to measure noise. But of course then
also acknowledge that it's a piece of *shit* compared to what we could
now do, and as such task a doctoral student or two to derive something
much better. ;)
In this case the issue is complicated by the EQ which is part of the
A/B conversion. The W signal normally requires some boost in the high
frequency range, how much depends on capsule directivity and the array
radius.
It does, but as I said above, matching issues higher up the frequency
range, via differencing, could lead to a highpass noise characteristic.
That could be picked up by our ears even worse in the W channel than you
might first think, and especially in the XYZ+ channels higher up.
In particular because *nobody* has at least to my knowing applied
directional unmasking theory to soundfield type mics or signal chains.
They do that even now wrt Parametric Stereo, in MPEG perceptual audio
coding work, but none of that hard, psychoacoustical, measured theory
seems to really be translating back into the basic mic or other
physical-electrical work as of now. It's all computational.
My original post was triggered by one of the various "this can perhaps
be explained by" remarks in the web article - none of them make much
sense IMHO.
Agreed. The objections pretty much sounded like undisciplined guessing
or grasping at straws to me as well. The test setup seemed a bit
unconventional too, with speakers above the mid plane and whatnot.
However, Enda's work otherwise seems to me to be a bit more in the
classical vein of experiential research. Not what you could call
whackery or snakeoilmanship by any measure, but genuine striving for
better sound, via disciplined empiricism. Informed by modern acoustical
theory, of course, but maybe a bit less constrained by its central dogma
than is usual.
I like it. I also believe that sort of approach is necessary, as an
adjunct to the more theoretical minded research you and many others
on-list and off do. If nothing else, then because of what we've been
talking about wrt the A/468-distinction above: we all know and agree
that there are tradeoffs here, and we'd all like to understand them
fully. We think we do, but then there are still surprises on the way;
like the way neither of us can say much, with much certainty, about
what the hell the quasi-peak machinery of 468 actually does or why, or
about what we could or should put in its place.
In that regard the qualitative, experiential research of Enda's kind is
at least to me of the first importance. It serves as the first signal in
the human science which psychoacoustics is, which leads to closer
scrutiny, and eventually to a better physical-psychological
correspondence in measurement. Especially since Enda does *not* just
speculate willy-nilly, but quite evidently grounds his speculative
musings in proper acoustical theory, and as such gives rise to testable
hypotheses, to be tried out by those of you in the hard, physical,
empirical, measurement business.
Another thing which triggered my scepticism neurons is this 'timbre'
evaluation of the various mics. Small differences of the 'dull vs
bright' and 'thin vs full' kind can usually be corrected by some
gentle EQ, so I really doubt if any of this is relevant in practice.
Exactly so. If I remember correctly, attempts at quantifying what people
hear as different timbres, and tests of the transparency of various
transmission channels, *always* arrive at the same result after factor
analysis/PCA: the principal component in the spectral domain consists of
a nigh-linear spectral tilt, after compensation for some
near-Weber-Fechner law. Integrated over the whole of the human
frequency passband, sensitivity to such average tilt is just
ridiculously high, so that it for example tends to dominate loudspeaker
and headphone preference to something like 1.1-1.3 sigma level.
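The tilt-dominates-the-first-component claim is easy to demo on synthetic data. A sketch, with all the numbers (band count, tilt spread, fine-detail level) made up purely for illustration: generate log-spectra that differ mainly by a random linear tilt plus small uncorrelated detail, then run PCA via SVD and see that the first component is essentially the tilt:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, n_bands = 200, 40
band_axis = np.linspace(-1.0, 1.0, n_bands)   # log-frequency axis, arbitrary units

# Synthetic log-spectra: a random linear tilt per stimulus plus small detail.
tilts = rng.normal(0.0, 3.0, n_stimuli)        # dB per axis unit
detail = rng.normal(0.0, 0.5, (n_stimuli, n_bands))
spectra = tilts[:, None] * band_axis[None, :] + detail

# PCA via SVD of the mean-removed data matrix.
centred = spectra - spectra.mean(axis=0)
_, s, vt = np.linalg.svd(centred, full_matrices=False)
pc1 = vt[0]
var_share = s[0] ** 2 / np.sum(s ** 2)
corr = abs(np.corrcoef(pc1, band_axis)[0, 1])
print(f"PC1 explains {var_share:.0%} of variance; |corr with linear tilt| = {corr:.3f}")
```

When tilt variance dominates, the first principal component comes out as a near-perfect linear ramp over the band axis, which is the structure the timbre studies keep finding.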
(Sorry once again, I never, ever remember where I got my info; I'm
clinically unable to remember any references, or faces, or numbers, or
sometimes even my own name. So, take it with a grain of salt; it
shouldn't be too difficult to find the relevant studies, given you
prolly have access to all of the best periodicals already.)
The only real, attested-to deviations from that idea/ideal of spectral
tilt governing all are 1) speech-formant-like characteristics, i.e.
waveguide-like resonances excited by near-periodic waveforms with some
nonlinearity, so as to not *just* "light up" the resonance with a single
harmonic series but have the excitation be a bit more spread out, as it
is in human speech, 2) the ridiculous sensitivity peak at 2-6kHz as
attested to by the empirical ITU BS.468 transfer function; believe it or
not, even to date it pretty much defies reduction to any basic
psychoacoustical theory, and 3) the
unreasonable efficiency of the human hearing system to react to
wideband, binaural/dichotic onsets, and discern them beyond even high
static noise backgrounds.
If you doubt me here, just read through the perceptual audio coding
theory as a whole. All of the above has been explicitly taken advantage
of, there. Fully? I dunno. Probably the last, time-domain thingy is at
least a topic of contention. Especially since it has been cited as an
explanation for why wide bandwidths in digital audio of over 25kHz (cf.
ARA) could perhaps lead to better spatial resolution/spaciousness.
(BTW, Peter Craven seemed to provisionally buy into the argument, too.
As one of the Ambisonic masterminds. He once put out an AES paper about
the provisional benefits of minimum phase D/A reconstruction filters. I
don't really buy into that theory *per se*, but just as Craven, given
that we have extremely high sampling rates, arbitrary order digital
filters and reasonable lossless compression algorithms readily available
nowadays, I'd too advocate for wide bandwidths, slow rolloffs and
perhaps even for minimum phase reconstruction filters.)
Because, what would you really lose? Nothing in time or frequency at
least, because of the *extreme* rates and filtering accuracies we
currently have. What might we gain? Well, unconditional freedom from
preringing. Which really *can*, at least in theory, be translated into
something nonlinearly hearable, even via your common speaker or
headphone. Thus, just to be sure...
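The preringing point is easy to make concrete. A sketch, assuming scipy is available: take a linear-phase lowpass FIR, derive its minimum-phase counterpart, and compare how much impulse-response energy arrives before the main peak in each:

```python
import numpy as np
from scipy.signal import firwin, minimum_phase

# Linear-phase lowpass prototype (odd length, cutoff as fraction of Nyquist)
# and its homomorphic minimum-phase version.
h_lin = firwin(127, 0.45)
h_min = minimum_phase(h_lin)

def pre_peak_energy(h):
    """Fraction of total energy arriving before the impulse-response peak."""
    k = int(np.argmax(np.abs(h)))
    return float(np.sum(h[:k] ** 2) / np.sum(h ** 2))

pre_lin = pre_peak_energy(h_lin)
pre_min = pre_peak_energy(h_min)
print(f"linear phase : {pre_lin:.3f} of energy before the peak")
print(f"minimum phase: {pre_min:.3f} of energy before the peak")
```

The linear-phase filter puts a sizeable chunk of its energy *before* the main tap (the preringing), while the minimum-phase one pushes essentially all of it after, at the cost of nonlinear phase. Whether that preringing is actually audible is of course the contended part.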
More after I've read the AES papers.
I'd really like to see your interpretation of them.
Not to mention, they really should go into the Motherlode. Somehow,
someone pirating them at their own peril, for communal benefit. I'm not
the one to say *you* should be the one to betray your licence with your
relevant publisher...except that I kind of am... ;)
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound