On 2017-01-09, Politis Archontis wrote:

I am a bit baffled by the idea that VBAP is not compatible with Ambisonics theory (?)

I actually didn't mean to say quite as much. :)

Thinking in terms of velocity and energy vectors, as far as I understand, VBAP with the (classic) amplitude panning formulation has zero angular error for the (Makita) velocity vectors for all directions.

Yes. In my mind the trouble with VBAP isn't that it's somehow incorrect, because obviously it isn't. It eminently does work. But reflected against ambisonic theory, it also has its shortcomings.
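To make the velocity vector claim concrete, here is a minimal sketch for a single speaker pair in 2D. The function names and the example angles are made up for illustration; the only thing taken from the discussion is the classic amplitude panning formulation (solve p = g1*l1 + g2*l2 for the gains) and the Makita velocity vector as the gain-weighted sum of speaker directions.

```python
import math

def unit(deg):
    """Unit vector pointing at azimuth `deg` (degrees)."""
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def vbap_pair(src_deg, spk1_deg, spk2_deg):
    """Classic 2D VBAP: solve p = g1*l1 + g2*l2, then power-normalise."""
    p, l1, l2 = unit(src_deg), unit(spk1_deg), unit(spk2_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]          # Cramer's rule on a 2x2 system
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

def velocity_angle_deg(gains, spk_degs):
    """Direction of the Makita velocity vector, sum(g_i * l_i) / sum(g_i)."""
    total = sum(gains)
    x = sum(g * unit(d)[0] for g, d in zip(gains, spk_degs)) / total
    y = sum(g * unit(d)[1] for g, d in zip(gains, spk_degs)) / total
    return math.degrees(math.atan2(y, x))

# Hypothetical pair at +45 and -45 degrees, source panned to 20 degrees.
pair = (45.0, -45.0)
gains = vbap_pair(20.0, *pair)
print(gains, velocity_angle_deg(gains, pair))
```

Since the gains are by construction the coordinates of the panning direction in the speaker basis, the velocity vector has to point along the panned direction; the sketch just checks that numerically. Both gains also stay non-negative for any source between the two speakers.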

The closest thing in POA (plain old ambisonic) theory to VBAP is, I believe, in-phase decoding. It's not an exact fit, true, but it comes close enough to make a comparison: in-phase basically means that whatever you put into the soundfield comes from a certain direction, with no anti-phase, oscillating (in the sound intensity theory sense) contribution from around the rig. It tries to keep all of the energy as travelling, not oscillating, or in other words it tries to keep close to what the more involved NFC-HOA analysis calls an "inbound solution". So, pretty much what pure amplitude panning like VBAP does in practice as well.

With that in mind, you can immediately see how the idea fails the ambisonic ideal. Perhaps not by too much, and we know that all of the solutions tend toward the same holophonic limit given enough resources. But still in the low order sparse rig case there's a difference. Which is why we don't do in-phase decodes, but max rE at HF and max rV at LF.

Namely, two things. First, the ambisonic formulation tries as hard as it can to be isotropic. That's the basic reason why you need at least four speakers for even three channel, pantophonic POA: with just three speakers, you can't build the system so that it doesn't pull sound into the speakers. VBAP doesn't respect that basic theorem, and so it does pull into speakers; it doesn't sound the same when you pan into a speaker position as it does when you pan between. With POA it does (or at least POA tries very hard to make it so).
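The "pulling into speakers" point can be checked numerically with the energy vector. What follows is only a sketch under stated assumptions: a horizontal-only rig, a hypothetical square at 45/135/225/315 degrees and a hypothetical triangle at 0/120/240, pairwise VBAP around the ring, and a first-order decode written as g_i = (1 + 2a cos(theta - phi_i)) / N with a = 1/sqrt(2), which is the usual 2D max rE weight. All function names are mine.

```python
import math

def unit(deg):
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def re_magnitude(gains, spk_degs):
    """Length of the energy vector, sum(g_i^2 * l_i) / sum(g_i^2)."""
    e = sum(g * g for g in gains)
    x = sum(g * g * unit(d)[0] for g, d in zip(gains, spk_degs))
    y = sum(g * g * unit(d)[1] for g, d in zip(gains, spk_degs))
    return math.hypot(x, y) / e

def first_order_gains(src_deg, spk_degs, a):
    """First-order 2D decode for a regular ring; `a` weights the first-order part."""
    n = len(spk_degs)
    return [(1.0 + 2.0 * a * math.cos(math.radians(src_deg - d))) / n
            for d in spk_degs]

def vbap_ring_gains(src_deg, spk_degs):
    """Pairwise 2D VBAP: use the adjacent pair that yields non-negative gains."""
    n, p = len(spk_degs), unit(src_deg)
    for i in range(n):
        j = (i + 1) % n
        l1, l2 = unit(spk_degs[i]), unit(spk_degs[j])
        det = l1[0] * l2[1] - l1[1] * l2[0]
        g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
        g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
        if g1 >= -1e-12 and g2 >= -1e-12:
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)
            gains = [0.0] * n
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("source direction not covered by any speaker pair")

SQUARE = [45.0, 135.0, 225.0, 315.0]
TRIANGLE = [0.0, 120.0, 240.0]
A_MAX_RE = math.sqrt(0.5)

for label, src in (("on a speaker", 45.0), ("between speakers", 0.0)):
    amb = re_magnitude(first_order_gains(src, SQUARE, A_MAX_RE), SQUARE)
    vbap = re_magnitude(vbap_ring_gains(src, SQUARE), SQUARE)
    print(f"square, {label}: |rE| ambisonic {amb:.3f}, VBAP {vbap:.3f}")

for label, src in (("on a speaker", 0.0), ("between speakers", 60.0)):
    amb = re_magnitude(first_order_gains(src, TRIANGLE, A_MAX_RE), TRIANGLE)
    print(f"triangle, {label}: |rE| ambisonic {amb:.3f}")
```

On the square the first-order decode gives the same |rE| whether you pan onto a speaker or between two, while VBAP's |rE| changes with pan position. On the triangle even the first-order decode's |rE| varies with direction, which is the three-versus-four speaker point in numbers.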

And secondly, VBAP doesn't utilize the whole (limited, cheap, basic) speaker rig as efficiently as POA does. The classical ambisonic theory a la Gerzon starts with the Makita theory of localisation, and optimizes against it. VBAP doesn't take heed of that or any other theory, but is in the perceptual sense an ad hoc machinery. Thus, it doesn't really optimize for hearing, or the use of limited speaker resources; it doesn't do what POA does, which is to utilize anti-phase signals in order to give higher location accuracy at LF. (Remember, those go away at the dense HOA limit even within the ambisonic framework; but we're not talking about that in usual home configurations; the cheap, basic setup every homeowner has is the thing, and the thing where VBAP performs worst as compared to POA.)

In essence, if you want to put VBAP within the ambisonic framework, I'd characterize it as being "an infinite order single band decoder optimized for in-phase propagation, without the isotropy constraints which characterize low order ambisonic". There's nothing inherently wrong with such a decoding solution, and even the ambisonic theoretical machinery tells you that such a solution is sometimes ideal. It's just that if you work with your average 4-5 speakers, and within POA's single listener, central, isotropic assumption, the classical POA framework does even better.

Of course at low frequencies you cannot achieve the “perfect” pressure reconstruction that a mode-matching decoder can achieve, but once you see the gains that such a decoder imposes on setups that are not ideally regular, you realize that perfect reconstruction has to be compromised anyway in favour of some more practical solution.

If I understand you correctly, we sort of agree. But you see, my argument is very much about the *imperfect* case, and I think the classical Gerzon/POA theory is about that too. If we had a million speakers, too much processing power to speak of, and so on, all of this discussion would be moot. It's just that we don't have that. We typically have just four speakers (if even as many), and we have to make the best of what little we have.

That then leads to Gerzon's theory (of POA); something which is almost single-mindedly psychoacoustical (within the constraint of an LTI signal chain). Nowadays we could theorize about tons of speakers and the high holophonic limit, but in his day Gerzon worked with pretty much just a quadraphonic setup. If you want to have something like that deliver passable pantophony, you can't go with idealisations. You work with what you got, and what you got was part psychoacoustics.

Which is why we have shelf filters, which cross over between the max rV decode at LF and the max rE decode at HF, and which is why, in POA's presumable use case and with the limited resources we have at our disposal, it does far better than VBAP (as far as the underlying Makita theory of localization can help you).
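As a rough illustration of what the shelf filter trades between, the sketch below evaluates the velocity and energy vector lengths of a first-order 2D decode on a hypothetical regular square rig for three weightings of the first-order components. The parametrisation g_i = (1 + 2a cos(theta - phi_i)) / N and the names are assumptions of mine; a = 1, 1/sqrt(2) and 1/2 are the usual 2D first-order values for max rV, max rE and in-phase respectively.

```python
import math

SQUARE = [45.0, 135.0, 225.0, 315.0]   # hypothetical regular four-speaker rig

def decode_gains(src_deg, spk_degs, a):
    """First-order 2D decode; `a` scales the first-order components."""
    n = len(spk_degs)
    return [(1.0 + 2.0 * a * math.cos(math.radians(src_deg - d))) / n
            for d in spk_degs]

def vector_magnitude(gains, spk_degs, power):
    """power=1 gives |rV| (velocity vector), power=2 gives |rE| (energy vector)."""
    w = [g ** power for g in gains]
    x = sum(wi * math.cos(math.radians(d)) for wi, d in zip(w, spk_degs))
    y = sum(wi * math.sin(math.radians(d)) for wi, d in zip(w, spk_degs))
    return math.hypot(x, y) / sum(w)

WEIGHTS = {"max rV": 1.0, "max rE": math.sqrt(0.5), "in-phase": 0.5}

def summarize(src_deg=20.0):
    """Map each weighting name to (|rV|, |rE|, smallest speaker gain)."""
    out = {}
    for name, a in WEIGHTS.items():
        g = decode_gains(src_deg, SQUARE, a)
        out[name] = (vector_magnitude(g, SQUARE, 1),
                     vector_magnitude(g, SQUARE, 2),
                     min(g))
    return out

for name, (rv, re, gmin) in summarize().items():
    print(f"{name:9s} |rV| = {rv:.3f}  |rE| = {re:.3f}  min gain = {gmin:+.3f}")
```

For this rig the algebra works out to |rV| = a and |rE| = 2a / (1 + 2a^2), so the max rV weighting reaches |rV| = 1 but needs anti-phase (negative) speaker gains, the max rE weighting gives the largest |rE|, and the in-phase weighting keeps every gain non-negative at the cost of a smaller |rV|. A shelf filter then amounts to using the first weighting at LF and the second at HF.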

Again I think it depends how you mean it - VBAP will just work for any speaker array, with a performance limited by the setup in a quite intuitive, understandable way (large spread for large triangle apertures, full concentration at a speaker direction, nothing for regions outside a partial setup etc.).

Indeed VBAP works for anything. That's its strength over ambisonic, with its rather limited decoder theory.

But when you can derive a decoding solution, it doesn't just work. It works the array fully, and in a fashion that is psychoacoustically optimized (for a passive LTI decoder). VBAP doesn't do that, because it hasn't been optimized like POA has. It leaves much of the potential of the rig unused.

Ambisonic decoding for any array is not designed as easily as computing VBAP gains, and it seems that for irregular setups one of the most straightforward and practical ways to do it is to combine the properties of VBAP and Ambisonic decoding (as the work of Zotter, Batke, and Epain has shown).

Agreed. The way I conceptualize ambisonic nowadays is that it's a framework and a way of thinking about directional audio. Not so much any precise system. If you think about it a bit, ambisonic in its full generality is about looking at acoustic fields from the viewpoint of a spherical surface harmonic decomposition; it's capable of describing any and all acoustic fields; it's just a general viewpoint into acoustics. Because of that, VBAP and any other conceivable acoustic thingy can be folded into the framework. By definition. That's why I for instance speak about VBAP as an "infinite order decoder" of a particular sort; I try to bring even that particular panning law within the general fold that is ambisonic.

It's then just that the framework is a bit more as well. It's also a theory of how to do more with fewer resources. While something like the NFC-HOA work (rather successfully) tries to perfect the periphonic, holophonic, infinite order limit of the theory (even with parallax, not with just direction), the POA/Gerzon theory also tells you what to do with the very minimum of resources. It takes from what it has, like the idealized Makita theory of human directional hearing, and then it runs with it. Quite without resorting to idealizations like "yeah, I work for IRCAM, and I have next to a thousand speakers at my disposal". ;)

Considering panning specifically, I think what works best depends on the application. For VR or interactive-audio stuff, for example, where sound objects would normally be rendered with maximum sharpness, VBAP would work better.

Are you sure? Wouldn't you think, as Gerzon did and I do, that isotropy is important? That it would be nastily distracting from your experience if the soundfield "pulled into a speaker"? I.e. if sounds coming from different directions were somehow different, because of your speaker setup?

If you do think that would be distracting, and counter to a nice spatial audio experience, suddenly you can't go with VBAP. Because it does pull to speakers. The only system and theory which really doesn't is ambisonic. ;)

Otherwise we of course agree. VBAP really does produce sharp images of the propagating kind. And it is easy as fuck to apply to irregular geometry. So, it has its merits. It's just that... ;)

If, however, some more even directional spreading is preferred, then ambisonic panning should be better, or some VBAP variant with spreading as has been presented by Ville and others.

I believe that work is of a different vein. It's about plausible presentation of sound objects, which necessitates a certain amount of spreading, reverb, whatnot. But that doesn't really pertain to the basic theory of ambisonic, or any other "pure"/"theoretical" transfer or representation or playback format. I believe that's more about synthesis, and less about how to faithfully represent the result of said synthesis -- as I believe ambisonic above all tries to do.
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
account or options, view archives and so on.
