On 01/26/2016 06:36 PM, Stefan Schreiber wrote:

> 2. <8> impulses (for 4 virtual speakers) implies that you don't
> support 3D decoders (?). If not, why this? (Immersive/3D audio is on the
> requirement list for VR. It wouldn't make a lot of sense if all sound
> sources will follow your gaze - looking upwards or downwards.)

I think the 8 impulses are used differently. I'm scared of trying to explain something of which my own understanding is somewhat hazy, but here goes: please correct me ruthlessly. Even if in the end I wish I'd never been born, there might be something to learn from the resulting discussion :)

W goes to loudspeakers LS1, LS2, ..., LSn.
Same for X, Y, and Z.

Each loudspeaker then feeds both the left ear and the right ear.

So you start with a 4-to-n matrix feeding into an n-to-2 matrix. The component-to-speaker convolutions and the speaker-to-ear convolutions (the HRTFs) are constant.
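In case code is clearer than prose, here's a minimal numpy sketch of that naive two-stage path. All the names (render_naive, dec, hrtf_l, hrtf_r) are made up for illustration, not taken from any actual decoder:

import numpy as np
from scipy.signal import fftconvolve

# bformat: (4, length) array holding W, X, Y, Z
# dec[c, m]: scalar decoding gain of component c to virtual speaker m
# hrtf_l[m], hrtf_r[m]: HRIRs from virtual speaker m to each ear
def render_naive(bformat, dec, hrtf_l, hrtf_r):
    feeds = dec.T @ bformat                 # stage 1: 4-to-n matrix
    n_spk = feeds.shape[0]
    # stage 2: n-to-2, one convolution per speaker and ear
    left  = sum(fftconvolve(feeds[m], hrtf_l[m]) for m in range(n_spk))
    right = sum(fftconvolve(feeds[m], hrtf_r[m]) for m in range(n_spk))
    return left, right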

Convolution and mixing are both linear, time-invariant operations. That means they can be performed in either order and the result will be identical. In math terms, convolution is distributive over addition and associative, so that (a # X) + (b # X) is the same as (a + b) # X, and (a # b) # c is the same as a # (b # c), where "#" means convolution.
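You can check both identities numerically in a few lines (random signals, plain numpy):

import numpy as np

rng = np.random.default_rng(0)
a, b, c, x = (rng.standard_normal(64) for _ in range(4))

# distributivity: (a # x) + (b # x) == (a + b) # x
print(np.allclose(np.convolve(a, x) + np.convolve(b, x),
                  np.convolve(a + b, x)))                  # True

# associativity: (a # b) # c == a # (b # c)
print(np.allclose(np.convolve(np.convolve(a, b), c),
                  np.convolve(a, np.convolve(b, c))))      # True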

So the convolution steps can be pre-computed as follows, where DEC(N,m) is the decoding coefficient of component N to loudspeaker m, expressed as a convolution with a Dirac pulse of the appropriate height:

L = W # DEC(W,LS1) # HRTF(L,LS1) + ... + W # DEC(W,LSn) # HRTF(L,LSn)
  + X # DEC(X,LS1) # HRTF(L,LS1) + ... + X # DEC(X,LSn) # HRTF(L,LSn)
  + Y # ...
  + Z # ...

(same for R)

which can be expressed as

L = W # ( DEC(W,LS1) # HRTF(L,LS1) + ... + DEC(W,LSn) # HRTF(L,LSn) )
  + X # ...
  + Y # ...
  + Z # ...

(same for R).

Note that everything in brackets is now constant and can be folded into a single convolution kernel.

That means you can, for first order, reduce the problem to 8 convolutions, going from {WXYZ} to {LR} directly. The complexity is constant no matter how many virtual loudspeakers you use.
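Continuing the sketch from above (same invented names), the folding and the resulting 8-convolution renderer would look roughly like this. Since each DEC(c,m) is just a scaled Dirac, "convolving" with it collapses into a multiplication, and the whole fold is a matrix product:

import numpy as np
from scipy.signal import fftconvolve

def fold_kernels(dec, hrtf_l, hrtf_r):
    # one fixed kernel per (component, ear):
    # k_l[c] = sum over m of dec[c, m] * hrtf_l[m]
    k_l = dec @ hrtf_l        # (4, n_spk) @ (n_spk, hrir_len)
    k_r = dec @ hrtf_r
    return k_l, k_r

def render_folded(bformat, k_l, k_r):
    # 4 components x 2 ears = 8 convolutions, independent of n_spk
    left  = sum(fftconvolve(bformat[c], k_l[c]) for c in range(4))
    right = sum(fftconvolve(bformat[c], k_r[c]) for c in range(4))
    return left, right

fold_kernels runs once at startup; render_folded is all you do per block.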

Of course, that does not take into account dual-band decoding. But if we express the crossover filters as further convolutions and split the decoding matrix into an hf and an lf part, we can also throw both halves of the decoder together and do everything in one go.
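Hedged sketch again: assuming the crossover is realized as two FIR kernels xover_lf/xover_hf of equal length (invented names), and reusing fold_kernels and the imports from the sketch above, the dual-band fold is just:

def fold_dual_band(dec_lf, dec_hf, xover_lf, xover_hf, hrtf_l, hrtf_r):
    kl_lf, kr_lf = fold_kernels(dec_lf, hrtf_l, hrtf_r)
    kl_hf, kr_hf = fold_kernels(dec_hf, hrtf_l, hrtf_r)
    # convolve each band's kernels with its crossover filter and sum;
    # the result is still one fixed kernel per (component, ear)
    k_l = np.array([fftconvolve(xover_lf, k) for k in kl_lf]) + \
          np.array([fftconvolve(xover_hf, k) for k in kl_hf])
    k_r = np.array([fftconvolve(xover_lf, k) for k in kr_lf]) + \
          np.array([fftconvolve(xover_hf, k) for k in kr_hf])
    return k_l, k_r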

For nth order, you have (n+1)² * 2 convolutions to handle: (n+1)² Ambisonic components, each convolved once per ear.
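Or, as a one-liner sanity check:

def num_convolutions(order):
    # (order + 1)**2 Ambisonic components, each convolved once per ear
    return 2 * (order + 1) ** 2

print([num_convolutions(n) for n in (1, 2, 3)])   # [8, 18, 32]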

For head-tracking, the virtual loudspeakers would move with the head (so that we don't have to swap HRTFs), and the Ambisonic signal would be counter-rotated accordingly. Of course that gets the torso reflections slightly wrong as it assumes the whole upper body moves, rather than just the neck, but I guess it's a start.
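For completeness, the first-order yaw counter-rotation is a plain 2x2 rotation on X and Y. The sign convention below is an assumption -- check it against your coordinate system; I'm taking X forward, Y left, Z up, yaw positive when the head turns left:

import numpy as np

def counter_rotate_yaw(w, x, y, z, head_yaw):
    # rotate the sound field by -head_yaw about the vertical axis,
    # so the scene stays put while the head turns; W and Z are
    # invariant under yaw
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    return w, c * x + s * y, -s * x + c * y, z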



--
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio)
Tonmeister VDT

http://stackingdwarves.net
