On 01/26/2016 06:36 PM, Stefan Schreiber wrote:

> 2. <8> impulses (for 4 virtual speakers) implies that you don't
> support 3D decoders (?). If not, why this? (Immersive/3D audio is on the
> requirement list for VR. It wouldn't make a lot of sense if all sound
> sources will follow your gaze - looking upwards or downwards.)

I think the 8 impulses are used differently. I'm scared of trying to explain something of which my own understanding is somewhat hazy, but here goes: please correct me ruthlessly. Even if in the end I wish I'd never been born, there might be something to learn from the resulting discussion :)

W goes to loudspeakers LS1, LS2, ..., LSn.
Same for X, Y, and Z.

Each loudspeaker then feeds both the left ear and the right ear.

So you start with a 4-to-n matrix feeding into an n-to-2 matrix. The component-to-speaker convolutions and the speaker-to-ear convolutions (the HRTFs) are constant.
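In case code is clearer than prose, here's a minimal numpy sketch of that naive two-stage path. All the names (render_naive, dec, hrtf_l, hrtf_r) are made up for illustration, not taken from any actual decoder:

import numpy as np
from scipy.signal import fftconvolve

# bformat: (4, length) array holding W, X, Y, Z
# dec[c, m]: scalar decoding gain of component c to virtual speaker m
# hrtf_l[m], hrtf_r[m]: HRIRs from virtual speaker m to each ear
def render_naive(bformat, dec, hrtf_l, hrtf_r):
    feeds = dec.T @ bformat                 # stage 1: 4-to-n matrix
    n_spk = feeds.shape[0]
    # stage 2: n-to-2, one convolution per speaker and ear
    left  = sum(fftconvolve(feeds[m], hrtf_l[m]) for m in range(n_spk))
    right = sum(fftconvolve(feeds[m], hrtf_r[m]) for m in range(n_spk))
    return left, right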

Convolution and mixing are both linear, time-invariant operations. That means they can be performed in either order and the result will be identical. In math terms, convolution is distributive over addition and associative, so that (a # X) + (b # X) is the same as (a + b) # X, and (a # b) # c is the same as a # (b # c), where "#" means convolution.
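You can check both identities numerically in a few lines (random signals, plain numpy):

import numpy as np

rng = np.random.default_rng(0)
a, b, c, x = (rng.standard_normal(64) for _ in range(4))

# distributivity: (a # x) + (b # x) == (a + b) # x
print(np.allclose(np.convolve(a, x) + np.convolve(b, x),
                  np.convolve(a + b, x)))                  # True

# associativity: (a # b) # c == a # (b # c)
print(np.allclose(np.convolve(np.convolve(a, b), c),
                  np.convolve(a, np.convolve(b, c))))      # True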

So the convolution steps can be pre-computed as follows, where DEC(N,m) is the decoding coefficient of component N to loudspeaker m, expressed as a convolution with a Dirac pulse of the appropriate height:

L = W # DEC(W,LS1) # HRTF(L,LS1) + ... + W # DEC(W,LSn) # HRTF(L,LSn)
  + X # DEC(X,LS1) # HRTF(L,LS1) + ... + X # DEC(X,LSn) # HRTF(L,LSn)
  + Y # ...
  + Z # ...

(same for R)

which can be expressed as

L = W # ( DEC(W,LS1) # HRTF(L,LS1) + ... + DEC(W,LSn) # HRTF(L,LSn) )
  + X # ...
  + Y # ...
  + Z # ...

(same for R).

Note that everything in brackets is now constant and can be folded into a single convolution kernel.

That means you can, for first order, reduce the problem to 8 convolutions, going from {WXYZ} to {LR} directly. The complexity is constant no matter how many virtual loudspeakers you use.
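Continuing the sketch from above (same invented names), the folding and the resulting 8-convolution renderer would look roughly like this. Since each DEC(c,m) is just a scaled Dirac, "convolving" with it collapses into a multiplication, and the whole fold is a matrix product:

import numpy as np
from scipy.signal import fftconvolve

def fold_kernels(dec, hrtf_l, hrtf_r):
    # one fixed kernel per (component, ear):
    # k_l[c] = sum over m of dec[c, m] * hrtf_l[m]
    k_l = dec @ hrtf_l        # (4, n_spk) @ (n_spk, hrir_len)
    k_r = dec @ hrtf_r
    return k_l, k_r

def render_folded(bformat, k_l, k_r):
    # 4 components x 2 ears = 8 convolutions, independent of n_spk
    left  = sum(fftconvolve(bformat[c], k_l[c]) for c in range(4))
    right = sum(fftconvolve(bformat[c], k_r[c]) for c in range(4))
    return left, right

fold_kernels runs once at startup; render_folded is all you do per block.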

Of course, that does not take into account dual-band decoding. But if we express the crossover filters as further convolutions and split the decoding matrix into an hf and an lf part, we can also throw both halves of the decoder together and do everything in one go.
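Hedged sketch again: assuming the crossover is realized as two FIR kernels xover_lf/xover_hf of equal length (invented names), and reusing fold_kernels and the imports from the sketch above, the dual-band fold is just:

def fold_dual_band(dec_lf, dec_hf, xover_lf, xover_hf, hrtf_l, hrtf_r):
    kl_lf, kr_lf = fold_kernels(dec_lf, hrtf_l, hrtf_r)
    kl_hf, kr_hf = fold_kernels(dec_hf, hrtf_l, hrtf_r)
    # convolve each band's kernels with its crossover filter and sum;
    # the result is still one fixed kernel per (component, ear)
    k_l = np.array([fftconvolve(xover_lf, k) for k in kl_lf]) + \
          np.array([fftconvolve(xover_hf, k) for k in kl_hf])
    k_r = np.array([fftconvolve(xover_lf, k) for k in kr_lf]) + \
          np.array([fftconvolve(xover_hf, k) for k in kr_hf])
    return k_l, k_r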

For nth order, you have (n+1)² * 2 convolutions to handle: (n+1)² Ambisonic components, each convolved once per ear.
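Or, as a one-liner sanity check:

def num_convolutions(order):
    # (order + 1)**2 Ambisonic components, each convolved once per ear
    return 2 * (order + 1) ** 2

print([num_convolutions(n) for n in (1, 2, 3)])   # [8, 18, 32]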

For head-tracking, the virtual loudspeakers would move with the head (so that we don't have to swap HRTFs), and the Ambisonic signal would be counter-rotated accordingly. Of course that gets the torso reflections slightly wrong as it assumes the whole upper body moves, rather than just the neck, but I guess it's a start.
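For completeness, the first-order yaw counter-rotation is a plain 2x2 rotation on X and Y. The sign convention below is an assumption -- check it against your coordinate system; I'm taking X forward, Y left, Z up, yaw positive when the head turns left:

import numpy as np

def counter_rotate_yaw(w, x, y, z, head_yaw):
    # rotate the sound field by -head_yaw about the vertical axis,
    # so the scene stays put while the head turns; W and Z are
    # invariant under yaw
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    return w, c * x + s * y, -s * x + c * y, z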



--
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio)
Tonmeister VDT

http://stackingdwarves.net
