On 01/26/2016 06:36 PM, Stefan Schreiber wrote:
> 2. <8> impulses (for 4 virtual speakers) implies that you don't
> support 3D decoders(?). If not, why is that? (Immersive/3D audio is on
> the requirement list for VR. It wouldn't make a lot of sense if all
> sound sources followed your gaze when looking upwards or downwards.)
I think the 8 impulses are used differently. I'm scared of trying to
explain something of which my own understanding is somewhat hazy, but
here goes; please correct me ruthlessly. Even if in the end I wish I'd
never been born, there might be something to learn from the resulting
discussion :)
W goes to loudspeakers LS1, LS2, ..., LSn.
The same goes for X, Y, and Z.
Each speaker feed LSm then goes to both the left ear and the right ear.
So you start with a 4 to n matrix, feeding into an n to 2 matrix. The
component-to-speaker convolutions and the speaker-to-ear convolutions
(the HRTFs) are constant.
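
To make that signal flow concrete, here is a minimal numpy sketch of
the naive chain. All names and shapes (dec, hrtf_l, ...) are made up
for illustration, not taken from any actual implementation:

    import numpy as np

    def naive_binaural(bformat, dec, hrtf_l, hrtf_r):
        # bformat: 4 signals (W, X, Y, Z); dec: 4 x n gain matrix;
        # hrtf_l, hrtf_r: one impulse response per speaker and ear.
        left = 0
        right = 0
        for m in range(dec.shape[1]):
            # feed every component into speaker m with its decoding gain...
            feed = sum(dec[c, m] * bformat[c] for c in range(4))
            # ...then convolve the speaker feed with that speaker's HRTFs
            left = left + np.convolve(feed, hrtf_l[m])
            right = right + np.convolve(feed, hrtf_r[m])
        return left, right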
Convolution and mixing are both linear, time-invariant operations. That
means they can be performed in any order and the result will be
identical. In math terms, convolution is associative and distributes
over addition, so that (a # X) + (b # X) is the same as (a + b) # X, and
(a # b) # c is the same as a # (b # c), where "#" means convolution.
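
Here is a quick numerical sanity check of those two identities, taking
"#" to be np.convolve; nothing about it is specific to Ambisonics:

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c, x = (rng.standard_normal(64) for _ in range(4))

    # distributivity over addition: (a # x) + (b # x) == (a + b) # x
    assert np.allclose(np.convolve(a, x) + np.convolve(b, x),
                       np.convolve(a + b, x))

    # associativity: (a # b) # c == a # (b # c)
    assert np.allclose(np.convolve(np.convolve(a, b), c),
                       np.convolve(a, np.convolve(b, c)))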
So the convolution steps can be pre-computed as follows, where DEC(N,m)
is the decoding coefficient of component N to loudspeaker m, expressed
as convolution with a Dirac pulse scaled to the appropriate value:
L = W # DEC(W,LS1) # HRTF(L,LS1) + ... + W # DEC(W,LSn) # HRTF(L,LSn)
+ X # DEC(X,LS1) # HRTF(L,LS1) + ... + X # DEC(X,LSn) # HRTF(L,LSn)
+ Y # ...
+ Z # ...
(same for R)
which can be expressed as
L = W # ( DEC(W,LS1) # HRTF(L,LS1) + ... + DEC(W,LSn) # HRTF(L,LSn) )
+ X # ...
+ Y # ...
+ Z # ...
(same for R).
Note that everything in brackets is now constant and can be folded into
a single convolution kernel.
That means you can, for first order, reduce the problem to 8
convolutions (4 components times 2 ears), going from {W,X,Y,Z} to {L,R}
directly. The complexity is constant no matter how many virtual
loudspeakers you use.
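
In code, the folding step might look like this sketch (first order
assumed, names illustrative): since each DEC(N,m) is a scaled Dirac,
convolving with it reduces to a multiplication, so building the combined
kernels is just a weighted sum of HRTFs, and rendering one ear costs 4
convolutions however large n gets:

    import numpy as np

    def fold_kernels(dec, hrtfs):
        # dec: 4 x n decoding gains; hrtfs: n impulse responses, one ear.
        # kernel[c] = sum over m of dec[c, m] * hrtfs[m]
        return [sum(dec[c, m] * hrtfs[m] for m in range(dec.shape[1]))
                for c in range(4)]

    def render_ear(bformat, kernels):
        # one ear: 4 convolutions, independent of the speaker count n
        return sum(np.convolve(sig, k) for sig, k in zip(bformat, kernels))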
Of course, that does not take dual-band decoding into account. But if
we express the crossover filters as further convolutions and split the
decoding matrix into a high-frequency and a low-frequency part, we can
throw both halves of the decoder together and still do everything in
one go.
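
A sketch of that, under the same assumptions, with hypothetical
xover_lf/xover_hf impulse responses standing in for the crossover
filters and dec_lf/dec_hf for the two halves of the decoding matrix:

    import numpy as np

    def fold_dualband(dec_lf, dec_hf, xover_lf, xover_hf, hrtfs):
        # dec_lf/dec_hf: 4 x n decoding gains for the two bands;
        # xover_lf/xover_hf: crossover filters as impulse responses
        # (same length); hrtfs: n impulse responses for one ear.
        kernels = []
        for c in range(4):
            k = 0
            for m in range(dec_lf.shape[1]):
                # the two band paths sum at the virtual speaker, then the
                # speaker's HRTF goes on top -- all of it is constant
                band = dec_lf[c, m] * xover_lf + dec_hf[c, m] * xover_hf
                k = k + np.convolve(band, hrtfs[m])
            kernels.append(k)
        return kernels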
For nth order, you have (n+1)² * 2 convolutions to handle (for first
order, (1+1)² * 2 = 8, as above).
For head-tracking, the virtual loudspeakers would move with the head (so
that we don't have to swap HRTFs), and the Ambisonic signal would be
counter-rotated accordingly. Of course that gets the torso reflections
slightly wrong as it assumes the whole upper body moves, rather than
just the neck, but I guess it's a start.
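
For the yaw-only case, the first-order counter-rotation could look like
this sketch (the sign convention depends on the channel ordering and
axis convention in use, so take it as illustrative):

    import numpy as np

    def counter_rotate_yaw(w, x, y, z, theta):
        # rotate the sound field by -theta around the vertical axis;
        # W and Z are invariant under yaw, only X and Y mix
        c, s = np.cos(theta), np.sin(theta)
        return w, c * x + s * y, c * y - s * x, z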
--
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487
Meister für Veranstaltungstechnik (Bühne/Studio)
Tonmeister VDT
http://stackingdwarves.net