It's a pity MAG wasn't around to work on this - I know he had already
thought about it in the eighties (and probably earlier), as he mentioned it
in one of the discussions we had at one of the APRS or AES conventions. Mind
you, I say "discussions", but I never contributed that much as my brain was
generally heading for meltdown keeping up with him after 5 minutes. :-)

    Dave

On 28 January 2016 at 01:41, Stefan Schreiber <st...@mail.telepac.pt> wrote:

> Politis Archontis wrote:
>
>> Hi Jorn,
>>
>> yes that is correct. I think however that the virtual loudspeaker stage
>> is unnecessary. It is equivalent if you expand the left and right HRTFs
>> into spherical harmonics and multiply their coefficients (in the frequency
>> domain) directly with the coefficients of the sound scene (which in the
>> 1st-order case is the B-format recording). This is simpler and more elegant
>> I think. Taking the IFFT of each coefficient of the HRTFs, you end up with
>> an FIR filter that maps the respective HOA signal to its binaural output,
>> hence as you said it's always 2*(HOA channels) no matter what. Arbitrary
>> rotations can be done on the HOA signals before the HOA-to-binaural
>> filters, so head-tracking is perfectly possible.
>>
>> Best,
>> Archontis
>>
>>
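To make that concrete, here is a minimal Python/numpy sketch of the idea
(my own illustration, not code from anyone in this thread; it assumes the
HRTF set has already been expanded into the same spherical-harmonic basis
as the scene and brought back to the time domain as FIR filters):

    import numpy as np

    def hoa_to_binaural(bformat, hrir_sh_left, hrir_sh_right):
        # bformat: (4, nsamples) W, X, Y, Z signals (1st-order scene).
        # hrir_sh_left/right: (4, ntaps) SH-domain HRIRs, one per component.
        n_ch, n_smp = bformat.shape
        n_taps = hrir_sh_left.shape[1]
        out = np.zeros((2, n_smp + n_taps - 1))
        for c in range(n_ch):
            out[0] += np.convolve(bformat[c], hrir_sh_left[c])   # left ear
            out[1] += np.convolve(bformat[c], hrir_sh_right[c])  # right ear
        return out

Whatever the order, the cost is two FIR filters per HOA channel - the
2*(HOA channels) figure mentioned above.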
>
> Yes - I believe this is roughly what MPEG does to implement their direct H2B
> (HOA ---> binaural) decoder, mandatory in MPEG-H 3DA (ISO/IEC 23008-3).
>
> Gregory Pallone could comment on this, I guess...   (Rozenn Nicol's papers,
> cited by several people in this thread, are related to Orange Labs, so
> thanks for their good and often publicly available research!)
>
> Best,
>
> Stefan
>
>
> ________________________________________
>> From: Sursound [sursound-boun...@music.vt.edu] on behalf of Jörn
>> Nettingsmeier [netti...@stackingdwarves.net]
>> Sent: 26 January 2016 22:52
>> To: sursound@music.vt.edu
>> Subject: [Sursound] Never do math in public, or my take on explaining
>> B-format to binaural
>>
>> I think the 8 impulses are used differently. I'm scared of trying to
>> explain something of which my own understanding is somewhat hazy, but
>> here goes: please correct me ruthlessly. Even if in the end I wish
>> I'd never been born, there might be something to learn from the
>> resulting discussion :)
>>
>> W goes to loudspeaker LS1, LS2, ..., LSn.
>> Same for X, Y, and Z.
>>
>> Each LSn then goes both to left ear and right ear.
>>
>> So you start with a 4 to n matrix, feeding into an n to 2 matrix. The
>> component-to-speaker convolutions and the speaker-to-ear convolutions
>> (the HRTFs) are constant.
>>
>> Convolution and mixing are both linear, time-invariant operations. That
>> means they can be performed in any order and the result will be
>> identical. In math terms, convolution distributes over addition and is
>> associative, so that (a # X) + (b # X) is the same as (a + b) # X, and
>> a # b # c is the same as a # (b # c), where "#" means convolution.
>>
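Both identities are easy to check numerically; a quick sketch (mine, for
illustration only):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, x = rng.standard_normal((3, 64))

    # Distributivity over addition: (a # x) + (b # x) == (a + b) # x
    print(np.allclose(np.convolve(a, x) + np.convolve(b, x),
                      np.convolve(a + b, x)))               # True

    # Associativity: (a # b) # x == a # (b # x)
    print(np.allclose(np.convolve(np.convolve(a, b), x),
                      np.convolve(a, np.convolve(b, x))))   # True
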
>> So the convolution steps can be pre-computed as follows, where DEC(N,m)
>> is the decoding coefficient of component N to loudspeaker m, expressed
>> as convolution with a dirac pulse of the appropriate value:
>>
>> L = W # DEC(W,LS1) # HRTF(L,LS1) + ... + W # DEC(W,LSn) # HRTF(L,LSn)
>>   + X # DEC(X,LS1) # HRTF(L,LS1) + ... + X # DEC(X,LSn) # HRTF(L,LSn)
>>   + Y # ...
>>   + Z # ...
>>
>> (same for R)
>>
>> which can be expressed as
>>
>> L = W # ( DEC(W,LS1) # HRTF(L,LS1) + ... + DEC(W,LSn) # HRTF(L,LSn) )
>>   + X # ...
>>   + Y # ...
>>   + Z # ...
>>
>> (same for R).
>>
>> Note that everything in brackets is now constant and can be folded into
>> a single convolution kernel.
>>
>> That means you can, for first order, reduce the problem to 8
>> convolutions, going from {WXYZ} to {LR} directly. The complexity is
>> constant no matter how many virtual loudspeakers you use.
>>
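A small numpy sketch of that pre-computation (my own names and shapes,
purely for illustration): the decode gains form an (n_speakers, 4) matrix,
and because each coefficient is a scaled Dirac, "convolving" with it is
just a multiplication, so the fold becomes a matrix product.

    import numpy as np

    def fold_decoder_into_hrirs(decode_gains, hrir_left, hrir_right):
        # decode_gains: (n_spk, 4) gains from W, X, Y, Z to each speaker.
        # hrir_left/right: (n_spk, ntaps) HRIRs from each speaker to the ear.
        # Summing over speakers folds the whole layout into 4 FIRs per ear.
        kern_left = decode_gains.T @ hrir_left      # shape (4, ntaps)
        kern_right = decode_gains.T @ hrir_right
        return kern_left, kern_right

    def render(bformat, kern_left, kern_right):
        # Exactly 8 convolutions for first order, whatever n_spk was.
        n_smp, n_taps = bformat.shape[1], kern_left.shape[1]
        out = np.zeros((2, n_smp + n_taps - 1))
        for c in range(4):
            out[0] += np.convolve(bformat[c], kern_left[c])
            out[1] += np.convolve(bformat[c], kern_right[c])
        return out
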
>> Of course, that does not take into account dual-band decoding. But if we
>> express the cross-over filters as another convolution and split the
>> decoding matrix into a hf and lf part, we can also throw both halves of
>> the decoder together and do everything in one go.
>>
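Continuing the same sketch, the dual-band case folds in the same way,
assuming a complementary low-pass/high-pass FIR pair and separate lf/hf
gain matrices (again my own naming, as an illustration):

    import numpy as np

    def fold_dual_band(dec_lf, dec_hf, lowpass, highpass,
                       hrir_left, hrir_right):
        # dec_lf / dec_hf: (n_spk, 4) low- and high-frequency decode gains.
        # lowpass / highpass: complementary crossover FIRs.
        kl_lf, kr_lf = dec_lf.T @ hrir_left, dec_lf.T @ hrir_right
        kl_hf, kr_hf = dec_hf.T @ hrir_left, dec_hf.T @ hrir_right
        # Convolve each band's kernels with its crossover filter and sum:
        # one fixed FIR per component and ear, dual-band decode included.
        kern_left = np.array([np.convolve(k, lowpass) for k in kl_lf]) \
                  + np.array([np.convolve(k, highpass) for k in kl_hf])
        kern_right = np.array([np.convolve(k, lowpass) for k in kr_lf]) \
                   + np.array([np.convolve(k, highpass) for k in kr_hf])
        return kern_left, kern_right
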
>> For nth order, you have (n+1)² * 2 convolutions to handle (there are
>> (n+1)² HOA channels, and each needs one filter per ear).
>>
>> For head-tracking, the virtual loudspeakers would move with the head (so
>> that we don't have to swap HRTFs), and the Ambisonic signal would be
>> counter-rotated accordingly. Of course that gets the torso reflections
>> slightly wrong as it assumes the whole upper body moves, rather than
>> just the neck, but I guess it's a start.
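
For the counter-rotation itself, a first-order yaw rotation is just a 2-D
rotation of the X/Y components applied before the fixed filters; a sketch
(the sign convention depends on your coordinate frame, so treat it as an
illustration):

    import numpy as np

    def counter_rotate_yaw(bformat, head_yaw_rad):
        # Rotate the sound field by minus the tracked head yaw; W and Z
        # are unchanged by a rotation about the vertical axis.
        w, x, y, z = bformat
        c, s = np.cos(-head_yaw_rad), np.sin(-head_yaw_rad)
        x_rot = c * x - s * y
        y_rot = s * x + c * y
        return np.vstack([w, x_rot, y_rot, z])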
>>
>>
>>
>> --
>> Jörn Nettingsmeier
>> Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487
>>
>> Meister für Veranstaltungstechnik (Bühne/Studio)
>> Tonmeister VDT
>>
>> http://stackingdwarves.net
>>



-- 

As of 1st October 2012, I have retired from the University.

These are my own views and may or may not be shared by the University

Dave Malham
Honorary Fellow, Department of Music
The University of York
York YO10 5DD
UK

'Ambisonics - Component Imaging for Audio'