On Tue, Jan 26, 2016 at 09:52:21PM +0100, Jörn Nettingsmeier wrote:

> On 01/26/2016 06:36 PM, Stefan Schreiber wrote:
> 
> >2. < 8 > impulses (for 4 virtual speakers) implies that you don't
> >support 3D decoders (?). If not, why this? (Immersive/3D audio is on the
> >requirement list for VR. It wouldn't make a lot of sense if all sound
> >sources will follow your gaze - looking upwards or downwards.)
> 
> I think the 8 impulses are used differently. I'm scared of trying to
> explain something of which my own understanding is somewhat hazy,
> but here it goes: please correct me ruthlessly. Even if in the end I
> wish I'd never been born, there might be something to learn from the
> resulting discussion :)
> 
> ...

Your understanding is 100% correct.

I've been reading this thread with much interest, as it is exactly
about the topic I've been working on for the last two months.

The result is a prototype system converting full 2nd order to
binaural. Decoder and binaural rendering are combined, in the
way Joern explained, into a 9 * 2 convolution matrix.
Motion tracking uses a cheap (50 Euro) USB sensor which provides
around 90 quaternions per second, and the corresponding rotation
is done in the Ambisonic domain. The whole thing works quite
well so far. Unfortunately I can't tell much more, just
a few comments on some of the topics raised in the thread.
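The combined decode-plus-HRIR step can be sketched as follows. This is a minimal illustration (function and variable names are mine, not from the actual prototype): each of the 9 second-order Ambisonic channels is convolved with one filter per ear, and the results are summed, giving the 9 * 2 convolution matrix mentioned above.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(amb, h_left, h_right):
    """Render 2nd-order Ambisonics (9 channels) to binaural.

    amb     : (9, nsamp) Ambisonic signals
    h_left  : (9, nh) per-channel filters for the left ear
    h_right : (9, nh) per-channel filters for the right ear

    The 9 * 2 filters fold the virtual-speaker decoder and the
    HRIRs into a single convolution matrix, so no intermediate
    speaker feeds are ever computed.
    """
    left = sum(fftconvolve(amb[i], h_left[i]) for i in range(9))
    right = sum(fftconvolve(amb[i], h_right[i]) for i in range(9))
    return np.stack([left, right])
```

Head rotation is then a separate (cheap) matrix operation on the Ambisonic signals before this stage, which is why the expensive convolution part never has to change with head movement.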

* SOFA is just a format for representing data such as HRIRs.
Apart from the actual IRs, a SOFA file will provide things
like the set of directions, source distance etc. But it does
not impose any standard values for any of that metadata.
Converting a SOFA data set into the N * 2 convolutions that
are required in the end can't be done blindly. I've been
using at least five different sources. All of them were in
SOFA format, but each one required some quite specific
treatment. In other words, this is a format for researchers,
not for end users.
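To illustrate the point: SOFA files are netCDF-4 containers (i.e. HDF5 underneath), so getting at the raw data is easy, but everything after that - units, coordinate conventions, channel order - still has to be checked per data set. A minimal reader sketch (assuming the common SimpleFreeFieldHRIR layout, with h5py used for the HDF5 access):

```python
import numpy as np
import h5py  # SOFA is netCDF-4, i.e. HDF5 underneath

def load_sofa_hrirs(path):
    """Read raw IRs, source positions and sample rate from a
    SOFA file.

    For the SimpleFreeFieldHRIR convention the IRs have shape
    (M, 2, N): M measurement directions, 2 receivers (ears),
    N samples.  What the position coordinates mean (degrees?
    radians? which zero azimuth?) is NOT fixed by this code -
    that is exactly the per-data-set work described above.
    """
    with h5py.File(path, 'r') as f:
        irs = np.asarray(f['Data.IR'])
        pos = np.asarray(f['SourcePosition'])
        fs = float(np.asarray(f['Data.SamplingRate']).ravel()[0])
    return irs, pos, fs
```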

* Most HRIR sets have an LF response that is almost certainly
not correct. Up to a few hundred Hz it should be flat. One
essential step in the preparation is to fix this. How this
is done best depends on the particular data set. If this is
done correctly you can trim the IRs to a few ms without any
adverse effect. 
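One simple way to do such a fix (there are others, and the best choice depends on the data set, as said above) is to force the magnitude response below some limit frequency to the value it has at that frequency, keeping the phase:

```python
import numpy as np

def flatten_lf(ir, fs, f_lim=300.0):
    """Force the magnitude response below f_lim (Hz) to the
    value it has at f_lim, keeping phase.  A crude but simple
    repair for the unreliable LF region of a measured HRIR."""
    n = len(ir)
    H = np.fft.rfft(ir)
    k_lim = max(1, int(round(f_lim * n / fs)))
    target = np.abs(H[k_lim])
    phase = np.angle(H[:k_lim])
    # replace the LF bins, preserving phase only
    H[:k_lim] = target * np.exp(1j * phase)
    return np.fft.irfft(H, n)
```

With the LF region repaired like this, the IR energy concentrates near the onset, which is what makes the trimming to a few ms possible without audible damage.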

* Another essential step is to align the delays. The HRIRs
must be shifted in time so that all the ipsilateral sides
have the same delay. This is to avoid comb filtering, which
would provide false spectral cues.
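A sketch of one way to do this alignment (onset detection by a simple threshold relative to the peak; more refined estimators exist):

```python
import numpy as np

def onset_delay(ir, threshold_db=-20.0):
    """Index of the first sample within threshold_db of the peak."""
    thr = np.max(np.abs(ir)) * 10.0 ** (threshold_db / 20.0)
    return int(np.argmax(np.abs(ir) >= thr))

def align_delays(irs, target=None):
    """Shift each IR so its onset lands on a common sample index.

    irs    : (M, N) array of impulse responses
    target : common onset index; defaults to the earliest onset
             found, so no IR is ever truncated at the front.
    """
    n = irs.shape[1]
    onsets = np.array([onset_delay(ir) for ir in irs])
    if target is None:
        target = int(onsets.min())
    out = np.zeros_like(irs)
    for i, ir in enumerate(irs):
        s = target - onsets[i]
        if s >= 0:
            out[i, s:] = ir[:n - s]
        else:
            out[i, :n + s] = ir[-s:]
    return out, onsets
```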

* From my personal experience I'd agree with Dave Malham:
you will adapt to a specific set of HRIRs if they are not
your personal ones. And it's my impression that this
adaptation remains - you'll 'remember' it.

* I don't think that the exact inter-ear distance is that
important - I've been able to modify this (within reason)
without any ill effect. What seems to make a set of HRIRs
personal is probably more the complex direction-dependent
filtering by the pinnae.

* All content derived from non-surround sources (e.g.
plain stereo or 5.1) requires some 'room sound' to work
well. Externalisation seems to depend on having early
reflections from different directions (which would allow
the brain to compare their spectra). Generating such
room sound can be done in the AMB domain. What exactly
is required and how to do that efficiently is my current
research problem.
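The basic building block for this - mixing a delayed, attenuated reflection from a given direction straight into the Ambisonic bus - looks roughly like this. This is only an illustrative sketch of the encoding step, not the research result mentioned above; the spherical-harmonic gains follow the common ACN channel order with SN3D normalisation:

```python
import numpy as np

def sn3d_gains(az, el):
    """Real spherical-harmonic gains up to 2nd order for a
    source at (az, el) in radians, ACN order, SN3D norm."""
    ca, sa = np.cos(az), np.sin(az)
    ce, se = np.cos(el), np.sin(el)
    r3 = np.sqrt(3.0) / 2.0
    return np.array([
        1.0,                           # ACN 0  W
        sa * ce,                       # ACN 1  Y
        se,                            # ACN 2  Z
        ca * ce,                       # ACN 3  X
        r3 * np.sin(2 * az) * ce * ce, # ACN 4  V
        r3 * sa * np.sin(2 * el),      # ACN 5  T
        0.5 * (3 * se * se - 1.0),     # ACN 6  R
        r3 * ca * np.sin(2 * el),      # ACN 7  S
        r3 * np.cos(2 * az) * ce * ce, # ACN 8  U
    ])

def add_reflection(amb, mono, delay, gain, az, el):
    """Mix a delayed, attenuated copy of a mono signal into a
    2nd-order Ambisonic bus from direction (az, el)."""
    g = gain * sn3d_gains(az, el)
    n = amb.shape[1] - delay
    amb[:, delay:] += g[:, None] * mono[:n]
    return amb
```

A handful of such reflections from well-spread directions, each with its own delay and colouration, is the kind of 'room sound' the brain seems to need for externalisation; doing this efficiently and convincingly is the open part.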
 
Ciao,

-- 
FA

A world of exhaustive, reliable metadata would be an utopia.
It's also a pipe-dream, founded on self-delusion, nerd hubris
and hysterically inflated market opportunities. (Cory Doctorow)

_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
account or options, view archives and so on.
