Joseph Anderson wrote:

Hi Peter,

You left out proximity in your panner. (Doh!) ;-)

One of our postgrads (Dan Peterson
<https://dxarts.washington.edu/people/daniel-peterson>) has been working on
a doppler-panner that includes diffusion filtering and the proximity
effect. (Of course, all built out of the SuperCollider version of the ATK
<http://doc.sccode.org/Guides/Intro-to-the-ATK.html>.)

The results are fairly convincing. The technique was used for Peterson's
new piece Steilacoom, premiered on the recent UW DXARTS Concert
<https://dxarts.washington.edu/events/2015-11-20/dxarts-fall-concert>.

The plan is to make this code available as part of the ATK
<http://doc.sccode.org/Guides/Intro-to-the-ATK.html> documentation.


*Joseph Anderson*

*http://www.ambisonictoolkit.net/ <http://www.ambisonictoolkit.net/>*

Just to set the record straight: there is currently much more practical activity happening in related areas than many people may be aware of. (No offense intended, of course.)

Relevant for the current/ongoing discussion:

I.

https://developer.oculus.com/documentation/audiosdk/latest/concepts/audio-intro-localization/

(Just an "intro", but not a simplistic one...)

Excerpts:

Consider the following: human beings have only two ears, but are able to locate sound sources within three dimensions. That shouldn't be possible — if you were given a stereo recording and were asked to determine if the sound came from above or below the microphones, you would have no way to tell.

...

Front versus back localization is significantly more difficult than lateral localization. We cannot rely on time differences, since interaural time and/or level differences may be zero for a sound in front of or behind the listener.

...

Humans rely on spectral modifications of sounds caused by the head and body to resolve this ambiguity. These spectral modifications are filters and reflections of sound caused by the shape and size of the head, neck, shoulders, torso, and especially, by the outer ears (or pinnae). Because sounds originating from different directions interact with the geometry of our bodies differently, our brains use spectral modification to infer the direction of origin.


HRTFs by themselves may not be enough to localize a sound precisely, so we often rely on head motion to assist with localization. Simply turning our heads changes difficult front/back ambiguity problems into lateral localization problems that we are better equipped to solve.

(!)

In our earlier discussion we found convincing evidence and arguments that head motion should be relevant even for obtaining improved vertical localization.
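To illustrate the cone-of-confusion point numerically: the interaural
time difference depends only on the angle between the source and the
interaural axis, so mirrored front/back positions are ambiguous until
the head turns. A minimal sketch (Woodworth's spherical-head
approximation; all values illustrative):

    import numpy as np

    HEAD_RADIUS = 0.0875     # m, typical spherical-head value
    SPEED_OF_SOUND = 343.0   # m/s

    def itd(azimuth_deg):
        """Woodworth ITD for a horizontal-plane source; 0 deg = front."""
        # Lateral angle between the source and the interaural axis.
        lateral = np.arcsin(np.sin(np.radians(azimuth_deg)))
        return HEAD_RADIUS / SPEED_OF_SOUND * (lateral + np.sin(lateral))

    # Front-left 30 deg and back-left 150 deg give the same ITD: ambiguous.
    print(itd(30.0), itd(150.0))

    # Turn the head 20 deg to the left: the sources now sit at 10 deg and
    # 130 deg in head coordinates, and their ITDs diverge - the front/back
    # problem has become a lateral one.
    print(itd(30.0 - 20.0), itd(150.0 - 20.0))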


II. And here, an "old", or rather classic, article about game audio...

http://ixbtlabs.com/articles2/sound-technology/index.html

If sound sources are immovable, their positions can't be determined precisely: the brain needs movement (of the source, or subconscious micro-movements of the listener's head) to determine a sound source's position in geometrical space.

(?!)


Modern systems for reproducing positioned 3D sound use HRTFs to form virtual sound sources, but these synthetic virtual sources are point sources. In real life, sound mostly comes from large or composite sources, which can consist of several individual sound generators. Large and composite sound sources allow for more realistic effects than point sources. A point source can be successfully applied to large but distant objects, for example a moving train. But in real life, when the train approaches the listener it is no longer a point source.

(See

One of our postgrads (Dan Peterson
<https://dxarts.washington.edu/people/daniel-peterson>) has been working on
a doppler-panner that includes diffusion filtering and the proximity
effect.
)
...

The third group consists of the sound tone parameters. These can help the player infer what the walls are made of, what the air density of the environment is, etc. Every material reflects and absorbs certain frequencies, and these parameters emulate that absorption and reflection. They are reference frequencies (LF - Low Frequency and HF - High Frequency) within which changes can be made. For example, metallic walls reflect more frequencies than wooden ones, and the HF level will be lower for them than for an emulation of wood. For example, the workshop preset has 362 Hz LF and 3762 Hz HF; a wooden room has LF at 99 Hz and HF at 4900 Hz. Finally, there are parameters controlling the effect of the Room LF and HF frequencies (in dB). This subgroup also contains decay factors for LF and HF, and an Air Absorption HF factor.
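(The figures quoted map naturally onto a preset structure. Purely as an
illustration - the field names below are mine, not actual EAX/I3DL2
identifiers, and the dB values are invented:)

    # Illustrative only: field names and dB values are made up; the two
    # reference frequencies are the ones quoted in the article above.
    REVERB_TONE_PRESETS = {
        "workshop":    {"lf_reference_hz": 362.0, "hf_reference_hz": 3762.0,
                        "room_lf_db": -2.0, "room_hf_db": -6.0},
        "wooden_room": {"lf_reference_hz": 99.0, "hf_reference_hz": 4900.0,
                        "room_lf_db": -1.0, "room_hf_db": -4.0},
    }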



It is a safe bet that AR/VR in particular will require a solid understanding of acoustics and human auditory perception. Developers will have to find improved ways to reproduce surround sound (including 3D audio) via headphones and loudspeakers.

(And of course, also to find convincing and robust ways to record 3D audio, which is what the Ozo camera is supposed to do but maybe doesn't... I believe that Nokia should reveal more solid and transparent technical information if Ozo is supposed to be a flagship product in the professional camera area. A $60,000 device is not exactly an impulse buy... Probably this will happen anyway.)

Best,

Stefan

On Fri, Dec 11, 2015 at 5:36 AM, Peter Lennox <p.len...@derby.ac.uk> wrote:

A reason for 'tinting' on the encode side might be this:

Sometimes, when synthesising a 3-d soundscape, I find it useful to pan
decorrelated reverberant material upwards - this gives a sense of
upward-spaciousness, and can help with clarity and intelligibility of
sources (a kind of directional release from masking) - it's also handy
when using reverb to simulate distant sources.

Tinting on the decode side might interfere with that, whereas doing it on
the encode side gives the flexibility to choose.

There's another thing - dynamic panning in trajectories overhead -
'tinting' would allow a slight phasy effect to interact with a doppler
effect to emphasise that passing overhead sensation - and intuitively, it
makes sense to combine these parameters in a single plugin.

I quite like the idea of dedicated dynamic overhead panners...

But then I quite like the idea of "straight line panners"... which could
actually incorporate parameters for object velocity and distance,
utilising amplitude, EQ and dry/reverb, all in one plugin (a crude
sketch follows below).

And I fancy the idea of 'flocking, chasing and scattering' panners

and 'self motion panners' - that can be loaded up with a bunch of sources
with nominal positions featuring varying distances, so that as the
first-person listener virtually moves through the virtual environment,
auditory parallax is preserved.

Oh, and whilst I'm writing to Santa, a 'biological motion' module, that
can impart the characteristic locomotion cues to a given source - complete
with optional gravity, friction, mass parameters.

And I want that by the 25th, please...
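(For the distance and Doppler part of such a straight-line panner, a
minimal sketch - no existing plugin, just numpy; the trajectory, speed
and plain 1/r gain are illustrative assumptions:)

    import numpy as np

    def straight_line_flyby(x, fs, speed=30.0, closest=5.0, c=343.0):
        """Fly a mono source past the listener on a straight line.

        Applies a time-varying propagation delay (Doppler) and a simple
        1/r distance gain; EQ and dry/reverb balance would slot in here.
        """
        n = np.arange(len(x), dtype=float)
        t = n / fs
        t0 = t[len(x) // 2]                   # closest approach mid-file
        dist = np.hypot(closest, speed * (t - t0))
        read_pos = n - dist / c * fs          # delayed read position
        y = np.interp(read_pos, n, x, left=0.0, right=0.0)
        return y / np.maximum(dist, 1.0)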


Dr. Peter Lennox
Senior Fellow of the Higher Education Academy
Senior Lecturer in Perception
College of Arts
University of Derby

Tel: 01332 593155
________________________________________
From: Sursound [sursound-boun...@music.vt.edu] On Behalf Of Augustine
Leudar [augustineleu...@gmail.com]
Sent: 11 December 2015 13:10
To: Surround Sound discussion group
Subject: Re: [Sursound] vertical precedence and summing localisation
(wallis and lee 2015)

Hi Bo,
I googled "tinting" in relation to this but couldn't find any papers -
could you point me in the direction of these demonstrations/links? The
thing is, virtually all the HRTF information related to vertical
localisation is above 4 kHz. The device we made allowed you to move
sounds up and down, and horizontally, with a Wii controller. It worked
quite well, but we did cheat a little by making sounds more high-pitched
when they went upwards, as well as convolving them with directional
bands - but this wouldn't work with all sounds.

On 11 December 2015 at 12:54, Bo-Erik Sandholm <bosses...@gmail.com>
wrote:

I just want to say that when I read Joseph's mail I feel like Christmas
has come early this year :-)

I have been thinking about headtracked binaural listening for a couple of
years and discussing it here and in other forums.

The goal is to make it possible to listen to first-order Ambisonics on
earphones with head tracking, using open source programs and procedures.


I do not think we should wait until it is possible to create an
individual HRTF for an everyday, non-technical person.

This is available today: software and hardware to do it, using the
software written by Matthias Kronlachner
<http://www.matthiaskronlachner.com/?p=2015>, or AmbiExplorer on the
phone - the same head-tracker build effort, but adding a Bluetooth
transmitter and using different firmware.


I have been thinking of taking another route to the goal.

What I have been thinking of is a tinted head-tracking binaural decoder
(I did not know the principle had a name).

My take on the decoder is that:
   - below ~4 kHz it should use standard HRTF decoding, with a few
profiles selectable for the width of the head, ignoring individual
ear-shape effects above 4 kHz.

   - tinting should be used to improve height perception in binaural
decoding, substituting for HRTF above 4 kHz. Tinting has been shown to
make it possible to add height information to stereo; this has been
demonstrated.

   - I want the shoulder reflections to be taken into account. I believe
the varying impact of the comb-filter effect of the shoulder reflection
is VERY important. (A crude sketch follows this list.)
        - the software should be controlled by parameters for head tilt
relative to the shoulders, and for head-versus-body turning
        - maybe also the normal distance from the ears to the shoulders,
but I do not think this is very important, as we adjust to clothes on
the shoulders very easily.
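(For what the crudest version of that shoulder comb filter might look
like - one delayed reflection, with guessed rather than measured delay
and gain:)

    import numpy as np

    def shoulder_reflection(x, fs, ear_to_shoulder=0.18, gain=0.3, c=343.0):
        """Add a single shoulder 'echo': y[n] = x[n] + gain * x[n - d].

        The reflected path is roughly twice the ear-to-shoulder distance;
        head tilt would modulate that distance (and so the comb spacing)
        in a head-tracked implementation.
        """
        d = max(1, int(round(2.0 * ear_to_shoulder / c * fs)))
        y = x.astype(float).copy()
        y[d:] += gain * x[:-d]
        return y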

I believe we should take inspiration from UNIX principles when creating
the software: use a chain of programs that each do one thing well and do
not have to be rewritten all the time. A number of VST modules that can
be chained could be the solution.

We already have a number of the needed modules; the advantage of modules
is that they can be replaced or switched between.

-  Ambisonic rotation and tilt controlled by OSC in VST - ambix_rotator,
for example
-  Binaural decoder - ambix_binaural; a tinted binaural decoder is not
available
-  Shoulder reflections - I believe the shoulders are in many cases left
out of the HRTF sets. If not, we work with the difference introduced
when turning the head in relation to the fixed shoulders.

The Reaper DAW that I use is not free (it is shareware) but very low cost.

Head tracking module:
http://www.rcgroups.com/forums/showthread.php?t=1677559
 - I hope to get assistance making two modifications to the firmware and
hardware:
   the first is to change the output syntax of the data stream to OSC;
   the second is to add a second 9DOF sensor on a cable and use it to
track shoulder movement.
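(Until the firmware speaks OSC natively, a small host-side bridge would
do. A sketch assuming pyserial and python-osc; the serial line format
and the OSC address are made up, and ambix_rotator will expect its own
scheme:)

    import serial                                    # pyserial
    from pythonosc.udp_client import SimpleUDPClient

    tracker = serial.Serial("/dev/ttyUSB0", 115200)  # tracker's serial port
    osc = SimpleUDPClient("127.0.0.1", 9000)         # where the VST listens

    while True:
        # Assumed line format: "yaw pitch roll" in degrees, one per line.
        fields = tracker.readline().decode("ascii", "ignore").split()
        if len(fields) == 3:
            yaw, pitch, roll = (float(f) for f in fields)
            osc.send_message("/head/ypr", [yaw, pitch, roll])  # hypothetical address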

Best Regards
Bo-Erik Sandholm
Stockholm

Amateur ambisonic recordist
Interested in sound reproduction since the beginning of the 1960s
Ex network engineer and UNIX system manager
Not a programmer for 35 years now :-)

2015-12-10 21:05 GMT+01:00 Joseph Anderson <j.ander...@ambisonictoolkit.net>:

I'd just add here that a sensible approach would be to use (or design) a
'tinted' decoder. That is, a decoder that includes frequency (and/or
time) domain filtering to color the soundfield on playback.

Blue Ripple Sound <http://www.blueripplesound.com/> includes tinted
decoders
<http://www.blueripplesound.com/products/poa-decoding-vst> in their
technology portfolio. (Furse describes this in a patent
<http://www.google.com/patents/US20120014527>.) For the ATK
<http://www.ambisonictoolkit.net/wiki/tiki-index.php>, I've thought
about
including a help page in the SuperCollider documentation
<http://doc.sccode.org/Browse.html#Libraries%3EAmbisonic%20Toolkit> on
how
to go about implementing a tinted decoder, but haven't done so at this
time.

The basic idea of 'tinting' is very simple: process the reproduced
soundfield in a way that 'enhances' or further achieves some effect
you'd
like. To enhance elevation, we may choose to color the soundfield in a
way
that exaggerates this sense. We have two choices in the processing:

  1. process the soundfield before decoding
  2. process the soundfield after decoding

A combination of both gives the most flexible results, and the best
choice
really depends on what kind of decoding array you're working with. If
you
have a full 3D array, choice 2 makes sense. Whereas, with a 2D layout,
processing the soundfield before decoding (option 1) is probably the
best
idea.

Option 1 is implemented like this (sketched in code below):

  - decode soundfield to array of equally distributed 'virtual
  loudspeakers'
  - filter 'virtual loudspeakers', depending on direction
  - re-encode soundfield

Option 2 is this:

  - decode soundfield to array of real loudspeakers
  - filter these, depending on direction
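In rough numpy terms, option 1's decode/filter/re-encode round trip
might look like this (a crude first-order sketch over six virtual
loudspeakers, assuming ambiX-style plane-wave encoding W=1, X=x, Y=y,
Z=z; the broadband gains stand in for the direction-dependent filters,
and a real implementation would use the ATK's matched transforms):

    import numpy as np

    # Six virtual loudspeakers on an octahedron: +/-x, +/-y, +/-z.
    DIRS = np.array([[ 1, 0, 0], [-1, 0, 0],
                     [ 0, 1, 0], [ 0, -1, 0],
                     [ 0, 0, 1], [ 0, 0, -1]], dtype=float)

    def tint(b_format, up_gain_db=3.0):
        """Decode to virtual speakers, weight by elevation, re-encode.

        b_format: (4, n) array of W, X, Y, Z. The decode weights (1/6, 3)
        are chosen so a 0 dB setting round-trips to the identity here.
        """
        w, x, y, z = b_format
        feeds = np.stack([(w + 3.0 * (d[0] * x + d[1] * y + d[2] * z)) / 6.0
                          for d in DIRS])
        gains = 10.0 ** (up_gain_db * DIRS[:, 2] / 20.0)  # boost up, cut down
        feeds *= gains[:, None]
        out = np.zeros(b_format.shape)
        for s, d in zip(feeds, DIRS):                     # re-encode
            out += np.outer([1.0, d[0], d[1], d[2]], s)
        return out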

Choosing the correct filtering to enhance elevation is the tricky part.
You'll want these filters to be phase matched. (Linear-phase FIR is an
easy choice; phase-matched 2nd-order IIR shelves also work well.) There
are many papers about modeling HRTFs; a simple choice is to just review
the suggested filtering for simple spherical head modeling. A very quick
search turns up a paper from Brown and Duda
<http://www.ece.ucdavis.edu/cipic/files/2015/04/cipic_Brown_Duda98.pdf>.
When listening in an Ambisonic soundfield, you need to remember that the
listener's head already applies the listener's own HRTF. The trick will
be to enhance without unduly distorting.
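For the IIR shelf route, the RBJ 'Audio EQ Cookbook' high shelf is one
concrete starting point; using the same corner frequency and slope for
every direction keeps the set phase matched. (A sketch; the 4 kHz and
4 dB numbers are only a place to start tuning, by ear or against the
spherical-head model:)

    import numpy as np
    from scipy.signal import lfilter

    def high_shelf(fc, gain_db, fs, slope=1.0):
        """RBJ cookbook high-shelf biquad; returns normalised (b, a)."""
        A = 10.0 ** (gain_db / 40.0)
        w0 = 2.0 * np.pi * fc / fs
        cw, sw = np.cos(w0), np.sin(w0)
        alpha = sw / 2.0 * np.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
        b = np.array([A * ((A + 1) + (A - 1) * cw + 2 * np.sqrt(A) * alpha),
                      -2 * A * ((A - 1) + (A + 1) * cw),
                      A * ((A + 1) + (A - 1) * cw - 2 * np.sqrt(A) * alpha)])
        a = np.array([(A + 1) - (A - 1) * cw + 2 * np.sqrt(A) * alpha,
                      2 * ((A - 1) - (A + 1) * cw),
                      (A + 1) - (A - 1) * cw - 2 * np.sqrt(A) * alpha])
        return b / a[0], a / a[0]

    # +4 dB above ~4 kHz for 'up' feeds, -4 dB for 'down' feeds.
    b_up, a_up = high_shelf(4000.0, 4.0, 48000.0)
    up_feed = lfilter(b_up, a_up, np.random.randn(48000))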

Something also useful to note: if you're a creative artist, you can
'tint' the soundfield for creative purposes. A simple example is what we
might call 'soundfield highlight'. The idea here is that we'd low-pass
all of the soundfield except our 'highlight'. And notably, we can steer
where the 'highlight' is located. (E.g., highlight different parts of
the soundfield.) We can think of this as 'directional masking', but with
a frequency dependence. I won't go into the exact details of
implementing a signal flow to generate this effect, but the ATK
<http://www.ambisonictoolkit.net/wiki/tiki-index.php> includes all the
parts needed to do so.
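Sketching one possible flow anyway, on the same virtual-speaker feeds as
above (a butter low-pass and a cosine-squared directional weight are my
stand-ins for the ATK's building blocks):

    import numpy as np
    from scipy.signal import butter, lfilter

    def soundfield_highlight(feeds, dirs, aim, fs, cutoff=1500.0):
        """Low-pass everything, keep full bandwidth near `aim` only.

        feeds: (N, n) virtual-loudspeaker signals; dirs: (N, 3) unit
        vectors; aim: (3,) unit vector of the steerable 'highlight'.
        """
        b, a = butter(2, cutoff / (fs / 2.0))       # 2nd-order low-pass
        dulled = lfilter(b, a, feeds, axis=1)
        w = np.clip(dirs @ aim, 0.0, 1.0) ** 2      # directional window
        return dulled + w[:, None] * (feeds - dulled)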


My kind regards,


*Joseph Anderson*



*http://www.ambisonictoolkit.net/ <http://www.ambisonictoolkit.net/>*


On Thu, Dec 10, 2015 at 9:29 AM, Jörn Nettingsmeier <netti...@stackingdwarves.net> wrote:

On 12/10/2015 04:59 PM, Peter Lennox wrote:

It does imply that an ambisonic panner plugin that incorporates
spectral
manipulation would be more efficacious

noooooo!

if it's an ambisonic panner, it doesn't change the spectrum. if it
changes
the spectrum, it's not an ambisonic panner :)



--
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio)
Tonmeister VDT

http://stackingdwarves.net

_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here,
edit account or options, view archives and so on.
