Dear colleagues,
the following posting is a standard proposal, as such quite long.
(Obviously anybody can stop reading if not interested.)
In any case, an Ambisonics based proposal had to be presented on this
audio list.
--------------------------------------------------------------------------------------------------
I believe that Ambisonics has come to a state which requires the
introduction of some agreed standard for relatively easy and
user-centric implementation, including certain wider recommendations for
a successful application at home, in mobile devices etc.
The proposed open and free-to-use standard will include both 1st order
Ambisonics and HOA. The standard should be extensible, in two senses: I
will present a "minimalistic" but nevertheless powerful mandatory part
(currently restricted to 8 channels) and an optional part (currently
offering two variants with more than 8 channels), and we will talk
about possible future extensions.
The standard proposal - unlike B format or, say, AmbiX - doesn't
aim to present a complete hierarchy, for reasons of simplicity and
easier implementation.
Other intentions:
The presented proposal is also designed to be usable as a CE standard,
for home use, to be used on mobile devices, in areas like gaming, VR
(think of "Oculus Rift") etc. Although the use of the term "CE" has been
criticized here before, it has practical implications once you consider
what typical CE applications are.
In the case of Ambisonics, you will always need certain decoding
software, but also certain audio hardware like sound cards, receivers,
loudspeakers, headphones... This is important, because some people
working in this field seem to underestimate the problems related to
the availability of audio hardware and equipment at home, at least in my
opinion.
The term "CE" applies also to certain applications. Many to most current
TVs include some function or < app > for YouTube or Netflix, for
example. If you define any new audio format for streaming (and if based
on the suggested standard proposal), you would certainly have to use
compressed audio for this aim, currently very probably AAC - or maybe
the DD+ or the Opus codec. Therefore the standard is presented in a way
which can be implemented in a flexible form and in any format, just like
B format itself.
If talking about lossless codecs, the normal "CE" and "music enthusiast"
codec seems to be FLAC, not WavPack. (This is an important remark, not
an intentional "kick" to insult other standard proponents. We will talk
later about the elements of this standard proposal which are
backward-compatible to stereo. If this matters, use FLAC, because FLAC
is the codec which people already use.)
B format: I see B format as an Ambisonics channel ordering and
normalization scheme, up to 3rd order. However, even B format is not a
"complete" description of Ambisonics up to 3rd order, in the sense that
a relatively new mixed order scheme - the xhyv hierarchy - has been
proposed independently by different people. This hierarchy can use the B
format components up to 3rd order, but is not described by B format itself.
Short summary:
The proposed standard is a description of mandatory and optional
elements, which can be implemented in a flexible way.
Because the standard makes heavy use of both FOA and HOA B format (but
is not identical to B format in the optional part, and even less in
future extensions), I propose the name "sound field format" (.SFF) as
preliminary standard name. (If this standard goes on in any form, it
needs a name.)
I. FOA, soundfield mikes
This is the basic and historically first form of Ambisonics. For legacy
reasons, because of the many existing recordings and because of the
continuing relevance of different soundfield microphones, FOA has to be
included into any Ambisonics based standard for surround sound/3D audio.
The original B format (1st order, so WXYZ) has been superseded by the
FuMa B format scheme up to 3rd order, and the related .amb file format.
The .amb format is widely used in available Ambisonics decoders, plugins,
software tools etc. In this sense, B format/.amb is the de facto
standard for Ambisonics up to 3rd order.
Source:
http://dream.cs.bath.ac.uk/researchdev/wave-ex/bformat.html
There is currently no accepted lossless or lossy compressed file
format associated with B format, at least as far as I know. (It has been
shown that AAC is a good codec to compress Ambisonics channels, and
could be applied in different/decreasing bitrates up to very high orders
- if bitrate matters.)
Recent parametric or non-conventional decoders like Harpex and DirAC try
to improve the perceived resolution of FOA out of the sweet spot.
(
Further reading:
http://harpex.net/
)
Note that Harpex also provides upsampling of 1st order to 3rd order
(both B format). As far as I understood, this applies at least to the
horizontal case. (Comments/feedback very welcome)
II. HOA
Both B format and .AMB are only defined up to 3rd order.
Here is a link to a background paper about the recent and current HOA
standardization efforts, written from the perspective of the people
behind a standard proposal beyond 3rd order Ambisonics (AmbiX):
http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf
Because you have to start from some point, I already have proposed to
combine "just" 1st order and 3rd order Ambisonics into one simple yet
powerful standard for home and mobile use, recording etc.
Therefore, the mandatory part of SFF (v. 0.1) is a subset of B format.
(The actual standard proposal will also include certain recommendations
for decoding, backward-compatibility to stereo, mandatory and optional
loudspeaker configurations etc.)
Because I have proposed this before, I will cite the respective former
posting (9th of September 2013):
Dear colleagues...
To continue the proposal to use < certain > forms of .AMB as a
real-world format for the transport/storage of 3D audio (including
music recordings), I would like to hint to some further and important
issues involved.
A full .AMB decoder would have to be able to decode the nine different
combinations of .AMB to different (standard?) loudspeaker
configurations, and also to headphones. (The latter would be some
important point in my "requirement list".) This means there will be
plenty of combinations, and some great opportunities to mess things up
if anybody wants to implement the 9*9 "or so" combinations ... :-)
It would be advantageous if we would be able to limit .AMB to some <
CE profile > with far fewer combinations!
(To cover just FOA won't be enough. We know that FOA has certain
limitations and won't be good enough for all applications. Think just
of the sweet spot issues.)
My impression is that you would have to use < at least > 3rd order to
overcome many/most of the typical FOA problems.
Some advantages of TOA, compared to FOA:
- much larger sweet spot (not only support for individual listeners;
IMO this is very important, as I would like to be able to demonstrate
some wonderful recordings to "at least one" friend, even better to
"some friends". If you don't have friends, don't bother... :-D )
- angular resolution significantly improved, compared to FOA
(improvement of more than factor 2)
- improved performance at higher frequencies
- we know that FOA has certain problems to present sound "from the
sides", even if the playback rig would include loudspeakers at direct
lateral positions.
http://www.acoustics.hut.fi/~ville/papers/pulkkiicmc2001.pdf
- Decoding TOA to (ITU) 5.1/7.1 will show much better results than FOA
to 5.1/7.1. (Comments? I know that 5.1 is an underspecified
irregular array from an Ambisonics TOA perspective, but you can decode
this and the results will be better than in the FOA to 5.1 case...)
- Improved behaviour at higher frequencies
Altogether, a practical "CE" format based on Ambisonics and .AMB could
be introduced in the following, simplified form
I) FOA/ UHJ (3-4 UHJ channels, the proposed backward-compatible form
to stereo of FOA...)
3/4 channels
(Classical decoders and other decoders, supposed to improve on
classical ones...)
II) 3rd order horizontal-only and 3h/1p, which you could combine to
just 3h1p (1st order vertical)
7/8 channels, or 8 channels
(Might still be offered in some "UHJ", stereo-compatible fashion;
"UHJ" for 2nd/3rd order doesn't exist yet, but it can be done.)
III) 3h/3p, 16 channels (call this the .AMB master format? Anyway,
this is the upper end of FuMa .AMB...)
In the 1st version of SFF (v0.1 - 1.0), I would restrict the < mandatory >
channel count to 8. (8 channels is still an important limit, to
comply with maximum channel numbers in formats, CE interfaces, DAW audio
buses and workflows, etc.)
This means that you would combine I and II, referring to the cited
passage above. (III cancelled. III might be the mastering format used
in a DAW, but it is not part of SFF itself.)
I have seen later that I and II have also been used in a (well-received)
audio engine for gaming:
http://etiennedeleflie.net/2008/06/24/codemasters-ups-their-useage-of-ambisonics-on-race-driver-grid/
I though you might like to know that our latest game RaceDriver GRID
deploys hybrid third- order Ambisonics (again) on PlayStation 3, plus
a lot more concurrent sound sources (at times hundreds play at once,
each generating eight WXYZPQUV Ambisonic components)
(The above combination is 3h1p, in FuMa terms.)
This audio machine is related to Blue Ripple Sound and Richard Furse,
who recently has introduced TOA (VST) plugins.
So, the < mandatory > subset of .amb which will be included into the
SFF proposal (v0.1) will be FOA (3/4 channels) and mixed-order 3h1p TOA
(7/8 channels), and nothing else.
Note that in both the 1st order (3-4 channel) and 3rd order (7-8
channel) case you can use the < same > decoder, which means the same
decoder for 1st and 3rd order horizontal and 3D audio decoding. (If I
understood this well enough, you can't decode 3h1p via a
full-periphonic 16-channel TOA decoder leaving unused channels at "0",
because you would obtain certain errors. This means that different
mixed-order variants need a different decoder. The above case is
specific, because the B format z component can be ignored if you decode
to any horizontal configuration. In other words: You can ignore z if
elevated speakers - positive and negative elevation possible - don't
exist in the output configuration. The common horizontal/3D decoder has
to include z in each case, of course. )
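The point about ignoring z for horizontal-only rigs can be illustrated with a minimal first-order sketch. It assumes FuMa-style encoding (W scaled by 1/sqrt(2)) and a basic projection decode to a regular ring of speakers. The function names, the square layout and the decoder gains are my own illustrative assumptions, not part of the proposal; a real decoder would add shelf filters and other weighting.

```python
import math

def encode_fuma_foa(azimuth_deg, elevation_deg=0.0, gain=1.0):
    """Encode one mono sample into first-order FuMa B format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = gain / math.sqrt(2.0)               # FuMa convention: W carries a -3 dB factor
    x = gain * math.cos(az) * math.cos(el)  # front/back
    y = gain * math.sin(az) * math.cos(el)  # left/right
    z = gain * math.sin(el)                 # up/down
    return w, x, y, z

def decode_horizontal(w, x, y, z, speaker_azimuths_deg):
    """Basic projection decode to a regular horizontal ring.

    z is accepted but deliberately unused: with no elevated speakers
    there is nothing to feed it to.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        feeds.append((math.sqrt(2.0) * w
                      + 2.0 * (x * math.cos(az) + y * math.sin(az))) / n)
    return feeds

# Hypothetical square layout: front-left, rear-left, rear-right, front-right.
SQUARE = [45.0, 135.0, 225.0, 315.0]
front_source_feeds = decode_horizontal(*encode_fuma_foa(0.0), SQUARE)
```

For a source straight ahead, the two front speakers receive most of the signal and the feeds sum to the source amplitude. Changing only z leaves the horizontal feeds untouched, which is the reason one decoder can serve both the 3-channel and the 4-channel case.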
I propose that the following mixed order variants could already be
included into the v0.1 version of SFF, as < optional > elements:
- 2h1v (8 channels),
- 3h1v (12 channels), and/or
- 3h2p (11 channels)
(Optional means that a decoder or plugin could be able to reproduce
these variants, but doesn't have to.
A decoder or plugin < should not > crash if reading more than 8
channels, though... This is a superfluous definition, but has to be
respected in any case. So, don't sue me for any standard implementation
which doesn't work, destroys your computer file system, or whatever
other minor reason... ;-) )
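The channel counts of the mandatory and optional variants can be cross-checked with a small sketch. It assumes the usual FuMa letter assignment up to 3rd order. For xhyp it keeps everything up to order p plus only the two sectoral components of each higher order. For xhyv it assumes the selection rule "degree minus |m| is at most v", which reproduces the 8 and 12 channels quoted above. The table and function names are illustrative only.

```python
# FuMa channel letters up to 3rd order, tagged with (degree, |m|).
FUMA = [
    ("W", 0, 0),
    ("X", 1, 1), ("Y", 1, 1), ("Z", 1, 0),
    ("R", 2, 0), ("S", 2, 1), ("T", 2, 1), ("U", 2, 2), ("V", 2, 2),
    ("K", 3, 0), ("L", 3, 1), ("M", 3, 1), ("N", 3, 2), ("O", 3, 2),
    ("P", 3, 3), ("Q", 3, 3),
]

def channels_hp(h, p):
    """xhyp: full sphere up to order p, sectoral pairs only from p+1 to h."""
    return "".join(name for name, degree, abs_m in FUMA
                   if degree <= p or (degree <= h and abs_m == degree))

def channels_hv(h, v):
    """xhyv (assumed rule): keep components with degree <= h and degree - |m| <= v."""
    return "".join(name for name, degree, abs_m in FUMA
                   if degree <= h and degree - abs_m <= v)

mandatory = {
    "1h0p": channels_hp(1, 0),  # horizontal FOA
    "1h1p": channels_hp(1, 1),  # full FOA
    "3h0p": channels_hp(3, 0),  # horizontal TOA
    "3h1p": channels_hp(3, 1),  # mixed-order TOA
}
optional = {
    "2h1v": channels_hv(2, 1),
    "3h1v": channels_hv(3, 1),
    "3h2p": channels_hp(3, 2),
}
```

Under these assumptions 3h1p comes out as WXYZUVPQ, the same eight components named in the gaming quote above, and all of the sets are subsets of the 16-channel 3h3p set.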
At this point (optional part) we have left backward compatibility with
B format/.amb behind, by including some combinations of the xhyv
hierarchy. (B format uses xhyp for mixed orders.)
As a 3D audio format, xhyv is more stable than xhyp, in the sense that the
horizontal resolution stays the same, independently of the
elevation/height levels where the sound comes from. (As long as spatial
music happens mostly in the horizontal plane, I would say that most to
all current music recordings and most spatial performance projects could
be "done" in xhyp. IMO 3h1p is still and clearly a better recording
format than 2h1v, because of the higher horizontal resolution. If you
have content which makes heavy use of the full-sphere, you would have to
cope. Then and only then, 2h1v might be perceived to be a better option
to present isotropic 3D audio. In this case, the reign of mighty WFS
would also come to an end, in epic failure style... )
An introduction to different mixed-order schemes can be found here:
Ch. Travis "A New Mixed-Order Scheme for Ambisonic Signals"
http://ambisonics.iem.at/symposium2009/proceedings/ambisym09-travis-newmixedorder.pdf/?searchterm=travis
(Not restricted to 3rd order)
The mandatory part of SFF is a subset of .amb < and > respects an upper
8-channel limit. (6 channels is nowadays not a serious limit, because
you won't distribute B format or "SFF v1.0" via DD, DTS or S/PDIF. Think
also of the fact that AAC, Opus etc. offer much better compression than
DD and DTS. I don't see < any > bitrate problem for 8 channels of AAC/Opus
compressed audio if 5.1 DD didn't have a bitrate problem in the 90s.)
It is good to include some optional format extensions with a higher
channel count right from the start, as suggested. At worst you would
prepare future extensions, but you didn't "break" current software and
interfaces in the mandatory part. (I am aware that optional features are
not always implemented, because people tend to be pragmatic, vulgo
"lazy". On the other hand end users/professionals/musicians will be able
to choose between different implementations. Choice means that optional
features will be implemented at least sometimes IF a certain demand
exists in the 1st place.)
III. Loudspeaker configurations (presentation as a sketch)
1. Mandatory support for:
Stereo, quad (4 speakers), 5.1, 7.1
Binaural for a mobile device. (Obviously. What else?)
2. Optional
I propose to put both the regular hexagon and octagon into this
category. (Very nice for enthusiasts and for people working in this
field, but statistically not yet found in the wild. Can you really
expect that consumers will move around their 5.1/7.1 speakers?)
Binaural via loudspeakers, using X-talk cancellation. (Ambiophonics,
BACCH etc.)
(You could say that the people behind might adapt their technology to
reproduce SFF.)
3. "New" horizontal configurations:
a) ITU 6.1
This is 5.1 + a CB (center back) speaker. This configuration didn't
succeed for home theater use. One reason could be that the angle between
SL/SR and CB is 80º according to the standard, so maybe a bit too high for
effective pairwise panning.
In the Ambisonics case, the cited panning problem doesn't exist, and
there should be some real improvement compared to the normal 5.1
configuration. 6.1 is then an easy extension of 5.1, and very probably
better for soundfield generation. (The existing home theater
installation is both extended and stays < unchanged >. No side effects,
you only have to install one more speaker.)
Because the center back speaker might have to be closer to the listener
than other speakers for practical reasons ("not enough space behind your
sofa"), some distance compensation/delay might be necessary.
This same plane-wave assumption makes it possible to vary the distance
of speakers within reasonable limits without upsetting the correct
function of the decoder, provided that the difference is compensated
with delay, the power is adjusted for uniform loudness at the center,
and that per-speaker near-field compensation is used. Distance does
not affect the decoder matrix.
(Source:
http://en.wikipedia.org/wiki/Ambisonic_reproduction_systems
)
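The delay and level part of this compensation can be sketched as follows. This is a minimal illustration assuming a speed of sound of 343 m/s and a simple 1/r level law; it leaves out the per-speaker near-field compensation mentioned in the quote, and the function name and example distances are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def distance_compensation(distances_m):
    """Return one (delay_seconds, gain) pair per speaker.

    All speakers are aligned to the farthest one: closer speakers are
    delayed and attenuated so that arrival time and level match at the
    listening position.
    """
    farthest = max(distances_m)
    result = []
    for d in distances_m:
        delay = (farthest - d) / SPEED_OF_SOUND  # closer speakers wait longer
        gain = d / farthest                      # 1/r law: closer speakers are turned down
        result.append((delay, gain))
    return result

# Hypothetical room: most speakers at 2.0 m, center back at only 1.5 m.
compensation = distance_compensation([2.0, 2.0, 2.0, 2.0, 2.0, 1.5])
```

In this example the center back speaker would be delayed by roughly 1.5 ms and reduced to 0.75 of its level, while the decoder matrix itself stays unchanged.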
b) "ITU 8.1"
Actually, this configuration is not a defined standard, and
might not exist anywhere yet. It is ITU 7.1 plus a CB (center back)
speaker. While this doesn't improve things for panned audio (unless you
want to have sharp "non-phantom signals from a back position"), this
might be a good configuration for TOA decoders. You have 8 speakers, and
also a symmetric configuration about an imaginary axis in the middle, going
from -90º to 90º. (This configuration is relatively close to an "ideal"
octagon, and backward compatible to all versions of 5.1/6.1/7.1 film audio.)
The ITU 6.1 and "8.1" configurations would add just one loudspeaker to
existing configurations for film sound.
4. Loudspeaker extensions for 3D audio.
You would add (at least) 4 upper speakers, as long as SFF stays
restricted to 1st and 2nd vertical orders. 2nd order would require at
least one elevation level more. (Full periphonic 3rd order would require
an 8-6-1, for the upper hemisphere. Full 3h1p decoding requires 8-4 or
8-4-1. 8-4-1 fits even for the optional 3h2p case, BTW. "8" can refer to
octagon or "ITU 8.1", see above. Decoding of TOA to 5.1/6.1 is possible,
even if detail is lost.)
Negative elevations are possible and have been done, in certain home
loudspeaker configurations. (Including the 7.13D configuration, if we
talk once more about gaming. )
IV. Binaural decoding
Decoding sound field format/SFF via HRTF sets via headphones, optionally
including HT. (Note that motion-sensors, gyroscopes and position
tracking via GPS are included in smartphones and many other devices. See
also recent discussions on sursound.)
V. Working with SFF, mixing, cutting, editing...
All current mandatory and optional combinations of SFF v0.1 (4-12
channels) could be derived from a full-periphonic 3rd order B format mix
(16 channels), presented probably in .AMB format. This is why we
currently stay at 3rd order/TOA, and don't aim for orders above.
(When going from 1st order Ambisonics to TOA, there is a significant
improvement of quality. The improvements are less clear if you compare
SOA to FOA. If SFF or any accepted TOA format could be established as a
kind of 5.1 successor, any potential version after might very well skip
4th order and go directly to anywhere between 5th to 7th order, to
obtain some clear improvement. And probably we would use again some
mixed-order scheme, maybe 5h3p or 6h3p. 20 or 22 channels sound like an
awful lot compared to 5.1, but it is only a factor of about 4. If you
assume that each Ambisonics channel is compressed with AAC at, say,
80 kbit/s, you would (still and only) need about 1.6 MBit/s for 20 audio
channels. (I also don't see any problem in handling about 20 uncompressed
channels in a DAW. Just think of the requirements for video editing
and even Photoshop, and relax...))
8 channels can be coded easily in 640kbit/s. This is a Dolby Digital
data rate. (The above data rate of 1.6 MBit/s would compare to 5.1 DTS,
but is wayyyy better... There exist certain ideas how you could compress
very high orders even more, but this is beyond the scope of the current
SFF project.)
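The arithmetic behind these figures, as a short sketch. It assumes 80 kbit/s per AAC channel as above and the xhyp channel count (p+1)^2 + 2(h-p); the function names are illustrative.

```python
def hp_channel_count(h, p):
    """Channels in an xhyp mixed-order set: full sphere to order p, 2 per order above."""
    return (p + 1) ** 2 + 2 * (h - p)

def total_bitrate_kbps(channels, per_channel_kbps=80):
    """Total bitrate if every Ambisonics channel is coded at the same rate."""
    return channels * per_channel_kbps

rates = {
    "3h1p": total_bitrate_kbps(hp_channel_count(3, 1)),  # 8 channels
    "5h3p": total_bitrate_kbps(hp_channel_count(5, 3)),  # 20 channels
    "6h3p": total_bitrate_kbps(hp_channel_count(6, 3)),  # 22 channels
}
```

With these assumptions 3h1p needs 640 kbit/s, 5h3p needs 1600 kbit/s and 6h3p needs 1760 kbit/s.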
A far bigger problem for any application of Ambisonics >= 4th order at
home than memory space or available bitrates is the fact that you will
have to use (or re-use) existing loudspeaker configurations, too. (At
least if you don't expect that your customer will run to Wal Mart or
Lidl to buy plenty of cheap speakers and cables for the installation of
his/her newest Ambisonics home system.)
TOA has already been shown to work successfully in home applications,
see the cited gaming audio engine in section II.
SFF is meant to be used for specific areas, especially as a home and
mobile surround/3D audio format. We are not covering yet all possible
applications/needs such as live concerts, audio installations, cinemas,
theaters, operas etc. However, it would be possible to present some
future version of SFF which will cover all those applications and areas,
and might include HOA order >=4.
(So, it makes sense to talk about extensions to 5-7h/3p, or similar
proposals, probably mixed-order variants. Any future version of SFF
would be backward-compatible to "SFF v1.0", of course. Otherwise you
could not speak about "SFF" at all)
VI. Microphones
The FOA soundfield microphones got some recent boost via the Harpex
decoders and upsamplers, which in some cases will improve results if you
compare the Harpex decoder to classical dual-band, rE or in-phase
decoders. Harpex is also very useful (I hear) if you want to mix a FOA
recording into some (synthetic) HOA field. (I believe this works only
horizontally, for now.)
A spherical array microphone with 32 capsules can record 3rd and maybe
4th order. (If I understood this well, you could introduce a version
with 20 capsules as a simpler TOA mike. Comments?)
An excellent article about both HOA and HOA microphones you'll find here:
http://ambisonics.iem.at/symposium2009/proceedings/ambisym09-daniel-evolvingviewsonhoa.pdf/at_download/file
It seems to me that direct 3rd order recording is currently about
feasible. (EigenMike)
VII. Backward-compatibility to stereo
I have suggested before that FOA (1st order B format) could be
distributed in a form which is backward-compatible to stereo, using the
3/4 channel form of UHJ.
http://en.wikipedia.org/wiki/Ambisonic_UHJ_format#The_UHJ_hierarchy
If a third channel (T) is available, this can be used to give improved
localisation accuracy to the planar surround effect when decoded via a
3-channel UHJ decoder.
Adding a fourth channel (Q) to the UHJ system allows the encoding of
full surround sound with height, known as Periphony, with a level of
accuracy identical to 4-channel B-Format.
Which gives just another form of WXYZ, coded in a backward-compatible
fashion to stereo, which is LRTQ.
The LRTQ presentation could very probably be used to distribute AAC
stereo/FOA files. If lossless distribution is required, you might
distribute LRTQ via FLAC. (Note that you need always a decoder for the
surround signals. LRTQ will always be brought back into B format form,
if you decode to surround sound. Nobody would expect to need an
Ambisonics/UHJ decoder for POSL/plain old stereo listening, unless you
are a real expert and decode B format to 2 speakers. It works and you
do this maybe after recording with a SFM, but... O:-) )
This idea could be extended to higher orders, as I have suggested
before. A very easy way to do so has been proposed by Jörn Nettingsmeier.
Take Ambisonics B format orders 3h and (mixed-order) 3h1p. The
components are WXY(Z) UV PQ.
Jörn's suggestion is to use only 1st order to derive L/R, which leads us
to LRTQ UV PQ.
In the meantime, I have found some possible problem with this: The Q and
Z channels are actually different!
(The Q channel is not used to derive L/R. A possible explanation could
be in different forms of filtering. See the section presenting UHJ
coding and decoding equations in the cited link:
Q = 0.9772*Z
Z = 1.023*Q
Note that two-channel UHJ requires the player to use different shelf
filters than for 3/4-channel UHJ and B-Format);
So, according to Wikipedia, there is actually no different filter if you
compare 3/4 channel B format and UHJ format.
Why are Q and Z then different at all?! Note also that the decoding
equations for WXY are exactly identical for 3- and 4-channel UHJ.
(Comments/help very welcome)
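Taken at face value, the two quoted equations describe nothing more than a constant scale factor between Q and Z, with no filtering or phase shift involved. A quick numerical check under that reading (the constant and function names are mine):

```python
import math

Q_FROM_Z = 0.9772  # encoding equation quoted above: Q = 0.9772 * Z
Z_FROM_Q = 1.023   # decoding equation quoted above: Z = 1.023 * Q

def gain_db(factor):
    """Express a linear amplitude factor in decibels."""
    return 20.0 * math.log10(factor)

round_trip = Q_FROM_Z * Z_FROM_Q  # encode, then decode
q_level_db = gain_db(Q_FROM_Z)    # level of Q relative to Z
```

The round trip comes out within about 0.03% of unity, and Q sits roughly 0.2 dB below Z. This only shows that the two channels carry the same signal at slightly different levels; it does not explain why that factor was chosen.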
Because Q and Z are not identical, there might be some problem with our
"naive" proposal. (This problem looks very fixable, but where are ye UHJ
experts who show us how to extend UHJ to infinite orders, using Jörn's
"don't care for higher orders if deriving L/R" (TM) approach? ;-) )
The hierarchical stereo-surround UHJ presentation of SFF saves not only
memory space or transmission bitrate (you don't have to distribute a
stereo file and a surround file), it also might save money in case that
certain right holders would ask for double license fees. Having worked
in the field of optical discs and having proposed (different) hybrid
discs, I would suspect this would be the normal case, unless recently
things have changed a lot in the music label and publishing rights world.
One file instead of two is also a desirable simplification, from a
consumer perspective.
VIII. So what?
What is suggested is a relatively simple standard for (mainly) 1st and
3rd order Ambisonics.
SOA (2nd order Ambisonics) is skipped (with the exception of 2h1v as
option), specifically 2h and 2h1p. (Going up to 3rd order provides some
clear improvements compared to FOA, such as the perfect or say
near-perfect reconstruction of a much wider frequency range in the
"head zone", a wider sweet spot/zone and improved localization for all
frequencies out of the sweet spot.)
Considering the wide use of B format in existing software, plugins,
decoders and software toolboxes/libraries, TOA is clearly preferable to
4th order, from the view of a software engineer/developer.
The SFF standard proposal is built around existing solutions wherever
possible, including some new elements. (Backward-compatibility to stereo,
6.1 and "8.1" loudspeaker configurations, xhyv hierarchy...All these
elements are < optional >, which is intentional. Concerning the use of
the UHJ hierarchy to obtain easy backward-compatibility to stereo, I
suggest that these transformations can be done in an easy way, so I
strongly recommend to actually implement/use this feature. If this were
not my own recommendation, I would have said this should be a <
mandatory > part. But so... 8-) )
Although the channel count of 8 is from a current perspective moderate,
3rd order Ambisonics is powerful. Maybe VBAP rendering via 8 speakers
would give the same accuracy of localization compared to (3rd order)
Ambisonics rendering, but I doubt that Ambisonics would test worse.
Besides this, Ambisonics is loudspeaker independent, and we have some
real and flexible surround/3DA format. (Whereas VBAP is a rendering
technology, not a format.)
Two optional forms of SFF (3h2p and 3h1v) use 11 and 12 channels,
respectively. These are thought to improve applications which would
stress the 3D audio aspect. 12 channels is about the same as the channel
count for Auro-3D, which in its most common form is 11.1. (Auro-3D is a
simple discrete format for 3D audio, so the comparison of SFF to both
7.1 and 11.1 seems fair. You/we can test later...)
This has been a long posting. However, it is about some standard
proposal, and in request for comments form. Unlike in any final
standard description, I also wanted to provide some < reasons > why I
propose certain features/elements, and some real discussion base. 5.1
won't go away, but the proposal (sound field format/SFF) is clearly
both more powerful and flexible. It is also more complicated, we have to
admit. (It is an open standard, in this sense different from MPEG-H 3DA.
Without being able to give any proof, because you can't compare
unfinished states, I believe that SFF could/will be a lot simpler than
MPEG-H 3DA.)
Special thanks go to Michael Chapman, Aaron Heller, Jörn Nettingsmeier,
Richard Furse and Svein Berge. You have answered both rather competent
and exceptionally ignorant questions, which for my feeling and
considering my limited capabilities still made a lot of sense O:-) ...
(The SFF proposal and all expressed views are just my own, although
each of the cited persons had some influence on my views. I am
completely unaware if any of the cited persons agrees with any of my
views... )
Speaking about Harpex, you could see this as a kind of bridge between
1st and 3rd order Ambisonics. (Harpex might actually be judged to be
even "sharper" than 3rd order under some circumstances, which will
depend on the decoded material. In general, "native" 3rd order should
prove to be more natural. Feedback welcome...)
We have also discussed if parametric decoders (like Harpex or DirAC)
could be extended to HOA and specifically 2nd/3rd order, which
theoretically should be possible. However, this doesn't seem to have
been done yet.
(Comments...)
An open standard doesn't mean that commercial solutions which would
implement/use SFF should not be welcome.
The proposed ("home", "CE", whatever...) standard SFF = Sound Field
Format should be controlled by the community (and/or possibly by certain
open organisations/foundations), not by companies.
Some attention has been given to the application and decoding side,
including < mandatory > and < optional > loudspeaker configurations and
binaural decoding options. Whereas some decoders (such as Rapture3D)
would offer free speaker positioning - which is impressive -, this is
not the normal case, nor < should > all decoders implement such a
feature. It is obvious that any "flexible" decoder could pre-install all
mandatory and optional configurations in an easy and selectable way.
Best regards,
Stefan Schreiber Lisbon
---------------------------------------------------------------------------------------------
Further related thread:
-------- Original Message --------
Subject: [Sursound] Two new approaches for the distribution of surround
sound/3D audio
Date: Mon, 29 Jul 2013 03:57:34 +0100
From: Stefan Schreiber <st...@mail.telepac.pt>
Reply-To: Surround Sound discussion group <sursound@music.vt.edu>
To: Surround Sound discussion group <sursound@music.vt.edu>
(Continuation of: The commercial future of Ambisonics, 15/5/2013)
Dear colleagues,
following the recent standardization of 3D audio by Mpeg (ISO/IEC
23008-3) and related activities, I have come to the conclusion that the
(older) B format up to 3rd order might need some updates.
However, I also came to the conclusion that FOA (first order
Ambisonics) could be easily included into all current distribution
models for audio on the Internet, which are (to "99.98%") stereo-based.
We nearly have been "there", in the above cited thread! ("The
commercial future of Ambisonics")
I will start with this part, because you can see this as an own format.
Which might be the perfect bridge or transition format for future
surround/3D audio (3DA) formats...
....
IV. Improved binaural representation via headphones
Note that headphones with HT "chips" and motion-corrected binaural
playback of surround sound (including 3D audio) could easily be
realized, with available and actually quite affordable chips.
Oculus Rift is the direct example for this, as this is a full (and
certainly more complex) VR and gaming device.
http://worthplaying.com/event/E3_2013/PostE3_2013/89888/
In its current state, the Oculus Rift is an amazing piece of work, and
after decades of dealing with VR technology, it seems that we may
finally see a VR unit that is going to get it right.
Wikipedia writes about the Oculus Rift motion-tracking:
"Initial prototypes used a Hillcrest 3DoF head tracker that is normally
120 Hz, with a special firmware that John Carmack requested which makes
it run at 250 Hz, tracker latency being vital due to the dependency of
virtual reality's realism on response time. The latest version includes
Oculus' new 1000 Hz Adjacent Reality Tracker that will allow for much
lower latency tracking than almost any other tracker. It uses a
combination of 3-axis gyros, accelerometers, and magnetometers, which
make it capable of absolute (relative to earth) head orientation
tracking without drift.[20][25]"
Now, apply the same or similar HT silicon (which is already very
affordable) to HT/motion-tracking headphones... (I could give some
detailed recommendations how to do this, but this is also one of the
next steps... Nice to see that at least the video and gaming people have
kept some sense for cool technology and seemingly "weird" ideas, so to
speak. How many motion updates per second would a "fluent" head-tracking
binaural decoder/decoding program actually require? Ye Ambisonics
experts, what do you think or better know?!
You would have to decode some UHJ/".AMB+" file and shift the soundfield
relative to the head position, I guess. The head position needs some
regular and frequent updates, which we easily get by now. You could also
track the < absolute > movements of persons within some area. Say: Your
decoder program tracks the movements of the visitors in some museum or
building, and plays the associated audio/explanations fitting to the
current position. "This is the dining hall of the castle, which was
quite cold during winter, but warm or even hot during summer." Ok, this
was a truly dull example... :-) )
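For the simplest case, head yaw only, shifting the soundfield relative to the head can be sketched as a rotation of the first-order X and Y components before binaural decoding. The sketch assumes azimuth counted counter-clockwise with X pointing to the front and Y to the left; the function name is hypothetical, and pitch, roll and higher orders would need full rotation matrices.

```python
import math

def rotate_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate one first-order B-format frame by the head yaw.

    If the head turns left by head_yaw_deg, every source must appear
    shifted to the right by the same angle. W and Z are unaffected by
    a rotation about the vertical axis.
    """
    a = math.radians(head_yaw_deg)
    x_rot = x * math.cos(a) + y * math.sin(a)
    y_rot = -x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z

# A source hard left (X = 0, Y = 1) while the head is turned 90 degrees left:
w, x, y, z = rotate_yaw(0.707, 0.0, 1.0, 0.0, 90.0)
```

After the rotation the source sits on the X axis, i.e. straight ahead of the turned head. A head-tracked decoder would apply this to every block of samples with the latest tracker reading.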
Best regards
Stefan Schreiber Lisbon
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound