[Sursound] Proposal for a (relatively) simple FOA/HOA standard. Request for comments!

Stefan Schreiber Wed, 08 Jan 2014 11:40:13 -0800

Dear colleagues,

the following posting is a standard proposal and, as such, quite long. (Obviously, anybody can stop reading if not interested.) In any case, an Ambisonics-based proposal had to be presented on this audio list.


--------------------------------------------------------------------------------------------------

I believe that Ambisonics has reached a state which requires the introduction of an agreed standard for relatively easy and user-centric implementation, including certain wider recommendations for a successful application at home, in mobile devices etc.

The proposed open and free-to-use standard will include both 1st order Ambisonics and HOA. The standard should be extensible, in two senses: I will present a "minimalistic" but nevertheless powerful mandatory part (currently restricted to 8 channels) and an optional part (currently offering two variants with more than 8 channels), and we will talk about possible future extensions.

The standard proposal - unlike B format or, say, AmbiX - doesn't aim to present a complete hierarchy, for reasons of simplicity and easier implementation.


Other intentions:

The presented proposal is also designed to be usable as a CE standard, for home use, to be used on mobile devices, in areas like gaming, VR (think of "Oculus Rift") etc. Whereas the use of the term "CE" has been criticized here before, there are practical implications if you think about what typical CE applications are.

In the case of Ambisonics, you will always need certain decoding software, but also certain audio hardware like sound cards, receivers, loudspeakers, headphones... This is important, because some people working in this field seem to underestimate the problems related to the availability of audio hardware and equipment at home, at least in my opinion.

The term "CE" also applies to certain applications. Most current TVs include some function or < app > for YouTube or Netflix, for example. If you define any new audio format for streaming (and if based on the suggested standard proposal), you would certainly have to use compressed audio for this aim, currently very probably AAC - or maybe the DD+ or the Opus codec. Therefore the standard is presented in a way which can be implemented in a flexible form and in any format, just like B format itself.

If talking about lossless codecs, the normal "CE" and "music enthusiast" codec seems to be FLAC, not WavPack. (This is an important remark, not an intentional "kick" to insult other standard proponents. We will talk later about the elements of this standard proposal which are backward-compatible to stereo. If this matters, use FLAC, because FLAC is the codec which people already use.)

B format: I see B format as an Ambisonics channel ordering and normalization scheme, up to 3rd order. However, even B format is not a "complete" description of Ambisonics up to 3rd order, in the sense that a relatively new mixed-order scheme - the xhyv hierarchy - has been proposed independently by different people. This hierarchy can use the B format components up to 3rd order, but is not described by B format itself.




Short summary:

The proposed standard is a description of mandatory and optional elements, which can be implemented in a flexible way.

Because the standard makes heavy use of both FOA and HOA B format (but is not identical to B format in the optional part, and even less in future extensions), I propose the name "sound field format" (.SFF) as a preliminary standard name. (IF this standard goes on in any form, it will need a name.)



I. FOA, soundfield mikes

This is the basic and historically first form of Ambisonics. For legacy reasons, because of the many existing recordings and because of the continuing relevance of different soundfield microphones, FOA has to be included in any Ambisonics-based standard for surround sound/3D audio.

The original B format (1st order, so WXYZ) has been superseded by the FuMa B format scheme up to 3rd order, and the related .amb file format. The .amb format is widely used in available Ambisonics decoders, plugins, software tools etc. In this sense, B format/.amb is the de facto standard for Ambisonics up to 3rd order.


Source:

http://dream.cs.bath.ac.uk/researchdev/wave-ex/bformat.html

There is currently no accepted lossless or lossy compressed file format associated with B format, at least as far as I know. (It has been shown that AAC is a good codec to compress Ambisonics channels, and could be applied at different/decreasing bitrates up to very high orders - if bitrate matters.)

Recent parametric or non-conventional decoders like Harpex and DirAC try to improve the perceived resolution of FOA outside the sweet spot.


(
Further reading:

http://harpex.net/
)

Note that Harpex also provides upsampling of 1st order to 3rd order (both B format). As far as I understand, this applies at least to the horizontal case. (Comments/feedback very welcome.)



II. HOA

Both B format and .AMB are only defined up to 3rd order.

Here is the link to a background paper about the recent and current HOA standardization efforts, written from the perspective of the people behind a standard proposal beyond 3rd order Ambisonics (AmbiX):


http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf

Because you have to start from some point, I have already proposed to combine "just" 1st order and 3rd order Ambisonics into one simple yet powerful standard for home and mobile use, recording etc.

Therefore, the mandatory part of SFF (v. 0.1) is a subset of B format. (The actual standard proposal will also include certain recommendations for decoding, backward-compatibility to stereo, mandatory and optional loudspeaker configurations etc.)

Because I have proposed this before, I will cite the respective former posting (9th of September 2013):

Dear colleagues...
To continue the proposal to use < certain > forms of .AMB as a real-world format for the transport/storage of 3D audio (including music recordings), I would like to hint at some further and important issues involved.
A full .AMB decoder would have to be able to decode the nine different combinations of .AMB to different (standard?) loudspeaker configurations, and also to headphones. (The latter would be some important point in my "requirement list".) This means there will be plenty of combinations, and some great opportunities to mess things up if anybody wants to implement the 9*9 "or so" combinations ... :-)
It would be advantageous if we would be able to limit .AMB to some < CE profile > with far fewer combinations!
(To cover just FOA won't be enough. We know that FOA has certain limitations and won't be good enough for all applications. Think just of the sweet spot issues.)
My impression is that you would have to use < at least > 3rd order to overcome many/most of the typical FOA problems.
Some advantages of TOA, compared to FOA:
- much larger sweet spot (not only support for individual listeners; IMO this is very important, as I would like to be able to demonstrate some wonderful recordings to "at least one" friend, even better to "some friends". If you don't have friends, don't bother... :-D )
- angular resolution significantly improved, compared to FOA (improvement of more than factor 2)
- improved performance at higher frequencies
- we know that FOA has certain problems to present sound "from the sides", even if the playback rig would include loudspeakers at direct lateral positions.
http://www.acoustics.hut.fi/~ville/papers/pulkkiicmc2001.pdf
- Decoding TOA to (ITU) 5.1/7.1 will show much better results than FOA to 5.1/7.1. (Comments? I know that 5.1 is an underspecified irregular array from an Ambisonics TOA perspective, but you can decode this and the results will be better than in the FOA to 5.1 case...)
- Improved behaviour at higher frequencies
Altogether, a practical "CE" format based on Ambisonics and .AMB could be introduced in the following, simplified form:
I) FOA/ UHJ (3-4 UHJ channels, the proposed backward-compatible form to stereo of FOA...)
3/4 channels
(Classical decoders and other decoders, supposed to improve on classical ones...)
II) 3rd order horizontal-only and 3h/1p, which you could combine to just 3h1p (1st order vertical)
7/8 channels,  or 8 channels
(Might still be offered in some "UHJ", stereo-compatible fashion; "UHJ" for 2nd/3rd order doesn't exist yet, but it can be done.)
III) 3h/3p, 16 channels (call this the .AMB master format? Anyway, this is the upper end of Fu-M AMB...)
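For orientation, the "nine different combinations" of .AMB mentioned in the quoted posting can be written down as a lookup from channel count to channel set. This is a minimal sketch based on my reading of the .amb convention linked in section I; the names `AMB_LAYOUTS`, `amb_layout` and `CE_PROFILE` are hypothetical and not part of any specification.

```python
# FuMa channel letters in .amb file order (assumption: taken from my reading
# of the .amb description, not from the SFF proposal itself).
FUMA_ORDER = "WXYZRSTUVKLMNOPQ"

# Channel count -> channel set. An .amb reader infers the variant from the count.
AMB_LAYOUTS = {
    3: "WXY",                # 1st order, horizontal only
    4: "WXYZ",               # 1st order, full sphere
    5: "WXYUV",              # 2nd order, horizontal only
    6: "WXYZUV",             # 2h1p
    7: "WXYUVPQ",            # 3rd order, horizontal only
    8: "WXYZUVPQ",           # 3h1p
    9: "WXYZRSTUV",          # 2nd order, full sphere
    11: "WXYZRSTUVPQ",       # 3h2p
    16: "WXYZRSTUVKLMNOPQ",  # 3rd order, full sphere
}


def amb_layout(channel_count):
    """Return the channel letters for a channel count, or None if unknown."""
    return AMB_LAYOUTS.get(channel_count)


# The three groups of the quoted "CE profile" (hypothetical grouping name).
CE_PROFILE = {"I": [3, 4], "II": [7, 8], "III": [16]}
```

Returning `None` for an unknown count, rather than raising, is one way to let a player skip a file it cannot decode.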

In the 1st version of SFF (v0.1 - 1.0), I would restrict the < mandatory > channel count to 8. (8 channels is still an important limit, to comply with maximum channel numbers in formats, CE interfaces, DAW audio buses and workflows, etc.)

This means that you would combine I and II, referring to the cited passage above. (III is cancelled. III might be the mastering format used in a DAW, but it is not part of SFF itself.)

I later saw that I and II have also been used in a (well-received) audio engine for gaming:


http://etiennedeleflie.net/2008/06/24/codemasters-ups-their-useage-of-ambisonics-on-race-driver-grid/

I though you might like to know that our latest game RaceDriver GRID deploys hybrid third-order Ambisonics (again) on PlayStation 3, plus a lot more concurrent sound sources (at times hundreds play at once, each generating eight WXYZPQUV Ambisonic components)



(The above combination is 3h1p, in FuMa terms.)

This audio engine is related to Blue Ripple Sound and Richard Furse, who has recently introduced TOA (VST) plugins.

So, the < mandatory > subset of .amb which will be included in the SFF proposal (v0.1) will be FOA (3/4 channels) and mixed-order 3h1p TOA (7/8 channels), and nothing else.

Note that in both the 1st order (3-4 channel) and 3rd order (7-8 channel) case you can use the < same > decoder, which means the same decoder for 1st and 3rd order horizontal and 3D audio decoding. (If I understood this well enough, you can't decode 3h1p via a full-periphonic 16-channel TOA decoder leaving unused channels at "0", because you would obtain certain errors. This means that different mixed-order variants need a different decoder. The above case is specific, because the B format Z component can be ignored if you decode to any horizontal configuration. In other words: You can ignore Z if elevated speakers - positive and negative elevation possible - don't exist in the output configuration. The common horizontal/3D decoder has to include Z in each case, of course.)
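Why Z drops out for horizontal-only layouts can be seen in a very simple first-order case. The following is only a sketch, assuming a basic "projection" decode (each speaker feed is the B format signal sampled in the speaker direction) and FuMa weighting of W; it is not the decoder of this proposal, overall normalization is left out, and the function name is hypothetical.

```python
import math


def projection_gains(azimuth_deg, elevation_deg=0.0):
    """Gains applied to FuMa W, X, Y, Z for one loudspeaker direction.

    Assumption: plain projection decode. FuMa W is recorded 3 dB down,
    hence the sqrt(2) factor to restore it.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.sqrt(2.0),
        math.cos(az) * math.cos(el),  # X: front/back
        math.sin(az) * math.cos(el),  # Y: left/right
        math.sin(el),                 # Z: up/down
    )


# Horizontal square: every speaker has elevation 0, so every Z gain is 0.
square = [45.0, 135.0, -135.0, -45.0]
horizontal_rows = [projection_gains(az) for az in square]
z_gains = [row[3] for row in horizontal_rows]

# An elevated speaker (hypothetical position) does need the Z channel.
elevated = projection_gains(45.0, 35.0)
```

With all speakers at elevation 0 the Z column of the decoding matrix is zero, so the same matrix serves 3-channel (WXY) and 4-channel (WXYZ) input.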

I propose that the following mixed-order variants could already be included in the v0.1 version of SFF, as < optional > elements:


- 2h1v (8 channels),
- 3h1v (12 channels), and/or
- 3h2p (11 channels)
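The channel counts quoted for the xhyp and xhyv variants can be reproduced by counting spherical harmonic components. This is a sketch under my reading of the two schemes (xhyp: everything up to the vertical order, plus only the sectoral components above it; xhyv: every component of degree l and index m with l - |m| not larger than the vertical order); the function names are hypothetical.

```python
def components(h_order, v_order, scheme):
    """List the (degree l, index m) pairs kept by a mixed-order scheme.

    scheme "p": keep all components of degree <= v_order, and above that
                only the sectoral ones (|m| == l).
    scheme "v": keep every component with l - |m| <= v_order.
    """
    if v_order > h_order:
        raise ValueError("vertical order must not exceed horizontal order")
    kept = []
    for l in range(h_order + 1):
        for m in range(-l, l + 1):
            if scheme == "p":
                keep = l <= v_order or abs(m) == l
            elif scheme == "v":
                keep = l - abs(m) <= v_order
            else:
                raise ValueError("scheme must be 'p' or 'v'")
            if keep:
                kept.append((l, m))
    return kept


def channel_count(h_order, v_order, scheme):
    """Number of channels of an xhyp ("p") or xhyv ("v") signal set."""
    return len(components(h_order, v_order, scheme))


for name, args in [("3h1p", (3, 1, "p")), ("2h1v", (2, 1, "v")),
                   ("3h1v", (3, 1, "v")), ("3h2p", (3, 2, "p"))]:
    print(name, channel_count(*args))
```

Under these assumptions the loop prints 8, 8, 12 and 11 channels, matching the figures in the list. Note that 3h1p and 2h1v both have 8 channels, so a reader could not tell them apart by channel count alone.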

(Optional means that a decoder or plugin could be able to reproduce these variants, but doesn't have to. A decoder or plugin < should not > crash if reading more than 8 channels, though... This is a superfluous definition, but has to be respected in any case. So, don't sue me for any standard implementation which doesn't work, destroys your computer file system, or whatever other minor reason... ;-) )

At this point (optional part) we have left the backward-compatibility to B format/.amb, because we include some combinations of the xhyv hierarchy. (B format uses xhyp for mixed orders.)

As a 3D audio format, xhyv is more stable than xhyp, in the sense that the horizontal resolution stays the same, independently of the elevation/height levels where the sound comes from. (As long as spatial music happens mostly in the horizontal plane, I would say that most to all current music recordings and most spatial performance projects could be "done" in xhyp. IMO 3h1p is still and clearly a better recording format than 2h1v, because of the higher horizontal resolution. If you have content which makes heavy use of the full sphere, you would have to cope. Then and only then, 2h1v might be perceived to be a better option to present isotropic 3D audio. In this case, the reign of mighty WFS would also come to an end, in epic failure style... )


An introduction to different mixed-order schemes can be found here:

Ch. Travis "A New Mixed-Order Scheme for Ambisonic Signals"
http://ambisonics.iem.at/symposium2009/proceedings/ambisym09-travis-newmixedorder.pdf/?searchterm=travis

(Not restricted to 3rd order)

The mandatory part of SFF is a subset of .amb < and > respects an upper 8-channel limit. (6 channels is nowadays not a serious limit, because you won't distribute B format or "SFF v1.0" via DD, DTS or S/PDIF. Think also of the fact that AAC, Opus etc. offer much better compression than DD and DTS. I don't see < any > bitrate problem for 8 channels of AAC/Opus compressed audio if 5.1 DD didn't have a bitrate problem in the 90s.)

It is good to include some optional format extensions with a higher channel count right from the start, as suggested. At worst you would prepare future extensions, but you didn't "break" current software and interfaces in the mandatory part. (I am aware that optional features are not always implemented, because people tend to be pragmatic, vulgo "lazy". On the other hand, end users/professionals/musicians will be able to choose between different implementations. Choice means that optional features will be implemented at least sometimes IF a certain demand exists in the 1st place.)




III. Loudspeaker configurations (presentation as a sketch)

1. Mandatory support for:

Stereo, quad (4 speakers), 5.1, 7.1

Binaural for a mobile device. (Obviously. What else?)



2. Optional

I propose to put both the regular hexagon and octagon into this category. (Very nice for enthusiasts and for people working in this field, but statistically not yet found in the wild. Can you really expect that consumers will move around their 5.1/7.1 speakers?)

Binaural via loudspeakers, using X-talk cancellation (Ambiophonics, BACCH etc.). (You could say that the people behind these might adapt their technology to reproduce SFF.)



3. "New" horizontal configurations:

a) ITU 6.1

This is 5.1 + a CB (center back) speaker. This configuration didn't succeed for home theater use. One reason could be that the angle between SL/SR and CB is 80º according to the standard, so maybe a bit too wide for effective pairwise panning.

In the Ambisonics case, the cited panning problem doesn't exist, and there should be some real improvement compared to the normal 5.1 configuration. 6.1 is then an easy extension of 5.1, and very probably better for soundfield generation. (The existing home theater installation is both extended and stays < unchanged >. No side effects, you only have to install one more speaker.)

Because the center back speaker might have to be closer to the listener than other speakers for practical reasons ("not enough space behind your sofa"), some distance compensation/delay might be necessary.

This same plane-wave assumption makes it possible to vary the distance of speakers within reasonable limits without upsetting the correct function of the decoder, provided that the difference is compensated with delay, the power is adjusted for uniform loudness at the center, and that per-speaker near-field compensation is used. Distance does not affect the decoder matrix.


(Source:
http://en.wikipedia.org/wiki/Ambisonic_reproduction_systems
)
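The delay and level part of this compensation is simple arithmetic. Here is a minimal sketch, assuming a speed of sound of 343 m/s and a 1/r level law; the room dimensions are made up for illustration, the function name is hypothetical, and near-field compensation (also mentioned in the quote) is not covered.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for room temperature


def distance_compensation(distances_m):
    """Per-speaker (delay in seconds, linear gain), relative to the farthest speaker.

    Nearer speakers are delayed so that all wavefronts arrive together,
    and turned down so that all speakers are equally loud at the center.
    """
    farthest = max(distances_m)
    result = []
    for d in distances_m:
        delay = (farthest - d) / SPEED_OF_SOUND
        gain = d / farthest  # assumption: level falls off with 1/r
        result.append((delay, gain))
    return result


# Hypothetical 6.1 room: six speakers at 3.0 m, center back at 2.0 m.
layout = [3.0] * 6 + [2.0]
comp = distance_compensation(layout)
cb_delay_ms = comp[-1][0] * 1000.0
```

In this example the center back speaker would be delayed by about 2.9 ms and scaled to two thirds of its amplitude, while the decoder matrix itself stays untouched.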


b) "ITU 8.1"

Actually, this configuration is not a defined standard, and might not exist anywhere yet. It is ITU 7.1 plus a CB (center back) speaker. While this doesn't improve things for panned audio (unless you want to have sharp "non-phantom signals from a back position"), this might be a good configuration for TOA decoders. You have 8 speakers, and also a configuration which is symmetric about an imaginary axis in the middle, going from -90º to 90º. (This configuration is relatively close to an "ideal" octagon, and backward compatible to all versions of 5.1/6.1/7.1 film audio.)

The ITU 6.1 and "8.1" configurations would add just one loudspeaker to existing configurations for film sound.



4. Loudspeaker extensions for 3D audio.

You would add (at least) 4 upper speakers, as long as SFF stays restricted to 1st and 2nd vertical orders. 2nd order would require at least one elevation level more. (Full periphonic 3rd order would require an 8-6-1, for the upper hemisphere. Full 3h1p decoding requires 8-4 or 8-4-1. 8-4-1 fits even for the optional 3h2p case, BTW. "8" can refer to octagon or "ITU 8.1", see above. Decoding of TOA to 5.1/6.1 is possible, even if detail is lost.)

Negative elevations are possible and have been done, in certain home loudspeaker configurations. (Including the 7.13D configuration, if we talk once more about gaming.)



IV. Binaural decoding

Decoding sound field format/SFF via HRTF sets to headphones, optionally including HT (head tracking). (Note that motion sensors, gyroscopes and position tracking via GPS are included in smartphones and many other devices. See also recent discussions on sursound.)



V. Working with SFF, mixing, cutting, editing...

All current mandatory and optional combinations of SFF v0.1 (4-12 channels) could be derived from a full-periphonic 3rd order B format mix (16 channels), presented probably in .AMB format. This is why we currently stay at 3rd order/TOA, and don't aim for orders above.

(When going from 1st order Ambisonics to TOA, there is a significant improvement of quality. The improvements are less clear if you compare SOA to FOA. If SFF or any accepted TOA format could be established as a kind of 5.1 successor, any potential later version might very well skip 4th order and go directly to anywhere between 5th and 7th order, to obtain some clear improvement. And probably we would again use some mixed-order scheme, maybe 5h3p or 6h3p. 20 or 22 channels sound like an awful lot compared to 5.1, but it is a factor of about 4. If you say that each Ambisonics channel is compressed with AAC at, say, 80 kbit/s, you would (still and only) need about 1.6 MBit/s for 20 audio channels. I also don't see any problem in handling about 20 uncompressed channels in a DAW. Just think of the requirements for video editing and even Photoshop, and relax...)

8 channels can be coded easily in 640 kbit/s. This is a Dolby Digital data rate. (The above data rate of 1.6 MBit/s would compare to 5.1 DTS, but is wayyyy better... There exist certain ideas how you could compress very high orders even more, but this is beyond the scope of the current SFF project.)
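The bitrate figures above follow from a one-line multiplication. As a small sketch, assuming every channel is coded independently at the 80 kbit/s mentioned in the text (the function name is hypothetical, and real multichannel codecs may allocate bits differently):

```python
def total_bitrate_kbps(channels, per_channel_kbps=80):
    """Total bitrate if each channel is coded independently at the same rate."""
    return channels * per_channel_kbps


# 8 = mandatory SFF limit, 20 = 5h3p, 22 = 6h3p
for n in (8, 20, 22):
    ratio_to_51 = n / 6  # 5.1 carries 6 channels, counting the LFE
    print(f"{n} channels: {total_bitrate_kbps(n)} kbit/s, "
          f"{ratio_to_51:.1f} times the channel count of 5.1")
```

This gives 640 kbit/s for 8 channels, 1600 kbit/s for 20 channels and 1760 kbit/s for 22 channels.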

A far bigger problem for any application of Ambisonics >= 4th order at home than memory space or available bitrates is the fact that you will have to use (or re-use) existing loudspeaker configurations, too. (At least if you don't expect that your customer will run to Wal Mart or Lidl to buy plenty of cheap speakers and cables for the installation of his/her newest Ambisonics home system.) TOA has already been shown to work successfully in home applications, see the cited gaming audio engine in section II.

SFF is meant to be used for specific areas, especially as a home and mobile surround/3D audio format. We are not yet covering all possible applications/needs such as live concerts, audio installations, cinemas, theaters, operas etc. However, it would be possible to present some future version of SFF which will cover all those applications and areas, and might include HOA order >= 4. (So, it made sense to talk about extensions to 5-7h/3p, or similar proposals, probably mixed-order variants. Any future version of SFF would be backward-compatible to "SFF v1.0", of course. Otherwise you could not speak about "SFF" at all.)



VI. Microphones

The FOA soundfield microphones got some recent boost via the Harpex decoders and upsamplers, which in some cases will improve results if you compare the Harpex decoder to classical dual-band, rE or in-phase decoders. Harpex is also very useful (I hear) if you want to mix a FOA recording into some (synthetic) HOA field. (I believe this works only horizontally, for now.)

A spherical array microphone with 32 capsules can record 3rd and maybe 4th order. (If I understood this well, you could introduce a version with 20 capsules as a simpler TOA mike. Comments?)


An excellent article about both HOA and HOA microphones can be found here:

http://ambisonics.iem.at/symposium2009/proceedings/ambisym09-daniel-evolvingviewsonhoa.pdf/at_download/file

It seems to me that direct 3rd order recording is currently just about feasible. (EigenMike)



VII. Backward-compatibility to stereo

I have suggested before that FOA (1st order B format) could be distributed in a form which is backward-compatible to stereo, using the 3/4 channel form of UHJ.


http://en.wikipedia.org/wiki/Ambisonic_UHJ_format#The_UHJ_hierarchy

If a third channel (T) is available, this can be used to give improved localisation accuracy to the planar surround effect when decoded via a 3-channel UHJ decoder.

Adding a fourth channel (Q) to the UHJ system allows the encoding of full surround sound with height, known as Periphony, with a level of accuracy identical to 4-channel B-Format.

Which gives just another form of WXYZ, coded in a fashion backward-compatible to stereo, which is LRTQ.

The LRTQ presentation could very probably be used to distribute AAC stereo/FOA files. If lossless distribution is required, you might distribute LRTQ via FLAC. (Note that you always need a decoder for the surround signals. LRTQ will always be brought back into B format form, if you decode to surround sound. Nobody would expect to need an Ambisonics/UHJ decoder for POSL/plain old stereo listening, unless you are a real expert and decode B format to 2 speakers. It works and you do this maybe after recording with a SFM, but... O:-) )

This idea could be extended to higher orders, as I have suggested before. A very easy way to do so has been proposed by Jörn Nettingsmeier.

Take Ambisonics B format orders 3h and (mixed-order) 3h1p. The components are WXY(Z) UV PQ.

Jörn's suggestion is to use only 1st order to derive L/R, which leads us to LRTQ UV PQ.

In the meantime, I have found some possible problem with this: The Q and Z channels are actually different!

(The Q channel is not used to derive L/R. A possible explanation could be in different forms of filtering. See the section presenting UHJ coding and decoding equations in the cited link:


Q = 0.9772*Z

Z = 1.023*Q

Note that two-channel UHJ requires the player to use different shelf filters than for 3/4-channel UHJ and B-Format);

So, according to Wikipedia, there is actually no different filter if you compare 3/4 channel B format and UHJ format.

Why are Q and Z then different at all?! Note also that the decoding equations for WXY are exactly identical for 3- and 4-channel UHJ.
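A quick numerical check of the two quoted coefficients may help here. It only looks at the numbers 0.9772 and 1.023 as cited above and does not explain why the scaling was chosen; the variable names are my own.

```python
import math

q_from_z = 0.9772  # coefficient in Q = 0.9772*Z (as quoted)
z_from_q = 1.023   # coefficient in Z = 1.023*Q (as quoted)

# If the two equations are simply inverses of each other, the product is ~1.
product = q_from_z * z_from_q

# Express the encoding coefficient as a level change in decibels.
gain_db = 20.0 * math.log10(q_from_z)

print(f"product = {product:.4f}, gain = {gain_db:.2f} dB")
```

The product is 1 within rounding, and the gain comes out at about -0.2 dB. On these numbers alone, Q would be Z with a slightly lower level, not a differently filtered signal. If that reading holds, carrying Z as Q next to UV PQ would only need this fixed gain to be undone on decoding.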


(Comments/help very welcome)

Because Q and Z are not identical, there might be some problem with our "naive" proposal. (This problem looks very fixable, but where are ye UHJ experts who show us how to extend UHJ to infinite orders, using Jörn's "don't care for higher orders if deriving L/R" (TM) approach? ;-) )

The hierarchical stereo-surround UHJ presentation of SFF saves not only memory space or transmission bitrate (you don't have to distribute a stereo file and a surround file), it also might save money in case certain right holders would ask for double license fees. Having worked in the field of optical discs and having proposed (different) hybrid discs, I would suspect this would be the normal case, unless things have recently changed a lot in the music label and publishing rights world.

One file instead of two is also a desirable simplification, from a consumer perspective.



VIII. So what?

What is suggested is a relatively simple standard for (mainly) 1st and 3rd order Ambisonics.

SOA (2nd order Ambisonics) is skipped (with the exception of 2h1v as an option), specifically 2h and 2h1p. (Going up to 3rd order provides some clear improvements compared to FOA, such as the perfect or, say, near-perfect reconstruction of a much wider frequency range in the "head zone", a wider sweet spot/zone and improved localization for all frequencies outside the sweet spot.)

Considering the wide use of B format in existing software, plugins, decoders and software toolboxes/libraries, TOA is clearly preferable to 4th order, from the view of a software engineer/developer.

The SFF standard proposal is built around existing solutions wherever possible, including some new elements. (Backward-compatibility to stereo, 6.1 and "8.1" loudspeaker configurations, xhyv hierarchy... All these elements are < optional >, which is intentional. Concerning the use of the UHJ hierarchy to obtain easy backward-compatibility to stereo, I suggest that these transformations can be done in an easy way, so I strongly recommend to actually implement/use this feature. If this were not my own recommendation, I would have said this should be a < mandatory > part. But so... 8-) )

Although the channel count of 8 is moderate from a current perspective, 3rd order Ambisonics is powerful. Maybe VBAP rendering via 8 speakers would give the same accuracy of localization as (3rd order) Ambisonics rendering, but I doubt that Ambisonics would test worse. Besides this, Ambisonics is loudspeaker-independent, and we have a real and flexible surround/3DA format. (Whereas VBAP is a rendering technology, not a format.)

Two optional forms of SFF (3h2p and 3h1v) use 11 and 12 channels, respectively. These are thought to improve applications which would stress the 3D audio aspect. 12 channels is about the same as the channel count for Auro-3D, which in its most common form is 11.1. (Auro-3D is a simple discrete format for 3D audio, so the comparison of SFF to both 7.1 and 11.1 seems fair. You/we can test later...)

This has been a long posting. However, it is about a standard proposal, in request-for-comments form. Unlike in any final standard description, I also wanted to provide some < reasons > why I propose certain features/elements, and some real basis for discussion. 5.1 won't go away, but the proposal (sound field format/SFF) is clearly both more powerful and more flexible. It is also more complicated, we have to admit. (It is an open standard, in this sense different from Mpeg-H 3DA. Without being able to give any proof, because you can't compare unfinished states, I believe that SFF could/will be a lot simpler than Mpeg-H 3DA.)

Special thanks go to Michael Chapman, Aaron Heller, Jörn Nettingsmeier, Richard Furse and Svein Berge. You have answered both rather competent and exceptionally ignorant questions, which for my feeling and considering my limited capabilities still made a lot of sense O:-) ... (The SFF proposal and all expressed views are just my own, although each of the cited persons had some influence on my views. I am completely unaware if any of the cited persons agrees with any of my views... )

Speaking about Harpex, you could see this as a kind of bridge between 1st and 3rd order Ambisonics. (Harpex might actually be judged to be even "sharper" than 3rd order under some circumstances, which will depend on the decoded material. In general, "native" 3rd order should prove to be more natural. Feedback welcome...)

We have also discussed if parametric decoders (like Harpex or DirAC) could be extended to HOA and specifically 2nd/3rd order, which theoretically should be possible. However, this doesn't seem to have been done yet.

(Comments...)

An open standard doesn't mean that commercial solutions which would implement/use SFF should not be welcome.

The proposed ("home", "CE", whatever...) standard SFF = Sound Field Format should be controlled by the community (and/or possibly by certain open organisations/foundations), not by companies.

Some attention has been given to the application and decoding side, including < mandatory > and < optional > loudspeaker configurations and binaural decoding options. Whereas some decoders (such as Rapture3D) would offer free speaker positioning - which is impressive -, this is not the normal case, nor < should > all decoders implement such a feature. It is obvious that any "flexible" decoder could pre-install all mandatory and optional configurations in an easy and selectable way.



Best regards,

Stefan Schreiber                                            Lisbon



---------------------------------------------------------------------------------------------

Further related thread:


-------- Original Message --------

Subject: [Sursound] Two new approaches for the distribution of surround sound/3D audio

Date:   Mon, 29 Jul 2013 03:57:34 +0100
From:   Stefan Schreiber <st...@mail.telepac.pt>
Reply-To:       Surround Sound discussion group <sursound@music.vt.edu>
To:     Surround Sound discussion group <sursound@music.vt.edu>



(Continuation of: The commercial future of Ambisonics, 15/5/2013)


Dear colleagues,

following the recent standardization of 3D audio by Mpeg (ISO/IEC 23008-3) and related activities, I have come to the conclusion that the (older) B format up to 3rd order might need some updates.

However, I also came to the conclusion that FOA (first order Ambisonics) could be easily included into all current distribution models for audio in the Internet, which are (to "99.98%") stereo-based. We nearly have been "there", in the above cited thread! ("The commercial future of Ambisonics")

I will start with this part, because you can see this as a format of its own. Which might be the perfect bridge or transition format for future surround/3D audio (3DA) formats...



....


IV. Improved binaural representation via headphones

Note that headphones with HT "chips" and motion-corrected binaural playback of surround sound (including 3D audio) could easily be realized, with available and actually quite affordable chips.

Oculus Rift is the direct example for this, as this is a full (and certainly more complex) VR and gaming device.


http://worthplaying.com/event/E3_2013/PostE3_2013/89888/

In its current state, the Oculus Rift is an amazing piece of work, and after decades of dealing with VR technology, it seems that we may finally see a VR unit that is going to get it right.



Wikipedia writes about the Oculus Rift motion-tracking:

"Initial prototypes used a Hillcrest 3DoF head tracker that is normally 120 Hz, with a special firmware that John Carmack requested which makes it run at 250 Hz, tracker latency being vital due to the dependency of virtual reality's realism on response time. The latest version includes Oculus' new 1000 Hz Adjacent Reality Tracker that will allow for much lower latency tracking than almost any other tracker. It uses a combination of 3-axis gyros, accelerometers, and magnetometers, which make it capable of absolute (relative to earth) head orientation tracking without drift."

Now, apply the same or similar HT silicon (which is already very affordable) to HT/motion-tracking headphones... (I could give some detailed recommendations how to do this, but this is also one of the next steps... Nice to see that at least the video and gaming people have kept some sense for cool technology and seemingly "weird" ideas, so to speak. How many motion updates per second would a "fluent" head-tracking binaural decoder/decoding program actually require? Ye Ambisonics experts, what do you think or better know?! You would have to decode some UHJ/".AMB+" file and shift the soundfield relative to the head position, I guess. The head position needs some regular and frequent updates, which we easily get by now. You could also track the < absolute > movements of persons within some area. Say: Your decoder program tracks the movements of the visitors in some museum or building, and plays the associated audio/explanations fitting to the current position. "This is the dining hall of the castle, which was quite cold during winter, but warm or even hot during summer." Ok, this was a truly dull example... :-) )
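For first order, "shifting the soundfield relative to the head position" comes down to a rotation of the B format components before binaural decoding. Below is a minimal sketch for yaw only, assuming the usual Ambisonics axes (X to the front, Y to the left, Z up) and counterclockwise-positive angles; the function names are hypothetical, and a real player would run this per audio block with the latest tracker reading.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order B format sample about the vertical axis.

    W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z


def compensate_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate the soundfield so that sources stay fixed in the room."""
    return rotate_yaw(w, x, y, z, -head_yaw_deg)


# A source straight ahead (X = 1, Y = 0); the listener turns 90 degrees left.
w, x, y, z = compensate_head_yaw(1.0, 1.0, 0.0, 0.0, 90.0)
# Relative to the head the source is now to the right: Y is about -1, X about 0.
```

Pitch and roll, and orders above the first, need larger rotation matrices, which are left out here.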



Best regards

Stefan Schreiber                                          Lisbon



_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound
