Re: [Sursound] A proposal for an Ambisonics based 3D audio codec, MPEG/ITU style...

Stefan Schreiber Sun, 27 Jan 2013 11:19:07 -0800

Dear colleagues...

I would like to take the opportunity to inform you about recent work at MPEG, presenting updates and official information (backed by documents).

1. http://www.itu.int/net/pressoffice/press_releases/2013/01.aspx#.UQVoNVK-Zkj

ITU-T’s Study Group 16 <http://www.itu.int/en/ITU-T/studygroups/2013-2016/16/Pages/default.aspx> has agreed first-stage approval (consent) of the much-anticipated standard known formally as Recommendation ITU-T H.265 or ISO/IEC 23008-2. It is the product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

ITU-T H.265 / ISO/IEC 23008-2 HEVC will provide a flexible, reliable and robust solution, future-proofed to support the next decade of video. The new standard is designed to take account of advancing screen resolutions and is expected to be phased in as high-end products and services outgrow the limits of current network and display technology.
Companies including ATEME, Broadcom, Cyberlink, Ericsson, Fraunhofer HHI, Mitsubishi and NHK have already showcased implementations of HEVC. The new standard includes a ‘Main’ profile that supports 8-bit 4:2:0 video, a ‘Main 10’ profile with 10-bit support, and a ‘Main Still Picture’ profile for still image coding that employs the same coding tools as a video ‘intra’ picture.

The ITU/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) (formerly JVT) will continue work on a range of extensions to HEVC

2. Obviously, the future standard for 3D audio belongs to the same standard (or standard family), ISO/IEC 23008. The audio "part" is ISO/IEC 23008-3, better known as MPEG-H part 3.

3. Here is a "global" update on the MPEG standards, presented by Leonardo Chiariglione himself. (Paper date: 26 June 2012)


http://www.ieee-bmsb2012.org/images/program/BMSB2012_Keynote_2_MPEG_standards_supporting_the_evolution_of_broadcast_media_Leonardo.pdf


Updates:

- p. 10 presents the intended timeline for the standardization process, including 3D audio.

New: We might have a bit more time than I had thought, which is good news. "CD" stands for committee draft. I guess they will start on the first working draft in Vienna, summer 2013. (I have posted the schedule of the next MPEG meetings within this thread.)


- p. 34ff: "No systems and video without audio"...

- p. 35-37: 3D audio requirements

Judge for yourself whether Ambisonics would fulfill this requirement list. (A clear "yes" here, if I have anything to say about it...)


- p. 38 "Envisioned architecture"

Ambisonics is actually included, as "audio scene" input into the first encoder.


BUT: This is still a very crude scheme, nothing elaborated.

- The two "encoder" stages need some explanation or elaboration. I could imagine what they mean, more or less. However, as an architectural diagram this is either a bit confused or maybe "heavily underspecified".

- Indeed, the 500 bit/s for 3D audio seems to be a bit bit-starved, which Mr. Chiariglione actually seems to admit himself. (p. 42)

Parametric compression might achieve this rate, but how can this be reconciled with the requirement list? ("quality of decoded sound perceptionally transparent"; note also that we are talking about UHD television and cinema audio. In this case, shouldn't it be done well?)

- The "64 kbit/s" stream to mobile phones/devices seems to be completely outdated in a time of readily available UMTS, HSPA, LTE etc.

Why not stay at 128 kbit/s or 256 kbit/s, like everybody else in this industry? (Please don't be worse than any YouTube stream, ok? Just speaking for the new generation, which rightly expects some minimum quality over their mobile phone earbuds...)


Nobody will listen to 3D audio on GSM phones, pretty safe bet.

"Smart phone TV" will be more like "video transmitted via Internet", with "classical broadcast TV" probably included but hardly the main thing...



- p. 39: "Immersive and enveloping audio experience"

"Immersive" should mean "high quality", not "cheating" in a race for the lowest possible bitrate.



- Home theater: p. 41ff

The presented .AMB based proposal, 3rd horizontal / 2nd vertical order (optionally including 3 or 5 separate front speakers), seems to be sufficiently strong to be decoded to this "typical" but "high end" loudspeaker configuration.

It would also need a significantly lower transmission rate than 22.2, which seems to be a factor here. (Maybe they will take 1.5 Mbit/s as the upper rate limit; have a look at p. 42.)
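
To put rough numbers on the channel counts behind this comparison, here is a minimal sketch. It assumes the common #H#V mixed-order convention, in which a scheme with horizontal order H and vertical order V carries (V+1)^2 + 2(H-V) components, and it spreads the 1.5 Mbit/s figure evenly over all channels, which is a simplification and not how a real codec would allocate bits. The function names are hypothetical.

```python
def mixed_order_channels(h_order: int, v_order: int) -> int:
    """Component count of a mixed-order Ambisonic set (assumed #H#V convention).

    Full-sphere components up to v_order, plus two horizontal-only
    components for every order above v_order up to h_order.
    """
    if v_order < 0 or h_order < v_order:
        raise ValueError("expected 0 <= v_order <= h_order")
    return (v_order + 1) ** 2 + 2 * (h_order - v_order)


def even_share_kbps(total_kbps: float, channels: int) -> float:
    """Bit rate per channel if the total were split evenly (a simplification)."""
    return total_kbps / channels


ambisonic_channels = mixed_order_channels(3, 2)  # 3rd horizontal / 2nd vertical
hamasaki_channels = 22 + 2                       # 22 full-range + 2 LFE channels
upper_limit_kbps = 1500.0                        # the 1.5 Mbit/s mentioned above

for name, count in (("3h2v Ambisonics", ambisonic_channels),
                    ("Hamasaki 22.2", hamasaki_channels)):
    share = even_share_kbps(upper_limit_kbps, count)
    print(f"{name}: {count} channels, {share:.1f} kbit/s per channel")
```

Under these assumptions the mixed-order set needs fewer than half the channels of 22.2, before any discrete front or LFE channels are added.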



- p. 46: "Flexible rendering"

I leave the rest to your own interpretation and good judgement. I hope there will be a little bit more support coming from this list, which is heavily involved in Ambisonics research and practical implementation. And I fully believe that sursound is a very competent list/place...



Best regards,

Stefan Schreiber

P.S.: Further updates on the Shanghai and Geneva meetings are badly needed, IMO.

Mr. Pallone et al., the CfP has been issued by now, and it is time to present "something". Anyway, I have signed up to the MPEG list, but couldn't see their CfP yet.

(I hope I don't have to hack into their server, which might lead to arrest warrants and the rest of my life spent in prison. I really can't afford to do THIS... :-D )


P.S. 2: (preemptive legal declaration... ;-) )

If anyone hacked into the MPEG server to steal a document which (in theory) should be publicly available, it wasn't me!!!!!!!!! I have a child, a life...

(I am aware that I would not be jailed for stealing a publicly available document, only for the measures taken to get hold of it. Having done this for the sake of science and "better 3D audio" would be a lame excuse if convicted, ok...


BUT: YOU have to prove that I did it, and I didn't. :-) )


Stefan Schreiber wrote:

Dear colleagues...
I would like to remind everybody interested or already involved that ITU/MPEG plan to define and issue a 3D audio standard (better: a 3D audio standard framework) during this year. The 3D audio codec is meant to be part of the (wider) MPEG-H standard.
This all makes a lot of sense, 'cos ;-) there is already some competition around:
1. Hamasaki 22.2, well known as the (audio) part of the former UHDTV (Super Hi-Vision) proposals.
2. http://www.auro-3d.com/system/listening-formats

(Note:

a)
The Auro-3D® Engine comprises:
Auro Codec: The revolutionary codec that delivers native, discrete Auro-3D® content.
Auro-Matic: The groundbreaking up-mixing algorithm that converts legacy content into the Auro-3D® format.
Auro-3D® Headphone: Like other audio configurations, similar results can be achieved with headphones that use binaural technology.
b)
Film, Broadcast, Gaming, Mobile, Automotive and Multimedia industries are all searching for a next generation sound format. With 3D Stereoscopic imagery becoming commonplace, the time is right for an audio experience that matches this increased level of fidelity. Sound in 3D is clearly the next step.)
3. http://www.dolby.com/us/en/consumer/technology/movie/dolby-atmos.html
(IMHO, Dolby won't participate in the MPEG standardization process. And even if they did, Dolby Atmos seems to be finished.)
The current situation at MPEG:

http://www.itu.int/en/ITU-T/studygroups/com16/video/Pages/jctvc.aspx

Next meetings:
* Geneva, Switzerland, October 2013 (tentative)
* Vienna, Austria, 27 July - 2 August 2013 (tentative)
* Incheon, Korea, 20-26 April 2013 (tentative)
* Geneva, Switzerland, 14-23 January 2013 (tentative)
During the next conference (January, Geneva), the important HEVC codec should be technically finished. (Status: FDIS, for "Final Draft International Standard")
A final call for a 3D audio codec will also be issued:
At the 102nd MPEG meeting MPEG has issued a Draft Call for Proposals (CfP) on 3D Audio Coding.
(This was the last meeting, Shanghai, October 2012)
MPEG-H 3D Audio is envisaged to provide a highly immersive audio experience to accompany the highly immersive experience provided by MPEG-H HEVC. Such an immersive listening experience will be realized by the rendering of a realistic and compelling 3D audio scene either by using a large number of loudspeakers, such as for 22.2 channel audio programs, or by using headphones supporting binauralization. Key issues to be addressed are a compact and bit-efficient representation of multi-channel audio programs and the ability to flexibly render an audio program to an arbitrary number of loudspeakers with arbitrary configurations. 3D Audio support via headphones is also a key capability in order to deliver an immersive experience for users of mobile devices. A final CfP will be issued at the 103rd meeting in January 2012,
(they mean January 2013, of course...)
with selection of technology from amongst the responses received at the 105th meeting in July 2013. This technology will form the basis for MPEG-H 3D Audio, the Audio part (Part 3) of the MPEG-H (ISO/IEC 23008) suite of technologies.
Taken together, the final deadline for any proposal seems to be around April 2013. (Incheon, Korea meeting, April 2013)
If some Ambisonics based audio codec is proposed (it has been done, but as an official proposal??), I would like to add some observations.
Cinema audio and UHD TV (and this is where the push comes from) include some "discrete" elements, and everybody has to be aware of this. Firstly, there are one or two (Hamasaki 22.2) separate LFE channels. (LFE channels make sense for movies and in the cinema, even if some people will always dispute this... we are not talking about most music you will listen to at home, but about cinema sound with special effects.)
Secondly, a lot of sound is tied to the screen. The narrow-spaced front speakers might represent a problem for Ambisonics, at least for low-order Ambisonics. (Dolby Atmos actually defines up to 5 "screen" loudspeakers, which means three or five. Note that the front C channel is often used as the voice/dialogue channel.)
A possible solution would be to offer some kind of B"+" option, the "plus" part being the front and LFE channels. 2D/3D surround for all of the remaining sound field would be offered via the B format (order?) sound field, or HOA sound field. (To mix such a hybrid sound format is rather trivial, I would say. Just leave out the front and the LFE parts in the surround/3D field... )
So maybe define some "purist" solution (say B format 3rd order, or horizontal 4th order mixed with vertical 1st/2nd order, or whatever), and also some "B+" option. (The original B+ proposal was FOA + 2 stereo channels. Note that a direct consequence of the "hybrid" Ambisonics option would be that a 2nd or 3rd order soundfield should be enough for the representation of the surround and height channels. In fact, you can decode to 5.1, 7.1, Hamasaki 22.2, Auro-3D and Dolby Atmos surround layouts. The B format "resolution" should be more than enough for any of these layouts - maybe even at 2nd order, certainly at 3rd. The narrow-spaced front wouldn't be any problem, by definition. LFE channels are discrete in any case, as stated before.)
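
The channel arithmetic of such a hybrid layout can be sketched as follows. This is only an illustration: the class and field names are hypothetical, a full 3D sound field of order N is assumed to carry (N+1)^2 components, and the example configurations are picked from the options mentioned above, not taken from any actual proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridLayout:
    """Hypothetical "B+" layout: an Ambisonic sound field plus discrete channels."""
    order: int           # Ambisonic order of the full 3D sound field
    front_channels: int  # discrete screen/front channels
    lfe_channels: int    # discrete LFE channels

    @property
    def soundfield_channels(self) -> int:
        # A full-sphere sound field of order N has (N + 1)^2 components.
        return (self.order + 1) ** 2

    @property
    def total_channels(self) -> int:
        return self.soundfield_channels + self.front_channels + self.lfe_channels


# Example configurations (assumptions chosen for illustration only).
layouts = {
    "original B+ (FOA + 2 stereo channels)": HybridLayout(1, 2, 0),
    "2nd order + 3 front + 1 LFE": HybridLayout(2, 3, 1),
    "3rd order + 5 front + 2 LFE": HybridLayout(3, 5, 2),
}

for name, layout in layouts.items():
    print(f"{name}: {layout.soundfield_channels} sound field channels, "
          f"{layout.total_channels} channels in total")
```

The first entry reproduces the six channels of the original B+ proposal; the other two show that even a 3rd order hybrid stays below the 24 channels of 22.2.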
I wouldn't be afraid to offer some hybrid option, anyway. (Dolby Atmos defines up to 64 channels, and also audio objects for different loudspeaker layouts. Therefore, Dolby Atmos is itself a hybrid system - based on discrete channels and audio objects.)
I just wanted to give a small hint ;-) on how anybody might set up a valid proposal. The B+ could and < should > be included as an option. The basic idea behind this is that cinema audio has some specific properties, which have to be covered by any system. (The < front > is extremely important, because voice and many sounds are tied to events on the screen; LFE channels are discrete; the C channel is mostly used in a discrete way, as the voice channel.)
Note also that the clock is already ticking, and I absolutely mean this. The MPEG can choose from some valid proposals, (Hamasaki) 22.2 and Auro-3D among these.
Ambisonics has defined a 3D audio field since the 70s, so it would seem logical to include Ambisonics in any 3D audio standard. There are also some clear advantages, which are becoming more and more important. (Different cinemas won't offer the same loudspeaker layouts everywhere, pretty safe bet.)
Because the MPEG will basically choose from existing proposals, somebody has to define some valid Ambisonics based proposal.
I apologize to the experts already involved for having written on a pretty basic or, say, introductory level. But nobody has done this here before, and I think not everybody is sufficiently informed about these issues - maybe even some very competent people.
However/but:
The next two or three MPEG conferences are not just like the next spatial audio or Linux audio conference ;-) , we are talking (also) about the probable next real-world standard for 3D audio. Once MPEG-H part 3 (audio) and Dolby Atmos exist, every future endeavour would face an (extremely difficult) uphill battle.
If Ambisonics is not included for reasons of laziness, infighting tribes or whatever else, I would say: Game over for Ambisonics in the real world.... (I don't mean this in a rude way. The thing is just that the MPEG won't wait for even the most beautiful HOA standard which will be presented in the year 2015 or 2020...)
The advantages of Ambisonics are clear: It is by definition a 3D audio theory/codec, and you can decode to different loudspeaker layouts and headphones. (This is of course very basic, but you have to tell this to people when presenting a proposal.)
Best regards,

Stefan Schreiber Lisbon
P.S.: I personally would/will work with any 3D audio standard. Because MPEG-H part 3 (audio) will be a selection of several codecs/approaches, Ambisonics should be included. If so, I would define two options: some "purist" approach, but also some "B+" approach, which maybe fits cinema audio in the real world better.
Now Thomas Chen (still lurking on this list?) would probably agree, because the original (6-channel) B+ proposal is from him. Unfortunately he works at Dolby... :-X
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound

