Dear Bernard,

Thank you very much for your comments. Please find my clarifications inline. I 
hope you find them helpful.

--
Best regards,
Alexey Filippov 

-----Original Message-----
From: Bernard Aboba via Datatracker [mailto:nore...@ietf.org] 
Sent: Friday, May 24, 2019 9:46 PM
To: tsv-...@ietf.org
Cc: draft-ietf-netvc-requirements....@ietf.org; video-codec@ietf.org; 
i...@ietf.org
Subject: Tsvart last call review of draft-ietf-netvc-requirements-09

Reviewer: Bernard Aboba
Review result: Not Ready

This document has been reviewed as part of the transport area review team's 
ongoing effort to review key IETF documents. These comments were written 
primarily for the transport area directors, but are copied to the document's 
authors and WG to allow them to address any issues raised and also to the IETF 
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this 
review as part of the last-call comments they receive. Please always CC 
tsv-...@ietf.org if you reply to or forward this review.

Summary
----------
Overall, this document seems more focused on the requirements for development 
of codecs such as H.264 than on the requirements that would enable widescale 
adoption of a next generation codec.  In practice, requirements reducing the 
fragmentation of implementations (such as a requirement that a compliant 
decoder be able to decode anything that an encoder can send) have proved 
critical to success, yet this document omits them.  Also, the document appears 
focused on video technology as of 4-5 years ago, rather than the technology 
used in today's streaming and video conferencing services where support for 
scalable video coding (and advanced modes such as K-SVC) has become critically 
important.

[AF] This document was written to be tool-agnostic and as unrestrictive as 
possible while still covering the needs of a wide range of applications. The 
requirements for spatial and quality scalability were discussed at NETVC 
meetings and on the NETVC mailing list several times in order to work out 
acceptable formulations.

Issues
------

Section 2.1

   Video material is encoded at different quality levels and different
   resolutions, which are then chosen by a client depending on its
   capabilities and current network bandwidth....

   o  Scalability or other forms of supporting multiple quality
      representations are beneficial if they do not incur significant
      bitrate overhead and if mandated in the first version.

[BA] The words "are beneficial" suggest that support for scalability is 
optional.  In practice, support for both temporal and spatial scalability has 
proved to be important, since it has been widely adopted in dynamic streaming 
applications, in which the video material is encoded once and played back at 
framerates, resolutions and quality levels that depend on network conditions 
and the characteristics of the endpoint devices.
[AF] Of course, it is important to support resolution and quality scalability 
if it does not adversely affect compression performance; it is always a 
trade-off. We state requirements for a codec but do not describe a particular 
codec's architecture.
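To illustrate the client-side behavior described in Section 2.1 (choosing a 
representation based on capabilities and current bandwidth), here is a minimal 
sketch; the bitrate ladder and the bandwidth figure are illustrative, not taken 
from the draft:

```python
# Toy sketch of adaptive-streaming representation selection: pick the
# highest-bitrate representation that fits the measured bandwidth,
# falling back to the lowest one if nothing fits.

def pick_representation(representations, bandwidth_bps):
    """representations: list of (name, bitrate_bps) tuples, any order."""
    affordable = [r for r in representations if r[1] <= bandwidth_bps]
    if not affordable:
        return min(representations, key=lambda r: r[1])  # lowest bitrate
    return max(affordable, key=lambda r: r[1])

# Illustrative bitrate ladder (names and numbers are made up).
ladder = [("1080p", 6_000_000), ("720p", 3_000_000),
          ("480p", 1_500_000), ("240p", 400_000)]
print(pick_representation(ladder, 2_000_000))  # -> ('480p', 1500000)
```

A scalable codec lets all of these representations share one layered bitstream 
instead of being encoded independently, which is exactly the trade-off under 
discussion.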


Section 2.5

[BA] This section does not mention support for screen content coding tools. 
Given that these tools are so effective in reducing the bandwidth required for 
application sharing (compression of 75 percent is common), it is hard to 
imagine a next generation codec that would not support screen content coding.
[AF] This section does not mention support for screen content coding tools 
because their absence would harm the compression performance of a codec. If 
these tools are not used, a codec cannot be competitive with other codecs, and 
this will become apparent during testing (incidentally, the testing draft 
contains screen content materials). On the other hand, we should not insist on 
support for specific screen content tools, as that would restrict the freedom 
of codec developers.

Section 2.6

Support for K-SVC modes has turned out to be important for game streaming, 
since these modes reduce delay spikes that would otherwise result from 
generation of a key frame.  Since K-SVC modes have unusual characteristics 
(e.g. frames within a single temporal unit may not share the same temporal ID), 
they impose unique requirements on a video codec design.

   3.2.3. Complexity:

   o  Feasible real-time implementation of both an encoder and a
      decoder supporting a chosen subset of tools for hardware and
      software implementation on a wide range of state-of-the-art
      platforms.

[BA] This sentence seems to imply that the tools supported in hardware and 
software might be different.  In practice, this is problematic, particularly if 
support for some tools can be omitted at lower profile levels, because 
application developers then need to handle the disparities between tools 
support in different implementations.
[AF] No, this sentence implies that a codec should be implementable in real 
time with at least a subset of its tools. Some non-normative tools (i.e., 
encoder-side tools such as a 2-pass encoder) and some normative tools can be 
skipped to enable real-time implementation on the majority of platforms.

   3.2.4. Scalability:

   o  Temporal (frame-rate) scalability should be supported.

[BA] In practice, a next generation video codec also needs to support spatial 
scalability as well as temporal scalability.
[AF] Again, the trade-off between single-layer and multi-layer codecs should 
be decided based on concrete RD-curves. It is well known that introducing 
temporal scalability does not harm compression performance; spatial 
scalability, however, can. So different decisions on the presence of this 
scalability type are possible, subject to the architecture and compression 
performance of a codec.

   3.2.5. Error resilience:

   o  Error resilience tools that are complementary to the error
      protection mechanisms implemented on transport level should be
      supported.

   o  The codec should support mechanisms that facilitate packetization
      of a bitstream for common network protocols.

[BA] Both of these points require more elaboration.  Which error resilience 
tools are being referred to, and which mechanisms are perceived to facilitate 
packetization?  Is the latter referring to video codec syntax (e.g., the NAL 
unit structure)?
[AF] > Which error resilience tools are being referred to...
Any error resilience tools that can provide additional benefits compared to 
the mechanisms implemented at the transport level. The set of error resilience 
tools can differ between codecs (e.g., between wavelet-based codecs and H.26x).
> which mechanisms are perceived to facilitate packetization?  Is the latter 
> referring to video codec syntax (e.g., the NAL unit structure)?
Yes, for example, the NAL unit structure.
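As a concrete illustration of why NAL-unit-style syntax helps packetization, 
here is a simplified sketch that splits an H.26x Annex-B byte stream into NAL 
units on start codes, so each unit can be mapped to packets independently. 
(This is a toy parser: it ignores emulation prevention and the corner case of a 
NAL unit ending in a zero byte; the sample bytes are illustrative.)

```python
# Split an Annex-B byte stream into NAL units. Start codes are
# 0x000001 or 0x00000001; we normalize the 4-byte form to the 3-byte
# form, then split on it.

def split_nal_units(stream: bytes):
    chunks = stream.replace(b"\x00\x00\x00\x01",
                            b"\x00\x00\x01").split(b"\x00\x00\x01")
    return [c for c in chunks if c]  # drop the empty chunk before the
                                     # first start code

# Three illustrative NAL units (e.g., SPS, PPS, slice), with a mix of
# 4-byte and 3-byte start codes.
stream = (b"\x00\x00\x00\x01\x67\xAA"
          b"\x00\x00\x01\x68\xBB"
          b"\x00\x00\x01\x65\xCC\xDD")
print(len(split_nal_units(stream)))  # -> 3
```

Each resulting unit is a self-contained syntax element, which is what makes a 
straightforward mapping to RTP packets or file-format samples possible.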

   o  The codec should support effective mechanisms for allowing
      decoding and reconstruction of significant parts of pictures in
      the event that parts of the picture data are lost in
      transmission.

[BA] Not sure what this is referring to either.
[AF] In this statement, we meant tools like entropy coding (e.g., CABAC). If 
CABAC is not reset for each picture / slice / tile, the loss of a packet 
belonging to a given picture makes it impossible to restore all subsequent 
pictures. So the frequency of CABAC resets should be chosen, on the one hand, 
to avoid damaging compression performance and, on the other hand, to allow 
"decoding and reconstruction of significant parts of pictures in the event 
that parts of the picture data are lost in transmission."
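The trade-off can be illustrated with a toy model: with an entropy-coder reset 
every `period` pictures, a loss only damages pictures up to the next reset; 
without resets, everything after the loss is undecodable. (This sketch models 
only the entropy-coding dependency and ignores inter-prediction; all numbers 
are illustrative.)

```python
# Pictures that cannot be reconstructed if picture `lost` is dropped,
# assuming the entropy coder is reset at every multiple of `period`
# (so decoding can restart at the next reset point).

def undecodable_after_loss(num_pictures, lost, period):
    next_reset = ((lost // period) + 1) * period
    return list(range(lost, min(next_reset, num_pictures)))

# Reset every 10 pictures: losing picture 7 costs only pictures 7..9.
print(undecodable_after_loss(30, lost=7, period=10))      # -> [7, 8, 9]
# Effectively no resets: pictures 7..29 are all lost.
print(len(undecodable_after_loss(30, lost=7, period=10**9)))  # -> 23
```

A shorter reset period bounds the damage but costs compression efficiency, 
since each reset discards the adapted context models; hence the wording about 
choosing the reset frequency.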

   3.3.2. Scalability:

   o  Resolution and quality (SNR) scalability that provide low
      compression efficiency penalty (up to 5% of BD-rate [12] increase
      per layer with reasonable increase of both computational and
      hardware complexity) can be supported in the main profile of the
      codec being developed by the NETVC WG. Otherwise, a separate
      profile is needed to support these types of scalability.

[BA] Mixing support for scalability with profile negotiation leads to 
implementation balkanization that dramatically increases the complexity of 
application development.  A better principle is that a compliant decoder should 
be able to decode any bitstream that an encoder can send.
[AF] In the paragraph you cited, we meant a very simple thing: if the penalty 
in compression performance for resolution and quality (SNR) scalability is not 
that high, it makes sense to support them in the main profile. Otherwise, a 
separate profile is needed (as done in H.264/SVC or in H.265/SHVC).
> A better principle is that a compliant decoder should be able to decode any 
> bitstream that an encoder can send.
It is debatable which of these approaches is better. It is absolutely 
mandatory to make the base layer decodable by any decoder (even one that does 
not support a scalable profile, if such a profile exists). We are not sure 
about the enhancement layers. Moreover, some wavelet-based codecs (e.g., SPIHT 
or EZW) do not use the concept of layers at all; the paradigm of scalability 
is more natural there than in codecs that follow the "hybrid video coding" 
paradigm (e.g., H.26x).

_______________________________________________
video-codec mailing list
video-codec@ietf.org
https://www.ietf.org/mailman/listinfo/video-codec
