Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-20 Thread Ian Hickson
On Sun, 10 Apr 2011, Silvia Pfeiffer wrote:
 On Fri, Apr 8, 2011 at 4:54 PM, Ian Hickson i...@hixie.ch wrote:
 
  What is a main media resource?
 
  e.g. consider youtubedoubler.com; what is the main resource?
 
  Or similarly, when watching the director's commentary track on a 
  movie, is the commentary the main track, or the movie?
 
 I don't think youtubedoubler.com is the main use case here. In the 
 youtubedoubler.com use case, you have two independent videos that make 
 sense by themselves, but are only coupled together by their timeline.
 The cases that I listed above, audio descriptions, sign language video, 
 and dubbed audio tracks, make no sense by themselves. They are produced 
 with a clear reference to one specific video and its details and could 
 be delivered either as in-band tracks or as external files. From a 
 developer and user point of view - and in analogy to the track element - 
 it makes no sense to regard them as independent media resources. They 
 all refer to a main resource - the original video.

I don't know which is the main use case; I wouldn't be surprised if 
sites like youtubedoubler.com had as many viewers as, if not more than, sites 
with sign language videos. In any case, we have to handle both.

My point was just that there isn't a well-defined main media resource.


 However, there are more similarities between audio, video and text 
 tracks than one might think.
 
 For example, it is possible to want to have multiple video tracks and 
 multiple text tracks rendered on top of a single video rendering area, 
 and they may all be explicitly positioned just like positioned captions 
 and they may all need to avoid each other. So, it could make sense to 
 include them all in a single rendering approach.

One could say the same about div. It seems like a bit of a superficial 
similarity.

Similarities between audio and video tracks and text tracks are only 
really interesting here if they're not also similarities that apply to 
other even more unrelated things.


 Another example is that you may have an audio track with different 
 captions from those of a related video element. Since the audio 
 track has no visual display, its captions are not rendered, but the 
 video's captions are rendered. Now, how are you going to make its 
 captions available to the video's display area when the linked audio 
 track is activated?

Do you have a concrete example of this? I'm not sure I really follow.


 Some things will inherently be harder by taking the approach of separate 
 video and audio elements rather than the track approach.

I don't really see how this particular example relates to the issue of 
audio/video tracks being treated similarly or differently than text 
tracks. I agree that the described behaviour might need some tweaks to 
handle properly, but I don't think those tweaks would involve making the 
handling of audio/video tracks and text tracks more similar to each other.


  On Mon, 28 Mar 2011, Silvia Pfeiffer wrote:
 
  We haven't allowed caption tracks to start with a different 
  startTimeOffset than the video, nor are we allowing them to be given a 
  different playbackRate from the video.
 
  It's relatively easy to do it for text tracks: you just take a text 
  track and recreate it with different timings (something that can be 
  done in a few lines of JavaScript given the API we expose). So there's 
  no need for it to be explicit.
 
  For synchronising video and audio, we should expose multiple 
  tracks starting at different offsets because it is easy to achieve yet 
  provides numerous opportunities for authors. For example, it's not 
  uncommon to want to compare two movies which have similar moments; 
  showing such similarities would normally require video editing, but if we 
  allowed offsets it could be done merely by pointing to two movie files 
  with appropriate offsets.
 
 It is not any more difficult to change the startTime of a video element 
 in JavaScript than it is to change the start time of a track resource.
 
 Also, I believe that your use case can more easily be satisfied with 
 temporal media fragment URIs, which capture not just the offset, but the 
 whole section, from start to end, that people are comparing.

I don't follow.

However, note that at the moment the MediaController feature doesn't 
support arbitrary offsets of audio/video tracks.
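
For concreteness, the "few lines of JavaScript" retiming of a text track 
described above might look roughly like the sketch below. It assumes a cue 
constructor of the (startTime, endTime, text) shape and the numeric track 
modes from the current drafts, so treat it as an illustration rather than 
normative code.

    // Sketch: copy an existing text track into a new one shifted by
    // `offset` seconds. The cue constructor shape and the numeric mode
    // constants are assumptions based on the current drafts.
    function retimeTrack(video, sourceTrack, offset) {
        var shifted = video.addTextTrack('subtitles', sourceTrack.label,
                                         sourceTrack.language);
        for (var i = 0; i < sourceTrack.cues.length; ++i) {
            var cue = sourceTrack.cues[i];
            shifted.addCue(new TextTrackCue(cue.startTime + offset,
                                            cue.endTime + offset, cue.text));
        }
        shifted.mode = 2;      // SHOWING
        sourceTrack.mode = 0;  // OFF/DISABLED
    }

    // The comparison use case with temporal media fragments would instead
    // point two elements at offsets directly, e.g.:
    //   videoA.src = 'movie.webm#t=120,150';
    //   videoB.src = 'other-cut.webm#t=90,120';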


  Tracks in a multitrack resource (no matter if in-band or external 
  files) are rather tightly authored to cover the exact same timeline 
  in my experience.
 
  Sure. But it would be silly to only support one use case when with 
  minimal effort we could support a vastly greater number of use cases, 
  including many we have not yet considered.
 
  This is one of those situations where not supporting something 
  actually requires more API complexity than supporting it. We are 
  rarely faced with such an opportunity.
 
 I don't want to solve use cases that we haven't thought about yet.

I don't want to add 

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-20 Thread David Dailey
Silvia Pfeiffer and Ian Hickson exchanged:

 Yes, and the same (lack of definition) goes for javascript manipulation. 
 It'd be great if we had the tools for manipulating video and audio 
 tracks (extract/insert frames, move audio snippets around). It would 
 make A/V editing - or more creative uses - really easy in HTML5.

That's a use case we should investigate in due course, but I think it's 
probably a bit early to go there.

When we do get around to it, it would be nice, as well, to be able to create
sounds (as from wave forms) from scratch, in the browser.

Cheers
David




Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-20 Thread Silvia Pfeiffer
That actually was a quote from Jeroen and Ian, not me. :-)
S.

On Thu, Apr 21, 2011 at 10:31 AM, David Dailey ddai...@zoominternet.net wrote:
 Silvia Pfeiffer and Ian Hickson exchanged:

 Yes, and the same (lack of definition) goes for javascript manipulation.
 It'd be great if we had the tools for manipulating video and audio
 tracks (extract/insert frames, move audio snippets around). It would
 make A/V editing - or more creative uses - really easy in HTML5.

That's a use case we should investigate in due course, but I think it's
probably a bit early to go there.

 When we do get around to it, it would be nice, as well, to be able to create
 sounds (as from wave forms) from scratch, in the browser.

 Cheers
 David





Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-20 Thread Chris Double
On Thu, Apr 21, 2011 at 12:31 PM, David Dailey ddai...@zoominternet.net wrote:

 When we do get around to it, it would be nice, as well, to be able to create
 sounds (as from wave forms) from scratch, in the browser.


There's experimental work being done on this. For example:

https://wiki.mozilla.org/Audio_Data_API
http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
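
For reference, generating a sound from a waveform is only a few lines with 
the Web Audio API that grew out of the second link. A minimal sketch, using 
the API shape that later shipped rather than the 2011 experimental drafts:

    // Play a one-second 440 Hz sine tone from scratch.
    var ctx = new AudioContext();
    var osc = ctx.createOscillator();
    var gain = ctx.createGain();
    osc.type = 'sine';
    osc.frequency.value = 440;       // A4
    gain.gain.value = 0.2;           // keep the volume modest
    osc.connect(gain);
    gain.connect(ctx.destination);
    osc.start();
    osc.stop(ctx.currentTime + 1);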

Chris.
-- 
http://www.bluishcoder.co.nz


Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-18 Thread Jeroen Wijering
Hey Ian, all,

Sorry for the slow response .. 

 There's a big difference between text tracks, audio tracks, and video 
 tracks. While it makes sense, for instance, to have text tracks 
 enabled but not showing, it makes no sense to do that with audio 
 tracks.
 
 Audio and video tracks require more data, hence it's less preferred to 
 allow them being enabled but not showing. If data wasn't an issue, it 
 would be great if this were possible; it'd allow instant switching 
 between multiple audio dubs, or camera angles.
 
 I think we mean different things by active here.
 
 The hidden state for a text track is one where the UA isn't rendering 
 the track but the UA is still firing all the events and so forth. I don't 
 understand what the parallel would be for a video or audio track.

The parallel would be fetching and decoding the tracks but not sending them to 
the display (video) or speakers (audio). I agree that, implementation-wise, 
this is much less useful than having an active but hidden state for text 
tracks. However, some people might want to manipulate hidden tracks with the 
audio data API, much like hidden text tracks can be manipulated with JavaScript.

 Text tracks are discontinuous units of potentially overlapping textual 
 data with position information and other metadata that can be styled with 
 CSS and can be mutated from script.
 
 Audio and video tracks are continuous streams of immutable media data.
 
 
 Video and audio tracks do not necessarily produce continuous output - it is 
 perfectly legal to have gaps in either, e.g. segments that do not render. 
 Both audio and video tracks can have metadata that affect their rendering: an 
 audio track has volume metadata that attenuates its contribution to the 
 overall mix-down, and a video track has a matrix that controls its rendering. 
 The only thing preventing us from styling a video track with CSS is the lack 
 of definition.

Yes, and the same (lack of definition) goes for javascript manipulation. It'd 
be great if we had the tools for manipulating video and audio tracks 
(extract/insert frames, move audio snippets around). It would make A/V editing 
- or more creative uses - really easy in HTML5.
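
Frame extraction, at least, is already within reach via canvas. A minimal 
sketch, assuming a same-origin video element so the canvas is not tainted:

    // Grab the current frame of a video element for inspection or re-use.
    function grabFrame(video) {
        var canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        var c2d = canvas.getContext('2d');
        c2d.drawImage(video, 0, 0, canvas.width, canvas.height);
        return c2d.getImageData(0, 0, canvas.width, canvas.height);
    }

There is no comparable hook yet for moving audio snippets around, outside the 
experimental audio APIs.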

Kind regards,

Jeroen



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-12 Thread Eric Carlson

On Apr 11, 2011, at 5:26 PM, Ian Hickson wrote:
 On Mon, 11 Apr 2011, Jeroen Wijering wrote:
 On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:
 
 There's a big difference between text tracks, audio tracks, and video 
 tracks. While it makes sense, for instance, to have text tracks 
 enabled but not showing, it makes no sense to do that with audio 
 tracks.
 
 Audio and video tracks require more data, hence it's less preferred to 
 allow them being enabled but not showing. If data wasn't an issue, it 
 would be great if this were possible; it'd allow instant switching 
 between multiple audio dubs, or camera angles.
 
 I think we mean different things by active here.
 
 The hidden state for a text track is one where the UA isn't rendering 
 the track but the UA is still firing all the events and so forth. I don't 
 understand what the parallel would be for a video or audio track.
 
 Text tracks are discontinuous units of potentially overlapping textual 
 data with position information and other metadata that can be styled with 
 CSS and can be mutated from script.
 
 Audio and video tracks are continuous streams of immutable media data.
 
   Video and audio tracks do not necessarily produce continuous output - it is 
perfectly legal to have gaps in either, e.g. segments that do not render. Both 
audio and video tracks can have metadata that affect their rendering: an audio 
track has volume metadata that attenuates its contribution to the overall 
mix-down, and a video track has a matrix that controls its rendering. The only 
thing preventing us from styling a video track with CSS is the lack of 
definition.


 I don't really see what they have in common other than us using the word 
 track to refer to both of them, and that's mostly just an artefact of 
 the language.
 
   "Track" is more than an artifact of the language; it is the commonly used 
term in the digital media industry for an independent stream of media samples 
in a container file.

eric



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Jeroen Wijering

On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

 *) Discoverability is indeed an issue, but this can be fixed by defining 
 a common track API for signalling and enabling/disabling tracks:
 
 {{{
 interface Track {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
 
  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
 };
 
 interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute Track[] tracks;
 };
 }}}
 
 There's a big difference between text tracks, audio tracks, and video 
 tracks. While it makes sense, for instance, to have text tracks enabled 
 but not showing, it makes no sense to do that with audio tracks. 

Audio and video tracks require more data, hence it's less preferred to allow 
them being  enabled but not showing. If data wasn't an issue, it would be great 
if this were possible; it'd allow instant switching between multiple audio 
dubs, or camera angles. 

In terms of the data model, I don't believe there are major differences between 
audio, text and video tracks. They all exist at the same level - one down from 
the main presentation layer. Toggling versus layering can be an option for all 
three kinds of tracks.

For example, multiple video tracks can be mixed together in one media element's 
display. Think about PiP, perspective side-by-side (Stevenote style) or a 3D 
grid (group chat, like Skype). Perhaps this should be supported instead of 
relying upon multiple video elements, manual positioning and APIs to knit 
things together. One would lose in terms of flexibility, but gain in terms of 
API complexity (it's still one video) and ease of implementation for HTML 
developers.
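
As a sketch of what that could look like for authors, usage of the Track 
interface proposed earlier in this thread might be as below; everything here 
(the tracks list, kind, label and the numeric mode constants) exists only in 
the proposal, and the track label is a placeholder:

    // Hypothetical: enable a second camera angle on the same video element
    // using the proposed tracks API.
    var video = document.querySelector('video');
    for (var i = 0; i < video.tracks.length; ++i) {
        var track = video.tracks[i];
        if (track.kind === 'video' && track.label === 'Camera 2')
            track.mode = 2;   // SHOWING, per the proposed constants
    }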

- Jeroen






Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Jeroen Wijering
On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

 but should be linked to the main media resource through markup.
 
 What is a main media resource?
 
 e.g. consider youtubedoubler.com; what is the main resource?
 
 Or similarly, when watching the director's commentary track on a movie, is 
 the commentary the main track, or the movie?

In systems like MPEG TS and DASH, there's the notion of the system clock. 
This is the overarching resource to which all audio, meta, text and video 
tracks are synced. The clock has no video frames or audio samples by itself, it 
just acts as the wardrobe for all tracks. Perhaps it's worth investigating if 
this would be useful for media elements? 
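
The nearest HTML equivalent might be the MediaController being drafted: a 
shared clock with no samples of its own that slaved media elements sync to. A 
rough sketch, assuming the draft API (a MediaController constructor and a 
settable .controller property on media elements); the element ids are 
placeholders:

    // One shared clock driving several slaved media elements.
    var clock = new MediaController();
    var video = document.getElementById('main-video');
    var signing = document.getElementById('sign-language');
    var dub = document.getElementById('dubbed-audio');
    video.controller = clock;
    signing.controller = clock;
    dub.controller = clock;
    clock.play();   // play/pause/seek apply to all slaved elements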

- Jeroen

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Eric Carlson

On Apr 10, 2011, at 12:36 PM, Mark Watson wrote:

 In the case of in-band tracks it may still be the case that they are 
 retrieved independently over the network. This could happen two ways:
 - some file formats contain headers which enable precise navigation of the 
 file, for example using HTTP byte ranges, so that the tracks could be 
 retrieved independently. mp4 files would be an example. I don't know that 
 anyone does this, though.

  QuickTime has supported tracks with external media samples in .mov files for 
more than 15 years. This type of file is most commonly used during editing, but 
they are occasionally found on the net.


 - in the case of adaptive streaming based on a manifest, the different tracks 
 may be in different files, even though they appear as in-band tracks from an 
 HTML perspective.
 
 In these cases it *might* make sense to expose separate buffer and network 
 states for the different in-band tracks in just the same way as out-of-band 
 tracks.

  I strongly disagree. Having different tracks APIs for different container 
formats will be extremely confusing for developers, and I don't think it will 
add anything. A UA that chooses to support non-self contained media files 
should account for all samples when reporting readyState and networkState.

eric



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Mark Watson


Sent from my iPhone

On Apr 11, 2011, at 8:55 AM, Eric Carlson eric.carl...@apple.com wrote:

 
 On Apr 10, 2011, at 12:36 PM, Mark Watson wrote:
 
 In the case of in-band tracks it may still be the case that they are 
 retrieved independently over the network. This could happen two ways:
 - some file formats contain headers which enable precise navigation of the 
 file, for example using HTTP byte ranges, so that the tracks could be 
 retrieved independently. mp4 files would be an example. I don't know that 
 anyone does this, though.
 
  QuickTime has supported tracks with external media samples in .mov files for 
 more than 15 years. This type of file is most commonly used during editing, 
 but they are occasionally found on the net.
 

I was also thinking of a client which downloads the MOOV box and then uses the 
tables there to construct byte range requests for specific tracks.
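
As a sketch of what that client-driven approach could look like (the URL, the 
byte offsets and the moov-parsing step are placeholders, not an existing 
library):

    // Fetch an arbitrary byte range of a resource, e.g. the samples of one
    // track whose offsets were read out of the moov box.
    function fetchRange(url, firstByte, lastByte, onload) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, true);
        xhr.setRequestHeader('Range', 'bytes=' + firstByte + '-' + lastByte);
        xhr.responseType = 'arraybuffer';
        xhr.onload = function () { onload(xhr.response); };
        xhr.send();
    }

    // e.g. fetchRange('movie.mp4', 0, 4095, parseMoov);  // parseMoov is hypothetical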

 
 - in the case of adaptive streaming based on a manifest, the different 
 tracks may be in different files, even though they appear as in-band tracks 
 from an HTML perspective.
 
 In these cases it *might* make sense to expose separate buffer and network 
 states for the different in-band tracks in just the same way as out-of-band 
 tracks.
 
  I strongly disagree. Having different tracks APIs for different container 
 formats will be extremely confusing for developers, and I don't think it will 
 add anything. A UA that chooses to support non-self contained media files 
 should account for all samples when reporting readyState and networkState.
 

Fair enough. I did say 'might' :-)

 eric
 
 


Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Ian Hickson
On Fri, 8 Apr 2011, Jer Noble wrote:
 On Apr 7, 2011, at 11:54 PM, Ian Hickson wrote:
  
  The distinction between a master media element and a master media 
  controller is, in my mind, mostly a distinction without a difference.  
  However, a welcome addition to the media controller would be 
  convenience APIs for the above properties (as well as playbackState, 
  networkState, seekable, and buffered).
  
  I'm not sure what networkState means in this context. playbackState, 
  assuming you mean 'paused', is already exposed.
 
 Sorry, by playbackState, I meant readyState.  And I was suggesting that, 
 much in the same way that you've provided .buffered and .seekable 
 properties which expose the intersection of the slaved media elements' 
 corresponding ranges, that a readyState property could similarly 
 reflect the readyState values of all the slaved media elements. In this 
 case, the MediaController's hypothetical readyState wouldn't flip to 
 HAVE_ENOUGH_DATA until all the constituent media element's ready states 
 reached at least the same value.

So basically it would return the lowest possible value amongst the slaved 
elements? I guess we could expose such a convenience accessor, but what's 
the use case? It seems easy enough to implement manually in JS, so unless 
there's a compelling case, I'd be reluctant to add it.


 Of course, this would imply that the load events fired by a media 
 element (e.g. loadedmetadata, canplaythrough) were also fired by the 
 MediaController, and I would support this change as well.

I don't see why it would imply that, but certainly we could add events 
like that to the controller. Again though, what's the use case?


 Again, this would be just a convenience for authors, as this information 
 is already available in other forms and could be relatively easily 
 calculated on-the-fly in scripts.  But UAs are likely going to have to do 
 these calculations anyway to support things like autoplay, so adding 
 explicit support for them in API form would not (imho) be unduly 
 burdensome.

Autoplay is handled without having to do these calculations, as far as I 
can tell. I don't see any reason why the UA would need to do these 
calculations actually. If there are compelling use cases, though, I'm 
happy to add such accessors.


On Fri, 8 Apr 2011, Eric Winkelman wrote:
 On Friday, April 08, 2011, Ian Hickson wrote:
  On Thu, 17 Feb 2011, Eric Winkelman wrote:
  
   MPEG transport streams, as used for commercial TV, will often 
   contain multiple types of metadata: content advisories, ad insertion 
   opportunities, interactive TV application triggers, etc.  If we were 
   getting this information out-of-band we would, as you suggest, know 
   how to deal with it.  We would use multiple @kind=metadata tracks, 
   with the correct handler associated with each track.  In our case, 
   however, this information is all coming in-band.
  
   There is information within the MPEG transport stream that 
   identifies the types of metadata being carried.  This lets the video 
   player know, for example, that the stream has a particular track 
   with application triggers, and another one with content advisories.  
   To be consistent with the out-of-band tracks, we envision the player 
   creating separate TimedTrack elements for each type of metadata, and 
   adding the associated data as cues.  But there isn't a clear way for 
   the player to indicate the type of metadata it's putting into each 
   of these TimedTrack cues.
  
   Which brings us to the mime types.  I have an event handler on the 
   video tag that fires when the player creates a new metadata track, 
   and this handler tries to figure out what to do with the track.  
   Without a type on the track, I have to set another handler on the 
   track that fires when the player creates a cue, and tries to figure 
   out what to do from the cue.  As there is no type on the cue either, 
   I have to examine the cue location/text to see if it contains 
   metadata I'm able to handle.
  
   This all works, but it requires event handlers on tracks that may 
   have no interest to the application.  On the player side, it depends 
   on the player tagging the metadata in a consistent ad-hoc way, as 
   well as requiring the player to create separate metadata tracks.  
   (We also considered starting the cue's text with a mime type, but 
   this has the same basic issues.)
  
  This is an interesting problem.
  
  What is the way that the MPEG streams identify these various metadata 
  streams? Is it a MIME type? Some other identifier? Is this identifier 
  separate from the track's label, or is it the track's label?
 
 The streams contain a Program Map Table (PMT) which contains a list of 
 tuples (program id (PID) and a standard numeric type) for the 
 program's tracks. This is how the user agent knows about this metadata 
 and what is contained in it. We're envisioning that the combination of 
 transport, e.g. MPEG-2 TS, and 

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Jer Noble

On Apr 11, 2011, at 5:26 PM, Ian Hickson wrote:

 On Fri, 8 Apr 2011, Jer Noble wrote:
 
 Sorry, by playbackState, I meant readyState.  And I was suggesting that, 
 much in the same way that you've provided .buffered and .seekable 
 properties which expose the intersection of the slaved media elements' 
 corresponding ranges, that a readyState property could similarly 
 reflect the readyState values of all the slaved media elements. In this 
 case, the MediaController's hypothetical readyState wouldn't flip to 
 HAVE_ENOUGH_DATA until all the constituent media element's ready states 
 reached at least the same value.
 
 So basically it would return the lowest possible value amongst the slaved 
 elements? I guess we could expose such a convenience accessor, but what's 
 the use case? It seems easy enough to implement manually in JS, so unless 
 there's a compelling case, I'd be reluctant to add it.

Yes, this would be just a convenience, as I tried to make clear below.  So I 
don't want to seem like I'm pushing this too hard.  But since you asked...

 Of course, this would imply that the load events fired by a media 
 element (e.g. loadedmetadata, canplaythrough) were also fired by the 
 MediaController, and I would support this change as well.
 
 I don't see why it would imply that, but certainly we could add events 
 like that to the controller. Again though, what's the use case?

The use case for the events is the same one as for the convenience property: 
without a convenience event, authors would have to add event listeners to every 
slave media element. So by "imply", I simply meant that if the use case for 
the first was compelling enough to warrant new API, the second would be 
warranted as well.

Let's say, for example, an author wants to change the color of a play button 
when the media in a media group all reaches the HAVE_ENOUGH_DATA readyState.

Current API:

    function init() {
        var mediaGroupElements = document.querySelectorAll("*[mediaGroup=group1]");
        for (var i = 0; i < mediaGroupElements.length; ++i)
            mediaGroupElements.item(i).addEventListener('canplaythrough',
                readyStateChangeListener, false);
    }

    function readyStateChangeListener(e) {
        var mediaGroupElements = document.querySelectorAll("*[mediaGroup=group1]");
        var ready = mediaGroupElements.length > 0;
        for (var i = 0; i < mediaGroupElements.length; ++i)
            if (mediaGroupElements.item(i).readyState < HTMLMediaElement.HAVE_ENOUGH_DATA)
                ready = false;
        if (ready)
            changePlayButtonColor();
    }

Convenience API:

    function init() {
        var controller = document.querySelector("*[mediaGroup=group1]").controller;
        controller.addEventListener('canplaythrough', changePlayButtonColor, true);
    }

I think the convenience benefits are pretty obvious.  Maybe not compelling 
enough, however. :)

 Again, this would be just a convenience for authors, as this information 
 is already available in other forms and could be relatively easily 
 calculated on-the-fly in scripts.  But UAs are likely going to have to do 
 these calculations anyway to support things like autoplay, so adding 
 explicit support for them in API form would not (imho) be unduly 
 burdensome.
 
 Autoplay is handled without having to do these calculations, as far as I 
 can tell. I don't see any reason why the UA would need to do these 
 calculations actually. If there are compelling use cases, though, I'm 
 happy to add such accessors.


Well, how exactly is autoplay handled in a media group?  Does the entire media 
group start playing when the first media element in a group with its autoplay 
attribute set reaches HAVE_ENOUGH_DATA?

-Jer

 Jer Noble jer.no...@apple.com



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-10 Thread Silvia Pfeiffer
On Fri, Apr 8, 2011 at 4:54 PM, Ian Hickson i...@hixie.ch wrote:

 On Thu, 10 Feb 2011, Silvia Pfeiffer wrote:

 One particular issue that hasn't had much discussion here yet is the
 issue of how to deal with multitrack media resources or media resources
 that have associated synchronized audio and video resources. I'm
 concretely referring to such things as audio descriptions, sign language
 video, and dubbed audio tracks.

 We require an API that can expose such extra tracks to the user and to
 JavaScript. This should be independent of whether the tracks are
 actually inside the media resource or are given as separate resources,

 I think there's a big difference between multiple tracks inside one
 resource and multiple tracks spread amongst multiple resources: in the
 former case, one would need a single set of network state APIs (load
 algorithm, ready state, network state, dimensions, buffering state, etc),
 whereas in the second case we'd need N sets of these APIs, one for each
 media resource.

 Given that the current mechanism for exposing the load state of a media
 resource is a media element (video, audio), I think it makes sense to
 reuse these elements for loading each media resource even in a multitrack
 scenario. Thus I do not necessarily agree that exposing extra tracks
 should be done in a way that as independent of whether the tracks are
 in-band or out-of-band.


 but should be linked to the main media resource through markup.

 What is a main media resource?

 e.g. consider youtubedoubler.com; what is the main resource?

 Or similarly, when watching the director's commentary track on a movie, is
 the commentary the main track, or the movie?


 I am bringing this up now because solutions may have an influence on the
 inner workings of TimedTrack and the track element, so before we have
 any implementations of track, we should be very certain that we are
 happy with the way in which it works - in particular that track
 continues to stay an empty element.

 I don't really see why this would be related to text tracks. Those have
 their own status framework, and interact directly with a media element.
 Looking again at the youtubedoubler.com example, one could envisage both
 sides having text tracks. They wouldn't be joint tracks.


I don't think youtubedoubler.com is the main use case here. In the
youtubedoubler.com use case, you have two independent videos that make
sense by themselves, but are only coupled together by their timeline.
The cases that I listed above, audio descriptions, sign language
video, and dubbed audio tracks, make no sense by themselves. They are
produced with a clear reference to one specific video and its details
and could be delivered either as in-band tracks or as external files.
From a developer and user point of view - and in analogy to the track
element - it makes no sense to regard them as independent media
resources. They all refer to a main resource - the original video.


 On Mon, 14 Feb 2011, Jeroen Wijering wrote:

 In terms of solutions, I lean much towards the manifest approach. The
 other approaches are options that each add more elements to HTML5,
 which:

 * Won't work for situations outside of HTML5.

 * Postpone, and perhaps clash with, the addition of manifests.

 Manifests, and indeed any solution that relies on a single media element,
 would make it very difficult to render multiple video tracks independently
 (e.g. side by side vs picture-in-picture). That's not to say that
 manifests shouldn't work, but I think we'd need another solution as well.


 *) The CSS styling issue can be fixed by making a conceptual change to
 CSS and text tracks. Instead of styling text tracks, a single text
 rendering area for each video element can be exposed and styled. Any
 text tracks that are enabled push data in it, which is automatically
 styled according to the video.textStyle/etc rules.

 This wouldn't work well with positioned captions.


 *) Discoverability is indeed an issue, but this can be fixed by defining
 a common track API for signalling and enabling/disabling tracks:

 {{{
 interface Track {
   readonly attribute DOMString kind;
   readonly attribute DOMString label;
   readonly attribute DOMString language;

   const unsigned short OFF = 0;
   const unsigned short HIDDEN = 1;
   const unsigned short SHOWING = 2;
   attribute unsigned short mode;
 };

 interface HTMLMediaElement : HTMLElement {
   [...]
   readonly attribute Track[] tracks;
 };
 }}}

 There's a big difference between text tracks, audio tracks, and video
 tracks. While it makes sense, for instance, to have text tracks enabled
 but not showing, it makes no sense to do that with audio tracks.
 Similarly, video tracks need their own display area, but text tracks need
 a video track's display area. A single video area can display one video
 (multiple overlapping videos being achieved by multiple playback areas),
 but multiple audio and text tracks can be mixed together without any
 

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-10 Thread Mark Watson

On Apr 7, 2011, at 11:54 PM, Ian Hickson wrote:

 
 On Thu, 10 Feb 2011, Silvia Pfeiffer wrote:
 
 One particular issue that hasn't had much discussion here yet is the
 issue of how to deal with multitrack media resources or media resources
 that have associated synchronized audio and video resources. I'm
 concretely referring to such things as audio descriptions, sign language
 video, and dubbed audio tracks.
 
 We require an API that can expose such extra tracks to the user and to
 JavaScript. This should be independent of whether the tracks are
 actually inside the media resource or are given as separate resources,
 
 I think there's a big difference between multiple tracks inside one
 resource and multiple tracks spread amongst multiple resources: in the
 former case, one would need a single set of network state APIs (load
 algorithm, ready state, network state, dimensions, buffering state, etc),
 whereas in the second case we'd need N sets of these APIs, one for each 
 media resource.
 
 Given that the current mechanism for exposing the load state of a media
 resource is a media element (video, audio), I think it makes sense to
 reuse these elements for loading each media resource even in a multitrack
 scenario. Thus I do not necessarily agree that exposing extra tracks
 should be done in a way that is independent of whether the tracks are 
 in-band or out-of-band.
 

In the case of in-band tracks it may still be the case that they are retrieved 
independently over the network. This could happen two ways:
- some file formats contain headers which enable precise navigation of the 
file, for example using HTTP byte ranges, so that the tracks could be retrieved 
independently. mp4 files would be an example. I don't know that anyone does 
this, though.
- in the case of adaptive streaming based on a manifest, the different tracks 
may be in different files, even though they appear as in-band tracks from an 
HTML perspective.

In these cases it *might* make sense to expose separate buffer and network 
states for the different in-band tracks in just the same way as out-of-band 
tracks. In fact the distinction between in-band and out-of-band tracks is 
mainly how you discover them: out-of-band tracks the author is assumed to know 
about by some means of their own, while in-band tracks can be discovered by 
loading the metadata part of a single initial resource.

...Mark



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-08 Thread Ian Hickson

On Thu, 10 Feb 2011, Silvia Pfeiffer wrote:
 
 One particular issue that hasn't had much discussion here yet is the 
 issue of how to deal with multitrack media resources or media resources 
 that have associated synchronized audio and video resources. I'm 
 concretely referring to such things as audio descriptions, sign language 
 video, and dubbed audio tracks.
 
 We require an API that can expose such extra tracks to the user and to 
 JavaScript. This should be independent of whether the tracks are 
 actually inside the media resource or are given as separate resources, 

I think there's a big difference between multiple tracks inside one 
resource and multiple tracks spread amongst multiple resources: in the 
former case, one would need a single set of network state APIs (load 
algorithm, ready state, network state, dimensions, buffering state, etc), 
whereas in the second case we'd need N sets of these APIs, one for each 
media resource.

Given that the current mechanism for exposing the load state of a media 
resource is a media element (video, audio), I think it makes sense to 
reuse these elements for loading each media resource even in a multitrack 
scenario. Thus I do not necessarily agree that exposing extra tracks 
should be done in a way that is independent of whether the tracks are 
in-band or out-of-band.


 but should be linked to the main media resource through markup.

What is a main media resource?

e.g. consider youtubedoubler.com; what is the main resource?

Or similarly, when watching the director's commentary track on a movie, is 
the commentary the main track, or the movie?


 I am bringing this up now because solutions may have an influence on the 
 inner workings of TimedTrack and the track element, so before we have 
 any implementations of track, we should be very certain that we are 
 happy with the way in which it works - in particular that track 
 continues to stay an empty element.

I don't really see why this would be related to text tracks. Those have 
their own status framework, and interact directly with a media element. 
Looking again at the youtubedoubler.com example, one could envisage both 
sides having text tracks. They wouldn't be joint tracks.


On Mon, 14 Feb 2011, Jeroen Wijering wrote:
 
 In terms of solutions, I lean much towards the manifest approach. The 
 other approaches are options that each add more elements to HTML5, 
 which:
 
 * Won't work for situations outside of HTML5.

 * Postpone, and perhaps clash with, the addition of manifests.

Manifests, and indeed any solution that relies on a single media element, 
would make it very difficult to render multiple video tracks independently 
(e.g. side by side vs picture-in-picture). That's not to say that 
manifests shouldn't work, but I think we'd need another solution as well.


 *) The CSS styling issue can be fixed by making a conceptual change to 
 CSS and text tracks. Instead of styling text tracks, a single text 
 rendering area for each video element can be exposed and styled. Any 
 text tracks that are enabled push data in it, which is automatically 
 styled according to the video.textStyle/etc rules.

This wouldn't work well with positioned captions.


 *) Discoverability is indeed an issue, but this can be fixed by defining 
 a common track API for signalling and enabling/disabling tracks:

 {{{
 interface Track {
   readonly attribute DOMString kind;
   readonly attribute DOMString label;
   readonly attribute DOMString language;
 
   const unsigned short OFF = 0;
   const unsigned short HIDDEN = 1;
   const unsigned short SHOWING = 2;
   attribute unsigned short mode;
 };
 
 interface HTMLMediaElement : HTMLElement {
   [...]
   readonly attribute Track[] tracks;
 };
 }}}

There's a big difference between text tracks, audio tracks, and video 
tracks. While it makes sense, for instance, to have text tracks enabled 
but not showing, it makes no sense to do that with audio tracks. 
Similarly, video tracks need their own display area, but text tracks need 
a video track's display area. A single video area can display one video 
(multiple overlapping videos being achieved by multiple playback areas), 
but multiple audio and text tracks can be mixed together without any 
difficulty (mixing in one audio channel, or positioning over one video 
display area, respectively).

So I'm not sure a single tracks API makes sense.


On Wed, 16 Feb 2011, Eric Winkelman wrote:
 
 We're working with multitrack MPEG transport streams, and have an 
 implementation of the TimedTrack interface integrating with in-band 
 metadata tracks.  Our prototype uses the Metadata Cues to synchronize a 
 JavaScript application with a video stream using the stream's embedded 
 EISS signaling.  This approach is working very well so far.
 
 The biggest issue we've faced is that there isn't an obvious way to tell 
 the browser application what type of information is contained within the 
 metadata track/cues.  The Cues can contain 

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-08 Thread Jer Noble

On Apr 7, 2011, at 11:54 PM, Ian Hickson wrote:

 The distinction between a master media element and a master media 
 controller is, in my mind, mostly a distinction without a difference.  
 However, a welcome addition to the media controller would be convenience 
 APIs for the above properties (as well as playbackState, networkState, 
 seekable, and buffered).
 
 I'm not sure what networkState means in this context. playbackState, assuming 
 you mean 'paused', is already exposed.


Sorry, by playbackState, I meant readyState.   And I was suggesting that, much 
in the same way that you've provided .buffered and .seekable properties which 
expose the intersection of the slaved media elements' corresponding ranges, 
that a readyState property could similarly reflect the readyState values of all 
the slaved media elements.  In this case, the MediaController's hypothetical 
readyState wouldn't flip to HAVE_ENOUGH_DATA until all the constituent media 
element's ready states reached at least the same value.

Of course, this would imply that the load events fired by a media element (e.g. 
loadedmetadata, canplaythrough) were also fired by the MediaController, and I 
would support this change as well.

Again, this would be just a convenience for authors, as this information is 
already available in other forms and could be relatively easily calculated 
on-the-fly in scripts.  But UAs are likely going to have to do these calculations 
anyway to support things like autoplay, so adding explicit support for them in 
API form would not (imho) be unduly burdensome.

-Jer

 Jer Noble jer.no...@apple.com



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-08 Thread Eric Winkelman

On Friday, April 08, 2011, Ian Hickson wrote:
 On Thu, 17 Feb 2011, Eric Winkelman wrote:
 
  MPEG transport streams, as used for commercial TV, will often contain
  multiple types of metadata: content advisories, ad insertion
  opportunities, interactive TV application triggers, etc.  If we were
  getting this information out-of-band we would, as you suggest, know how
  to deal with it.  We would use multiple @kind=metadata tracks, with the
  correct handler associated with each track.  In our case, however, this
  information is all coming in-band.
 
  There is information within the MPEG transport stream that identifies
  the types of metadata being carried.  This lets the video player know,
  for example, that the stream has a particular track with application
  triggers, and another one with content advisories.  To be consistent
  with the out-of-band tracks, we envision the player creating separate
  TimedTrack elements for each type of metadata, and adding the associated
  data as cues.  But there isn't a clear way for the player to indicate
  the type of metadata it's putting into each of these TimedTrack cues.
 
  Which brings us to the mime types.  I have an event handler on the
  video tag that fires when the player creates a new metadata track, and
  this handler tries to figure out what to do with the track.  Without a
  type on the track, I have to set another handler on the track that fires
  when the player creates a cue, and tries to figure out what to do from
  the cue.  As there is no type on the cue either, I have to examine the
  cue location/text to see if it contains metadata I'm able to handle.
 
  This all works, but it requires event handlers on tracks that may have
  no interest to the application.  On the player side, it depends on the
  player tagging the metadata in a consistent ad-hoc way, as well as
  requiring the player to create separate metadata tracks.  (We also
  considered starting the cue's text with a mime type, but this has the
  same basic issues.)
 
 This is an interesting problem.
 
 What is the way that the MPEG streams identify these various metadata
 streams? Is it a MIME type? Some other identifier? Is this identifier
 separate from the track's label, or is it the track's label?

The streams contain a Program Map Table (PMT) which contains a list of tuples 
(program id (PID) and a standard numeric type) for the program's tracks. This 
is how the user agent knows about this metadata and what is contained in it. 
We're envisioning that the combination of transport, e.g. MPEG-2 TS, and PMT 
type would be used by the UA to select a MIME type. We're proposing that this 
MIME type would be the track's label. We think it would be better if there 
were a type attribute for the track to use instead of the label, but using 
the label would work.

Thanks,
Eric
---
e.winkel...@cablelabs.com


Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-17 Thread Eric Winkelman
Silvia,

MPEG transport streams, as used for commercial TV, will often contain multiple 
types of metadata: content advisories, ad insertion opportunities, interactive 
TV application triggers, etc.  If we were getting this information out-of-band 
we would, as you suggest, know how to deal with it.  We would use multiple 
@kind=metadata tracks, with the correct handler associated with each track.  In 
our case, however, this information is all coming in-band. 

There is information within the MPEG transport stream that identifies the types 
of metadata being carried.  This lets the video player know, for example, that 
the stream has a particular track with application triggers, and another one 
with content advisories.  To be consistent with the out-of-band tracks, we 
envision the player creating separate TimedTrack elements for each type of 
metadata, and adding the associated data as cues.  But there isn't a clear way 
for the player to indicate the type of metadata it's putting into each of these 
TimedTrack cues.

Which brings us to the mime types.  I have an event handler on the video tag 
that fires when the player creates a new metadata track, and this handler tries 
to figure out what to do with the track.  Without a type on the track, I have 
to set another handler on the track that fires when the player creates a cue, 
and tries to figure out what to do from the cue.  As there is no type on the 
cue either, I have to examine the cue location/text to see if it contains 
metadata I'm able to handle.

This all works, but it requires event handlers on tracks that may have no 
interest to the application.  On the player side, it depends on the player 
tagging the metadata in a consistent ad-hoc way, as well as requiring the 
player to create separate metadata tracks.   (We also considered starting the 
cue's text with a mime type, but this has the same basic issues.)
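
To illustrate the plumbing (and why a type on the track or cue would simplify 
it), a rough sketch of the handler chain being described; the event and 
property names (addtrack, cuechange, activeCues, cue.id) approximate the 
evolving TimedTrack/TextTrack drafts, and the MIME type and handler below are 
hypothetical:

    // Watch for new in-band metadata tracks, then inspect each cue to see
    // whether this application can handle it.
    video.textTracks.addEventListener('addtrack', function (e) {
        var track = e.track;
        if (track.kind !== 'metadata')
            return;                                  // not interesting to us
        track.addEventListener('cuechange', function () {
            for (var i = 0; i < track.activeCues.length; ++i) {
                var cue = track.activeCues[i];
                if (cue.id === 'application/x-eiss')   // ad-hoc tagging convention
                    handleEissTrigger(cue.text);       // hypothetical handler
            }
        }, false);
    }, false);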

Clear as mud, right?

Thanks,

Eric Winkelman
---
CableLabs

 -Original Message-
 From: Silvia Pfeiffer [mailto:silviapfeiff...@gmail.com]
 Sent: Wednesday, February 16, 2011 1:34 PM
 To: Eric Winkelman
 Cc: WHAT Working Group
 Subject: Re: [whatwg] How to handle multitrack media resources in HTML
 
 Hi Eric,
 
 I'm curious: if you are using @kind=metadata - which is not generically
 applicable, but only has application-specific data in it - then this implies 
 that
 the web page knows what type of data is in the track's cues and knows how
 to parse it. Why do you need a mime type on the cues then? Is it because
 MPEG has metadata cue tracks that can contain different types of structured
 content? Can you clarify?
 
 Cheers,
 Silvia.
 
 On Thu, Feb 17, 2011 at 6:44 AM, Eric Winkelman
 e.winkel...@cablelabs.com wrote:
  Silvia, all,
 
  We're working with multitrack MPEG transport streams, and have an
 implementation of the TimedTrack interface integrating with in-band
 metadata tracks.  Our prototype uses the Metadata Cues to synchronize a
 JavaScript application with a video stream using the stream's embedded EISS
 signaling.  This approach is working very well so far.
 
  The biggest issue we've faced is that there isn't an obvious way to tell the
 browser application what type of information is contained within the
 metadata track/cues.  The Cues can contain arbitrary text, but neither the
 Cue, nor the associated TimedTrack, has functionality for specifying the
 format/meaning of that text.
 
  Our current implementation uses the Cue's @identifier for a MIME type,
 and puts the associated metadata into the Cue's text field using XML.  This
 works, but requires the JavaScript browser application to examine the cues
 to see if they contain information it understands.  It also requires the video
 player to follow this convention for Metadata TimedTracks.
 
  Adding a @type attribute to the Cues would certainly help, though it would
 still require the browser application to examine individual cues to see if 
 they
 were useful.
 
  An alternate approach would be to add a @type attribute to the track
 tag/TimedTrack that would specify the mime type for the associated
 cues.  This would allow a browser application to determine from the
 TimedTrack whether  or not it needed to process the associated cues.
 
  Eric
  ---
  Eric Winkelman
  CableLabs
 
  -Original Message-
  From: whatwg-boun...@lists.whatwg.org [mailto:whatwg-
  boun...@lists.whatwg.org] On Behalf Of Silvia Pfeiffer
  Sent: Wednesday, February 09, 2011 5:41 PM
  To: WHAT Working Group
  Subject: [whatwg] How to handle multitrack media resources in HTML
 
  Hi all,
 
  One particular issue that hasn't had much discussion here yet is the
  issue of how to deal with multitrack media resources or media
  resources that have associated synchronized audio and video resources.
  I'm concretely referring to such things as audio descriptions, sign
  language video, and dubbed audio tracks.
 
  We require an API that can expose such extra tracks to the user

Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-17 Thread Silvia Pfeiffer
Hi Eric,

That is an interesting use case. I had not considered that there were
any metadata tracks inside media resources that could be exposed, too.

The first thing that we would need is the commitment of browser
vendors to actually parse those metadata tracks and expose them to the
Web page through the TimedTrack mechanism. Since I don't know about
the types of tracks you are referring to, let me ask you some dumb
questions.

Are the metadata tracks that you are referring to encoded in the same
way into MP4 files as caption tracks? If so, is there a mime type on
that track? Or how do you identify what the different tracks contain?
Is there other software that pulls out all of these tracks and does it
in a generic way (i.e. not application-specific)?

I am asking because the @type attribute is just one way to let
JavaScript know about the content of the track. We also have the
@label attribute, which may actually be more appropriate in this case,
since it's data that is not meant for the browser to display, but to
hand on to JavaScript. I'm trying to find a way in which this will
work with the framework that we have created.

Another thing that we haven't talked about yet is how to handle
header-type meta data for WebVTT files, which is a similar problem to
what you are proposing. It might be an idea to add a field for meta
information to the API, but I am not sure yet if that is a good idea.

Cheers,
Silvia.



On Fri, Feb 18, 2011 at 3:55 AM, Eric Winkelman
e.winkel...@cablelabs.com wrote:
 Silvia,

 MPEG transport streams, as used for commercial TV, will often contain 
 multiple types of metadata: content advisories, ad insertion opportunities, 
 interactive TV application triggers, etc.  If we were getting this 
 information out-of-band we would, as you suggest, know how to deal with it.  
 We would use multiple @kind=metadata tracks, with the correct handler 
 associated with each track.  In our case, however, this information is all 
 coming in-band.

 There is information within the MPEG transport stream that identifies the 
 types of metadata being carried.  This lets the video player know, for 
 example, that the stream has a particular track with application triggers, 
 and another one with content advisories.  To be consistent with the 
 out-of-band tracks, we envision the player creating separate TimedTrack 
 elements for each type of metadata, and adding the associated data as cues.  
 But there isn't a clear way for the player to indicate the type of metadata 
 it's putting into each of these TimedTrack cues.

 Which brings us to the mime types.  I have an event handler on the video 
 tag that fires when the player creates a new metadata track, and this handler 
 tries to figure out what to do with the track.  Without a type on the track, 
 I have to set another handler on the track that fires when the player creates 
 a cue, and tries to figure out what to do from the cue.  As there is no type 
 on the cue either, I have to examine the cue location/text to see if it 
 contains metadata I'm able to handle.

 This all works, but it requires event handlers on tracks that may have no 
 interest to the application.  On the player side, it depends on the player 
 tagging the metadata in a consistent ad-hoc way, as well as requiring the 
 player to create separate metadata tracks.   (We also considered starting the 
 cue's text with a mime type, but this has the same basic issues.)

 Clear as mud, right?

 Thanks,

 Eric Winkelman
 ---
 CableLabs

 -Original Message-
 From: Silvia Pfeiffer [mailto:silviapfeiff...@gmail.com]
 Sent: Wednesday, February 16, 2011 1:34 PM
 To: Eric Winkelman
 Cc: WHAT Working Group
 Subject: Re: [whatwg] How to handle multitrack media resources in HTML

 Hi Eric,

 I'm curious: if you are using @kind=metadata - which is not generically
 applicable, but only has application-specific data in it - then this implies 
 that
 the web page knows what type of data is in the track's cues and knows how
 to parse it. Why do you need a mime type on the cues then? Is it because
 MPEG has metadata cue tracks that can contain different types of structured
 content? Can you clarify?

 Cheers,
 Silvia.

 On Thu, Feb 17, 2011 at 6:44 AM, Eric Winkelman
 e.winkel...@cablelabs.com wrote:
  Silvia, all,
 
  We're working with multitrack MPEG transport streams, and have an
 implementation of the TimedTrack interface integrating with in-band
 metadata tracks.  Our prototype uses the Metadata Cues to synchronize a
 JavaScript application with a video stream using the stream's embedded EISS
 signaling.  This approach is working very well so far.
 
  The biggest issue we've faced is that there isn't an obvious way to tell 
  the
 browser application what type of information is contained within the
 metadata track/cues.  The Cues can contain arbitrary text, but neither the
 Cue, nor the associated TimedTrack, has functionality for specifying the
 format/meaning of that text.
 
  Our

Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-16 Thread Eric Winkelman
Silvia, all,

We're working with multitrack MPEG transport streams, and have an 
implementation of the TimedTrack interface integrating with in-band metadata 
tracks.  Our prototype uses the Metadata Cues to synchronize a JavaScript 
application with a video stream using the stream's embedded EISS signaling.  
This approach is working very well so far.

The biggest issue we've faced is that there isn't an obvious way to tell the 
browser application what type of information is contained within the metadata 
track/cues.  The Cues can contain arbitrary text, but neither the Cue, nor the 
associated TimedTrack, has functionality for specifying the format/meaning of 
that text.

Our current implementation uses the Cue's @identifier for a MIME type, and puts 
the associated metadata into the Cue's text field using XML.  This works, but 
requires the JavaScript browser application to examine the cues to see if they 
contain information it understands.  It also requires the video player to 
follow this convention for Metadata TimedTracks.

Adding a @type attribute to the Cues would certainly help, though it would 
still require the browser application to examine individual cues to see if they 
were useful.

An alternate approach would be to add a @type attribute to the track 
tag/TimedTrack that would specify the mime type for the associated cues.  This 
would allow a browser application to determine from the TimedTrack whether  or 
not it needed to process the associated cues.
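
If such a @type attribute existed, the check could collapse to something like 
the sketch below; the attribute itself, the MIME type and the handler are all 
hypothetical:

    // Hypothetical @type on the track: skip irrelevant metadata tracks
    // without inspecting any cues.
    video.textTracks.addEventListener('addtrack', function (e) {
        if (e.track.kind === 'metadata' && e.track.type === 'application/x-eiss')
            e.track.addEventListener('cuechange', handleEissCues, false);
    }, false);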

Eric
---
Eric Winkelman
CableLabs

 -Original Message-
 From: whatwg-boun...@lists.whatwg.org [mailto:whatwg-
 boun...@lists.whatwg.org] On Behalf Of Silvia Pfeiffer
 Sent: Wednesday, February 09, 2011 5:41 PM
 To: WHAT Working Group
 Subject: [whatwg] How to handle multitrack media resources in HTML
 
 Hi all,
 
 One particular issue that hasn't had much discussion here yet is the issue of
 how to deal with multitrack media resources or media resources that have
 associated synchronized audio and video resources.
 I'm concretely referring to such things as audio descriptions, sign language
 video, and dubbed audio tracks.
 
 We require an API that can expose such extra tracks to the user and to
 JavaScript. This should be independent of whether the tracks are actually
 inside the media resource or are given as separate resources, but should be
 linked to the main media resource through markup.
 
 I am bringing this up now because solutions may have an influence on the
 inner workings of TimedTrack and the track element, so before we have
 any implementations of track, we should be very certain that we are
 happy with the way in which it works - in particular that track continues to
 stay an empty element.
 
 We've had some preliminary discussions about this in the W3C Accessibility
 Task Force and the alternatives that we could think about are captured in
 http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API . This
 may not be the complete list of possible solutions, but it provides ideas for
 the different approaches that can be taken.
 
 I'd like to see what people's opinions are about them.
 
 Note there are also discussion threads about this at the W3C both in the
 Accessibility TF [1] and the HTML Working Group [2], but I am curious about
 input from the wider community.
 
 So check out
 http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API
 and share your opinions.
 
 Cheers,
 Silvia.
 
 [1] http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0057.html
 [2] http://lists.w3.org/Archives/Public/public-html/2011Feb/0205.html


Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-16 Thread Silvia Pfeiffer
Hi Eric,

I'm curious: if you are using @kind=metadata - which is not
generically applicable, but only has application-specific data in it -
then this implies that the web page knows what type of data is in the
track's cues and knows how to parse it. Why do you need a mime type on
the cues then? Is it because MPEG has metadata cue tracks that can
contain different types of structured content? Can you clarify?

Cheers,
Silvia.

On Thu, Feb 17, 2011 at 6:44 AM, Eric Winkelman
e.winkel...@cablelabs.com wrote:
 Silvia, all,

 We're working with multitrack MPEG transport streams, and have an 
 implementation of the TimedTrack interface integrating with in-band metadata 
 tracks.  Our prototype uses the Metadata Cues to synchronize a JavaScript 
 application with a video stream using the stream's embedded EISS signaling.  
 This approach is working very well so far.

 The biggest issue we've faced is that there isn't an obvious way to tell the 
 browser application what type of information is contained within the metadata 
 track/cues.  The Cues can contain arbitrary text, but neither the Cue, nor 
 the associated TimedTrack, has functionality for specifying the 
 format/meaning of that text.

 Our current implementation uses the Cue's @identifier for a MIME type, and 
 puts the associated metadata into the Cue's text field using XML.  This 
 works, but requires the JavaScript browser application to examine the cues to 
 see if they contain information it understands.  It also requires the video 
 player to follow this convention for Metadata TimedTracks.

 Adding a @type attribute to the Cues would certainly help, though it would 
 still require the browser application to examine individual cues to see if 
 they were useful.

 An alternate approach would be to add a @type attribute to the track 
 tag/TimedTrack that would specify the MIME type for the associated cues.  
 This would allow a browser application to determine from the TimedTrack 
 whether or not it needed to process the associated cues.

 Eric
 ---
 Eric Winkelman
 CableLabs

 -Original Message-
 From: whatwg-boun...@lists.whatwg.org [mailto:whatwg-
 boun...@lists.whatwg.org] On Behalf Of Silvia Pfeiffer
 Sent: Wednesday, February 09, 2011 5:41 PM
 To: WHAT Working Group
 Subject: [whatwg] How to handle multitrack media resources in HTML

 Hi all,

 One particular issue that hasn't had much discussion here yet is the issue of
 how to deal with multitrack media resources or media resources that have
 associated synchronized audio and video resources.
 I'm concretely referring to such things as audio descriptions, sign language
 video, and dubbed audio tracks.

 We require an API that can expose such extra tracks to the user and to
 JavaScript. This should be independent of whether the tracks are actually
 inside the media resource or are given as separate resources, but should be
 linked to the main media resource through markup.

 I am bringing this up now because solutions may have an influence on the
 inner workings of TimedTrack and the track element, so before we have
 any implementations of track, we should be very certain that we are
 happy with the way in which it works - in particular that track continues to
 stay an empty element.

 We've had some preliminary discussions about this in the W3C Accessibility
 Task Force and the alternatives that we could think about are captured in
 http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API . This
 may not be the complete list of possible solutions, but it provides ideas for
 the different approaches that can be taken.

 I'd like to see what people's opinions are about them.

 Note there are also discussion threads about this at the W3C both in the
 Accessibility TF [1] and the HTML Working Group [2], but I am curious about
 input from the wider community.

 So check out
 http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API
 and share your opinions.

 Cheers,
 Silvia.

 [1] http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0057.html
 [2] http://lists.w3.org/Archives/Public/public-html/2011Feb/0205.html



Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-14 Thread Jeroen Wijering
Hello Silvia, all,

First, thanks for the Multitrack wiki page. It is very helpful for those who are not 
subscribed to the various lists. I have also phrased the comments below as feedback 
to this page:

http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API

USE CASE 

The use case is spot on; this is an issue that blocks HTML5 video from being 
chosen over a solution like Flash. A comprehensive list of track types is important 
for correctly scoping the conditions / resolutions:

1. Tracks targeting device capabilities:
   * Different containers / codecs / profiles
   * Multiview (3D) or surround sound
   * Playback rights and/or decryption possibilities
2. Tracks targeting content customization:
   * Alternate viewing angles or alternate music scores
   * Director's comments or storyboard video
3. Tracks targeting accessibility:
   * Dubbed audio or text subtitles
   * Audio descriptions or closed captions
   * Tracks cleared of cursing / nudity / violence
4. Tracks targeting the interface:
   * Chapter lists, bookmarks, timed annotations, midroll hints..
   * .. and any other type of scripting cues

Note that I included the HTML5 text tracks. I believe there are four kinds of 
tracks, all of them an inherent part of a media presentation. These types 
designate the output of the track, not its encoded representation:

* audio (producing sound)
* metadata (producing scripting cues)
* text (producing rendered text)
* video (producing images)

In this taxonomy, the HTML5 "subtitles" and "captions" track kinds are "text", 
the "descriptions" kind is "audio", and the "chapters" and "metadata" kinds are 
"metadata".

REQUIREMENTS

The requirements are elaborate, but do note they span beyond HTML5. Everything 
that plays back audio/video needs multitrack support:

* Broad- and narrowcasting playback devices of any kind
* Native desktop, mobile and set-top applications/apps
* Devices that play media standalone (media players, picture frames, AirPlay)

Also, on devices such as the iPhone and Android phones, video playback is 
triggered by HTML5 but subsequently detached from it. Think of the custom 
fullscreen controls, the obscuring of all HTML, and the events/cues that are 
deliberately ignored or not sent (such as play() in iOS). I wonder whether this 
is a temporary state or something that will remain and should be provisioned for.

With this in mind, I think an additional requirement is that there should be a 
full solution outside the scope of HTML5. HTML5 has unique capabilities like 
customization of the layout (CSS) and interaction (JavaScript), but it must not 
be required.

SIDE CONDITIONS

In the side conditions, I'm not sure about the relative volume of audio or the 
positioning of video. Automating these by default might work better and requires 
no parameters. For audio, blending can be done through a ducking mechanism (like 
the JW Player does). For video, blending can be done through an alpha channel. 
At a later stage, an API/heuristics for PIP support and gain control can be 
added.
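
As a sketch of what such a ducking mechanism could look like in script - this is 
only an illustration with the Web Audio API, not something the proposal requires, 
and the element ids and gain values are made up:

var ctx = new AudioContext();

// Route the main programme audio through a gain node we can control.
var mainGain = ctx.createGain();
ctx.createMediaElementSource(document.getElementById('mainVideo'))
   .connect(mainGain);
mainGain.connect(ctx.destination);

// The audio description plays at full volume.
var desc = document.getElementById('audioDescription');
ctx.createMediaElementSource(desc).connect(ctx.destination);

// While the description is audible, duck the main audio; restore it afterwards.
desc.addEventListener('play',  function () { mainGain.gain.value = 0.3; });
desc.addEventListener('pause', function () { mainGain.gain.value = 1.0; });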

SOLUTIONS

In terms of solutions, I lean strongly towards the manifest approach. The other 
approaches are options that each add more elements to HTML5, which:

* Won't work for situations outside of HTML5.
* Postpone, and perhaps clash with, the addition of manifests.

Without a manifest, there'll probably be no adaptive streaming, which renders 
HTML5 video much less useful. At the same time, standardization around 
manifests (DASH) is largely wrapping up.

EXAMPLE

Here's some example code for the manifest approach. First, the HTML5 side:

<video id="v1" poster="video.png" controls>
  <source src="manifest.xml" type="video/mpeg-dash">
</video>

Second, the manifest side:

<MPD mediaPresentationDuration="PT645S" type="OnDemand">
  <BaseURL>http://cdn.example.com/myVideo/</BaseURL>
  <Period>

    <Group mimeType="video/webm" lang="en">
      <Representation sourceURL="video-1600.webm" />
    </Group>

    <Group mimeType='video/mp4; codecs="avc1.42E00C,mp4a.40.2"' lang="en">
      <Representation sourceURL="video-1600.mp4" />
    </Group>

    <Group mimeType="text/vtt" lang="en">
      <Accessibility type="CC" />
      <Representation sourceURL="captions.vtt" />
    </Group>

  </Period>
</MPD>


(I should look more into the accessibility parameters, but there is support for 
signalling captions, audio descriptions, sign language, etc.)

Note that this approach moves the text track outside of HTML5, making it 
accessible to other clients as well. Both codecs are also in the manifest - 
this is just one of the device capability selectors of DASH clients.

DISADVANTAGES

The two disadvantages listed for the manifest approach on the wiki page are the 
lack of CSS styling and discoverability:

*) The CSS styling issue can be fixed by making a conceptual change to CSS and 
text tracks. Instead of styling individual text tracks, a single text rendering 
area for each video element can be exposed and styled. Any text tracks that are 
enabled push their data into it, and that data is automatically styled according 
to the video.textStyle/etc rules.
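
A page can already emulate that concept in script, which may help picture it. 
This is only a sketch using today's TextTrack naming - one overlay element per 
video, filled from whichever text tracks are enabled and styled by the page's 
own CSS (e.g. with white-space: pre-line):

var video   = document.querySelector('video');
var overlay = document.createElement('div');
overlay.className = 'text-rendering-area';   // single rendering area, page-styled
video.parentNode.appendChild(overlay);

function renderActiveCues() {
  var lines = [];
  for (var i = 0; i < video.textTracks.length; i++) {
    var track = video.textTracks[i];
    if (track.mode === 'disabled' || !track.activeCues) continue;
    for (var j = 0; j < track.activeCues.length; j++) {
      lines.push(track.activeCues[j].text);
    }
  }
  overlay.textContent = lines.join('\n');   // all enabled tracks share one area
}

for (var i = 0; i < video.textTracks.length; i++) {
  video.textTracks[i].addEventListener('cuechange', renderActiveCues);
}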

*)