Re: [whatwg] cue points in media elements
On Wed, 24 Oct 2007, Dave Singer wrote:
>
> Caution: cross-posted to whatwg and htmlwg; be careful with
> follow-ups!

Actually, please don't cross-post new threads to both groups. As mentioned earlier this week, I only cross-post when the messages I'm replying to were sent to both groups as a convenience to both groups so they can see what progress is being made on issues that were discussed there; as a general rule it's better to not cross-post. Thanks!

> We've been looking into both semantic and implementation considerations
> of cue points. We wonder whether cue ranges might not make more sense.

Done. I also changed the way that cue points (er, ranges) are removed, which I think will make it easier to handle swapping in sets of subtitles or the like. Comments welcome.

--
Ian Hickson    http://ln.hixie.ch/
Things that are impossible just take longer.
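Ian's move from cue points to cue ranges, with removal reworked so that whole sets such as subtitles can be swapped, can be pictured with a small model. This is a minimal sketch under stated assumptions, not the draft's actual interface. `CueRangeRegistry`, `add`, `removeGroup`, `update` and the `CueCallback` type are hypothetical names. Ranges are treated as half-open `[start, end)`, and removed ranges are dropped without firing their exit callbacks.

```typescript
// Hypothetical model of cue ranges that are registered and removed by group,
// e.g. one group per subtitle language. Not the draft's API.
type CueCallback = (group: string, start: number, end: number) => void;

interface CueRange {
  group: string;
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive (an assumption)
  onEnter: CueCallback;
  onExit: CueCallback;
}

class CueRangeRegistry {
  private ranges: CueRange[] = [];
  private active: CueRange[] = [];

  add(group: string, start: number, end: number,
      onEnter: CueCallback, onExit: CueCallback): void {
    this.ranges.push({ group, start, end, onEnter, onExit });
  }

  // Drop every range in `group` at once; returns how many were removed.
  // Assumption: removed ranges go away silently, without exit callbacks.
  removeGroup(group: string): number {
    const before = this.ranges.length;
    this.ranges = this.ranges.filter(r => r.group !== group);
    this.active = this.active.filter(r => r.group !== group);
    return before - this.ranges.length;
  }

  count(): number {
    return this.ranges.length;
  }

  // Call whenever the playback position changes; fires exit callbacks for
  // ranges just left and enter callbacks for ranges just entered.
  update(time: number): void {
    const now = this.ranges.filter(r => r.start <= time && time < r.end);
    this.active.forEach(r => {
      if (now.indexOf(r) === -1) r.onExit(r.group, r.start, r.end);
    });
    now.forEach(r => {
      if (this.active.indexOf(r) === -1) r.onEnter(r.group, r.start, r.end);
    });
    this.active = now;
  }
}

// Usage: swap English subtitles for French ones.
const log: string[] = [];
const enter: CueCallback = (group, start) => { log.push(`enter ${group}@${start}`); };
const exit: CueCallback = (group, start) => { log.push(`exit ${group}@${start}`); };

const registry = new CueRangeRegistry();
registry.add("subtitles-en", 0, 2, enter, exit);
registry.add("subtitles-en", 2, 4, enter, exit);
registry.update(1);                  // log: enter subtitles-en@0
registry.removeGroup("subtitles-en");
registry.add("subtitles-fr", 0, 2, enter, exit);
registry.update(1.5);                // log: enter subtitles-fr@0
```

Grouping by a name turns a subtitle swap into one removal plus a batch of additions. The page does not have to track every individual range just so it can remove it later.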
Re: [whatwg] Cue points in media elements
On Sun, 29 Apr 2007, Brian Campbell wrote:
>
> The problem is that the callbacks execute "when the current playback
> position of a media element reaches" the cue point. It seems unclear to
> me what "reaching" a particular time means. If video playback freezes
> for a second, and so misses a cue point, is that considered to have been
> "reached"? Is there any way that you can guarantee that a cue point will
> be executed as long as video has passed a particular cue point? With a
> lot of bookkeeping and the "timeupdate" event along with the cue points,
> you may be able to keep track of the current time in the movie well
> enough to deal with the user skipping forward, pausing, and the video
> stalling and restarting due to running out of buffer. This doesn't
> address, as far as I can tell, issues like the thread displaying the
> video pausing for whatever reason and so skipping forward after it
> resumes, which may cause cue points to be lost, and which isn't
> specified to send a "timeupdate" event.

I've defined what "reaching" a particular time means. I have explicitly made it invoke the times that might get skipped due to missing frames during normal playback. I have also made it _not_ fire the callbacks for times in between the old and new positions when seeking.

> Basically, what is necessary is a way to specify that a cue point should
> always be fired as long as playback has passed a certain time, not just
> if it "reaches" a particular time. This would prevent us from having to
> do a lot of bookkeeping to make sure that cue points haven't been
> missed, and make everything simpler and less fragile.

You can use the "timeupdate" event for this -- it fires whenever a cue point is hit, and whenever the timeline is seeked (even implicitly by the looping algorithm).

> For now, we are focusing on captioning for the deaf. We have voiceovers
> on some screens with no associated video, video that appears in various
> places on the screen, and the occasional sound effects. Because there is
> not a consistent video location, nor is there even a frame for
> voiceovers to appear in, we don't display the captions directly over the
> video, but instead send events to the current screen, which is
> responsible for catching the events and displaying them in a location
> appropriate for that screen, usually a standard location. In the current
> spec, all that is provided for is controls to turn closed captions on or
> off. What would be much better is a way to enable the video element to
> send caption events, which include the text of the current caption, and
> can be used to display those captions in a way that fits the design of
> the content better.

I've added this to the list for version 2 features. I'm interested in seeing what the requirements are for captions before we go ahead and spec them in too much detail. Implementation feedback will be helpful here.

Thanks for your feedback!


On Mon, 30 Apr 2007, Ralph Giles wrote:
>
> I'd be more in favor of triggering any cue point callbacks that lie
> between the current playback position and the current playback position
> of the next frame (audio frame for and video frame for
> I guess). That means more bookkeeping to implement your system, but is
> less surprising in other cases.

Could you elaborate on this? Right now the system triggers cue points up to the current displayed frame, and some cue points between the current frame and the next frame, if the gap between the frames is long enough that the time updates more often than the framerate.

> As I read it, cue points are relative to the current playback position,
> which does not advance if the stream buffer underruns, but it would if
> playback restarts after a gap, as might happen if the connection drops,
> or in an RTP stream. My proposal above would need to be amended to
> handle that case, and the decoder dropping frames...finding the right
> language here is hard.

Does the new text work for this?

> A more abstract interface is necessary than just 'caption events'. Here
> are some use cases worth considering:
>
> * A media file has embedded textual metadata like title, author,
> copyright license, that the designer would like to access for associated
> display elsewhere in the page, or to alter the displayed user interface
> based on the metadata. This is pretty essential for parity with
> flash-based internet radio players.
>
> * The designer wants to access closed captioned or subtitle text
> through the DOM as it becomes available for display elsewhere in the
> page.
>
> * There are points in the media file where the embedded metadata
> changes. These points cannot be retrieved without scanning the file,
> which is expensive over the network, and may not be possible in general
> if the stream is a live feed. Nevertheless, the designer wants to be
> notified when the associated metadata changes so other elements can be
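Brian's requirement (fire a cue "as long as playback has passed a certain time") and Ian's suggestion to lean on the "timeupdate" event can be combined roughly as follows. This is a hedged sketch. `PassedTimeWaiter` and its methods are hypothetical. In a page, `onTimeUpdate` would be called from a "timeupdate" listener with the media element's `currentTime`, and `onEnded` from an "ended" listener or from a "skip past this clip" control. Waits fire once and are not re-armed by backward seeks, which matches Brian's forward-only content.

```typescript
// Hypothetical helper emulating Brian Campbell's WAIT on top of periodic
// position updates (e.g. "timeupdate"). A wait fires once the position has
// reached or passed its time, however it got there: normal play, dropped
// frames, or a forward seek. Fired waits are not re-armed by backward seeks.
interface PendingWait {
  time: number; // seconds; Infinity means "until the clip ends"
  callback: () => void;
}

class PassedTimeWaiter {
  private pending: PendingWait[] = [];

  waitFor(time: number, callback: () => void): void {
    this.pending.push({ time, callback });
  }

  // Call with the media element's current time on every position update.
  onTimeUpdate(currentTime: number): void {
    const due = this.pending.filter(w => w.time <= currentTime);
    this.pending = this.pending.filter(w => w.time > currentTime);
    this.fire(due);
  }

  // Call when the clip ends or the user skips past it: release every wait.
  onEnded(): void {
    const due = this.pending;
    this.pending = [];
    this.fire(due);
  }

  pendingCount(): number {
    return this.pending.length;
  }

  // Run callbacks in time order. `pending` is updated before this runs,
  // so a callback may safely register new waits.
  private fire(due: PendingWait[]): void {
    due.sort((a, b) => a.time - b.time).forEach(w => w.callback());
  }
}

// Usage, mirroring (wait @movie <time>) (show @bullet-1) (wait @movie) (show @bullet-2):
const shown: string[] = [];
const waiter = new PassedTimeWaiter();
waiter.waitFor(5, () => { shown.push("bullet-1"); });
waiter.waitFor(Infinity, () => { shown.push("bullet-2"); });
waiter.onTimeUpdate(4.9); // nothing yet
waiter.onTimeUpdate(9);   // jumped past 5 s (seek or stall): bullet-1
waiter.onEnded();         // clip skipped to the end: bullet-2
```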
Re: [whatwg] Cue points in media elements
On May 2, 2007, at 11:01 AM, Dave Singer wrote:

> At 17:04 -0400 1/05/07, Brian Campbell wrote:
>
>> On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:
>>
>>> I believe that a cue point is "reached" if its time is traversed during playback.
>>
>> What does "traversed" mean in terms of (a) seeking across the cue point, (b) playing in reverse (rewinding), and (c) the media stalling and restarting at a later point in the stream?
>
> I would say that playing (at any rate and in any direction) is a continuous function, and therefore cue points are triggered, when playing, whenever two samples of the time straddle the cue point (where "straddle" includes one of the samples being at the cue point).
>
> Seeking is discontinuous, and therefore cue points are triggered only if a seek results in landing on the cue point, if not playing. If playing, then the usual rules apply.

A discontinuous jump will result in a timeupdate notification, which among other things is supposed to enable scripts to issue notifications of interesting times that are traversed not during playback but while seeking.
Re: [whatwg] Cue points in media elements
At 17:04 -0400 1/05/07, Brian Campbell wrote:

> On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:
>
>> I believe that a cue point is "reached" if its time is traversed during playback.
>
> What does "traversed" mean in terms of (a) seeking across the cue point, (b) playing in reverse (rewinding), and (c) the media stalling and restarting at a later point in the stream?

I would say that playing (at any rate and in any direction) is a continuous function, and therefore cue points are triggered, when playing, whenever two samples of the time straddle the cue point (where "straddle" includes one of the samples being at the cue point).

Seeking is discontinuous, and therefore cue points are triggered only if a seek results in landing on the cue point, if not playing. If playing, then the usual rules apply.

Frame dropping, stalling, and so on are aspects of the playback behavior and have nothing to do with the logical model of cues laid on a time axis.

--
David Singer
Apple Computer/QuickTime
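David's rule can be stated as two small predicates: one for continuous playback in either direction and one for seeks. The sketch below is one reading of the message, not spec text. The function names are hypothetical, and times are plain seconds.

```typescript
// One reading of David Singer's proposal (not spec text).
// While playing, at any rate and in either direction, a cue fires when two
// successive time samples straddle it; a sample exactly on the cue counts.
function triggeredWhilePlaying(cues: number[], prev: number, curr: number): number[] {
  const lo = Math.min(prev, curr);
  const hi = Math.max(prev, curr);
  const hits = cues.filter(t => lo <= t && t <= hi).sort((a, b) => a - b);
  // Report cues in the order playback crosses them.
  return curr >= prev ? hits : hits.reverse();
}

// Seeking is discontinuous: only a cue exactly at the landing time fires.
// (Reading "the usual rules apply" as: once playback resumes,
// triggeredWhilePlaying is used again from the landing time.)
function triggeredBySeek(cues: number[], landing: number): number[] {
  return cues.filter(t => t === landing);
}

const cues = [1.0, 2.0, 3.5];
triggeredWhilePlaying(cues, 0.9, 2.2); // → [1, 2]   forward play
triggeredWhilePlaying(cues, 4.0, 1.5); // → [3.5, 2] reverse play
triggeredBySeek(cues, 3.0);            // → []       seek lands between cues
triggeredBySeek(cues, 2.0);            // → [2]      seek lands on a cue
```

Read literally, an inclusive straddle reports a cue that lies exactly on a sample twice: once for the interval ending at that sample and again for the interval starting there. An implementation following this model would still need a tie-breaking convention.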
Re: [whatwg] Cue points in media elements
Hearing about cue points in media elements. Just sorta reminds me of keyTimes in SMIL. I know SMIL seems funky to some people, but I do really love it! It is so way cool! So far as I know it doesn't do quite what you're talking about here, but it does similar stuff including non-linear distortions of timing elements and the like. It's declarative (though I don't think it's Turing complete -- wager of virtual beans proposed) and its syntax is worthy of emulation in that classical "ontology recapitulates philology" sort of sense. It is so much a W3C standard that it has six or eight or twelve standards devoted to it.

David Dailey (who is trying to learn how not to re-invent wheels)
http://srufaculty.sru.edu/david.dailey/copyright/dailey_on_copyright.htm

Damn bastard mutant wheels keep popping up around me like unwanted copyrighted utterances in a world where intellectual landfills are charged by the bit! -- anonymous

- Original Message -
From: "Brian Campbell" <[EMAIL PROTECTED]>
To: "Ralph Giles" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, May 01, 2007 4:57 PM
Subject: Re: [whatwg] Cue points in media elements
Re: [whatwg] Cue points in media elements
On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

> I believe that a cue point is "reached" if its time is traversed during playback.

What does "traversed" mean in terms of (a) seeking across the cue point, (b) playing in reverse (rewinding), and (c) the media stalling and restarting at a later point in the stream?
Re: [whatwg] Cue points in media elements
On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:

> Thanks for adding to the discussion. We're very interested in implementing support for presentations as well, so it's good to hear from someone with experience.

Thanks for responding, I'm glad to hear your input.

> On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:
>
>> in our language, you might see something like this:
>>
>> (movie "Foo.mov" :name 'movie)
>> (wait @movie (tc 2 3))
>> (show @bullet-1)
>> (wait @movie)
>> (show @bullet-2)
>>
>> If the user skips to the end of the media clip, that simply causes all WAITs on that media clip to return instantly. If they skip forward in the media clip, without ending it, all WAITs before that point will return instantly.
>
> How does this work if, for example, the user seeks forward, and then back to an earlier position? Would some of the 'show's be undone, or do they not seek backward with the media playback?

We don't expose arbitrary seeking controls to our users; just play/pause, skip forward & back one card (which resets all state to a known value) and skip past the current video/audio (which just causes all waits on that media element to return instantly).

> Is the essential component of your system that all the shows be called in sequence to build up a display state, or that the last state trigger before the current playback point has been triggered?

The former.

> Isn't this slow if a bunch of intermediate animations are triggered by a seek?

Yes, though this is more a bug in our animation API (which could be taught to skip directly to the end of an animation when associated video/audio ends, but that just hasn't been done yet).

Actually, that brings up another point, which is a bit more speculative. It may be nice to have a way to register a callback that will be called at animation rates (at least 15 frames/second or so) that is called with the current play time of a media element. This would allow you to keep animations in sync with video, even if the video might stall briefly, or seek forward or backward for whatever reason. We haven't implemented this in our current system (as I said, it has the bug that animations still take their full time to play even when you skip video), but it may be helpful for this sort of thing.

> Does your system support live streaming as well? That complicates the design some when the presentation media updates appear dynamically.

No, we only support progressive download.

> Anyway I think you could implement your system with the currently proposed interface by checking the current playback position and clearing a separate list of waits inside your timeupdate callback.

I agree, it would be possible, but from my current reading of the spec it sounds like some cue points might be missed until quite a bit later (since timeupdate isn't guaranteed to be called every time anything discontinuous happens with the media). In general, having to do extra bookkeeping to keep track of the state of the media may be fragile, so stronger guarantees about when cue points are fired are better than trying to keep track of what's going on with timeupdate events.

> I agree this should be clarified. The appropriate interpretation should be when the current playback position reaches the frame corresponding to the cue point, but digital media has quantized frames, while the cue points are floating point numbers. Triggering all cue point callbacks between the last current playback position and the current one (including during seeks) would be one option, and do what you want as long as you aren't seeking backward. I'd be more in favor of triggering any cue point callbacks that lie between the current playback position and the current playback position of the next frame (audio frame for and video frame for I guess). That means more bookkeeping to implement your system, but is less surprising in other cases.

Sure, that would probably work. As I said, bookkeeping is generally a problem because it might get out of sync, but with stronger guarantees about when cue points are triggered, I think it could work.

>> If video playback freezes for a second, and so misses a cue point, is that considered to have been "reached"?
>
> As I read it, cue points are relative to the current playback position, which does not advance if the stream buffer underruns, but it would if playback restarts after a gap, as might happen if the connection drops, or in an RTP stream. My proposal above would need to be amended to handle that case, and the decoder dropping frames...finding the right language here is hard.

Yes, it's a tricky little problem. Our current system stays out of trouble because it makes quite a few simplifying assumptions (video is played forward only, progressive download, not streaming, etc.). Obviously, in order to support a more general API, you're going to
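Brian's speculative idea, a callback at animation rate that receives the media's current play time, amounts to driving animation progress from the media clock instead of from wall-clock time. This is a minimal sketch. `MediaClock`, `ClockedAnimation`, `progressAt` and `tick` are hypothetical names. In a page, the clock could simply be the media element (its `currentTime` attribute), and `tick` could be driven by a timer running 15 or more times per second.

```typescript
// Hypothetical: animations whose progress is derived from the media clock
// rather than from wall-clock time.
interface MediaClock {
  currentTime: number; // seconds, e.g. read from a media element
}

interface ClockedAnimation {
  start: number;    // media time at which the animation begins (s)
  duration: number; // animation length in media time (s)
  render: (progress: number) => void; // draw the frame for progress in [0, 1]
}

// Progress depends only on media time: a stall freezes it, a seek jumps it.
function progressAt(anim: ClockedAnimation, mediaTime: number): number {
  if (anim.duration <= 0) return mediaTime >= anim.start ? 1 : 0;
  const p = (mediaTime - anim.start) / anim.duration;
  return Math.min(1, Math.max(0, p));
}

// Intended to be called at animation rate (15+ times per second).
function tick(clock: MediaClock, anims: ClockedAnimation[]): void {
  for (const anim of anims) {
    anim.render(progressAt(anim, clock.currentTime));
  }
}

// Usage: a 2-second slide-in that starts 10 s into the clip.
const renderedProgress: number[] = [];
const slideIn: ClockedAnimation = {
  start: 10,
  duration: 2,
  render: p => { renderedProgress.push(p); },
};
const clock: MediaClock = { currentTime: 9 };
tick(clock, [slideIn]); // 0: not started yet
clock.currentTime = 11;
tick(clock, [slideIn]); // 0.5
tick(clock, [slideIn]); // still 0.5: the media stalled, so the animation waits
clock.currentTime = 30;
tick(clock, [slideIn]); // 1: skipped ahead, jump straight to the final frame
```

Because progress is computed from media time, a stalled clip freezes the animation, and a skip jumps it straight to its final frame. This avoids the "animations still take their full time" bug Brian describes.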
Re: [whatwg] Cue points in media elements
On Apr 30, 2007, at 4:15 PM, Ralph Giles wrote:

> [On Apr 29, 2007, at 12:14 AM, Brian Campbell wrote:]
>
>> If video playback freezes for a second, and so misses a cue point, is that considered to have been "reached"?
>
> As I read it, cue points are relative to the current playback position, which does not advance if the stream buffer underruns, but it would if playback restarts after a gap, as might happen if the connection drops, or in an RTP stream. My proposal above would need to be amended to handle that case, and the decoder dropping frames...finding the right language here is hard.

I believe that a cue point is "reached" if its time is traversed during playback.

- Kevin
Re: [whatwg] Cue points in media elements
On 4/29/07, Brian Campbell <[EMAIL PROTECTED]> wrote:

> For the sort of content that we produce, cue points are incredibly important. Most of our content consists of a video or voiceover playing while bullet points appear, animations play, and graphics are revealed, all in sync with the video. We have a very simple system for doing cue points, that is extremely easy for the content authors to write and is robust for paused media, media that is skipped to the end, etc. We simply have a blocking call, WAIT, that waits until a specific point or the end of a specified media element. For instance, in our language, you might see something like this:
>
> (movie "Foo.mov" :name 'movie)
> (wait @movie (tc 2 3))
> (show @bullet-1)
> (wait @movie)
> (show @bullet-2)
>
> If the user skips to the end of the media clip, that simply causes all WAITs on that media clip to return instantly. If they skip forward in the media clip, without ending it, all WAITs before that point will return instantly. If the user pauses the media clip, all WAITs on the media clip will block until it is playing again.
>
> This is a nice system, but I can't see how even as simple a system as this could be implemented given the current specification of cue points. The problem is that the callbacks execute "when the current playback position of a media element reaches" the cue point. It seems unclear to me what "reaching" a particular time means. If video playback freezes for a second, and so misses a cue point, is that considered to have been "reached"? Is there any way that you can guarantee that a cue point will be executed as long as video has passed a particular cue point? With a lot of bookkeeping and the "timeupdate" event along with the cue points, you may be able to keep track of the current time in the movie well enough to deal with the user skipping forward, pausing, and the video stalling and restarting due to running out of buffer. This doesn't address, as far as I can tell, issues like the thread displaying the video pausing for whatever reason and so skipping forward after it resumes, which may cause cue points to be lost, and which isn't specified to send a "timeupdate" event.
>
> Basically, what is necessary is a way to specify that a cue point should always be fired as long as playback has passed a certain time, not just if it "reaches" a particular time. This would prevent us from having to do a lot of bookkeeping to make sure that cue points haven't been missed, and make everything simpler and less fragile.

To capture these kinds of situations flexibly, I think the concept of "cue points" could be changed to "cue periods"...

Method names:

addEnterCuePeriod(time1, time2, callback)
removeEnterCuePeriod(time1, time2, callback)
addLeaveCuePeriod(time1, time2, callback)
removeLeaveCuePeriod(time1, time2, callback)

The callback function passed to addEnterCuePeriod will be invoked once when the video enters the period of time bounded by time1 and time2. How the video gets to a frame between time1 and time2 doesn't matter; i.e., the callback function may be invoked by a normally playing video reaching time1, by a video being fast-forwarded or rewound into the period between time1 and time2, or by a seek directly to a time between time1 and time2.

The LeaveCuePeriod mechanism is similar, except that the callback is invoked when the video leaves the specified cue period. (Or should this pair of methods be left out?)

With these four methods, one can achieve not only the "bullet point" effect, but also the appearance and disappearance of video captions.

Hope this helps.

郁
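The cue-period proposal can be modeled with a tracker that only compares "inside" against "outside" between successive positions. That comparison is what makes it indifferent to how playback got there. This is a hedged sketch. The four public method names come from the proposal, while `CuePeriodTracker`, `update`, and treating the period as the closed interval between `time1` and `time2` are assumptions.

```typescript
// Sketch of the proposed cue periods. The four public method names follow
// the proposal; the class, update() and the closed interval are assumptions.
type PeriodCallback = () => void;
type PeriodKind = "enter" | "leave";

interface CuePeriod {
  time1: number;
  time2: number;
  callback: PeriodCallback;
  kind: PeriodKind;
}

function isInside(p: CuePeriod, t: number): boolean {
  return Math.min(p.time1, p.time2) <= t && t <= Math.max(p.time1, p.time2);
}

class CuePeriodTracker {
  private periods: CuePeriod[] = [];
  private lastTime: number | null = null;

  addEnterCuePeriod(time1: number, time2: number, callback: PeriodCallback): void {
    this.periods.push({ time1, time2, callback, kind: "enter" });
  }
  removeEnterCuePeriod(time1: number, time2: number, callback: PeriodCallback): void {
    this.remove(time1, time2, callback, "enter");
  }
  addLeaveCuePeriod(time1: number, time2: number, callback: PeriodCallback): void {
    this.periods.push({ time1, time2, callback, kind: "leave" });
  }
  removeLeaveCuePeriod(time1: number, time2: number, callback: PeriodCallback): void {
    this.remove(time1, time2, callback, "leave");
  }

  // Feed every new position, whether it came from normal play, fast
  // forward/rewind, or a direct seek; only inside-vs-outside matters.
  update(time: number): void {
    const prev = this.lastTime;
    this.lastTime = time;
    for (const p of this.periods) {
      const wasIn = prev !== null && isInside(p, prev);
      const isIn = isInside(p, time);
      if (p.kind === "enter" && !wasIn && isIn) p.callback();
      if (p.kind === "leave" && wasIn && !isIn) p.callback();
    }
  }

  private remove(time1: number, time2: number, callback: PeriodCallback, kind: PeriodKind): void {
    this.periods = this.periods.filter(p =>
      !(p.time1 === time1 && p.time2 === time2 && p.callback === callback && p.kind === kind));
  }
}

// Usage: a caption shown while the position is between 2 s and 4 s.
const captionLog: string[] = [];
const showCaption: PeriodCallback = () => { captionLog.push("show"); };
const hideCaption: PeriodCallback = () => { captionLog.push("hide"); };
const tracker = new CuePeriodTracker();
tracker.addEnterCuePeriod(2, 4, showCaption);
tracker.addLeaveCuePeriod(2, 4, hideCaption);
tracker.update(0);   // outside: nothing
tracker.update(3);   // seek into the period: "show"
tracker.update(3.5); // still inside: nothing
tracker.update(1);   // rewind out of it: "hide"
```

One consequence of the inside/outside comparison is that a seek that jumps clean over a period, from before `time1` to after `time2`, fires neither callback, because the position was never inside it.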
Re: [whatwg] Cue points in media elements
Thanks for adding to the discussion. We're very interested in implementing support for presentations as well, so it's good to hear from someone with experience.

Since we work on streaming media formats, I always assumed things would have to be broken up by the server and the various components streamed separately to a browser, and I hadn't noticed the cue point support until you pointed it out.

Some comments and questions below...

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:
> in our language, you might see something like this:
>
> (movie "Foo.mov" :name 'movie)
> (wait @movie (tc 2 3))
> (show @bullet-1)
> (wait @movie)
> (show @bullet-2)
>
> If the user skips to the end of the media clip, that simply causes
> all WAITs on that media clip to return instantly. If they skip
> forward in the media clip, without ending it, all WAITs before that
> point will return instantly.

How does this work if, for example, the user seeks forward, and then back to an earlier position? Would some of the 'show's be undone, or do they not seek backward with the media playback? Is the essential component of your system that all the shows be called in sequence to build up a display state, or that the last state trigger before the current playback point has been triggered? Isn't this slow if a bunch of intermediate animations are triggered by a seek?

Does your system support live streaming as well? That complicates the design some when the presentation media updates appear dynamically.

Anyway I think you could implement your system with the currently proposed interface by checking the current playback position and clearing a separate list of waits inside your timeupdate callback.

> This is a nice system, but I can't see how even as simple a system as
> this could be implemented given the current specification of cue
> points. The problem is that the callbacks execute "when the current
> playback position of a media element reaches" the cue point. It seems
> unclear to me what "reaching" a particular time means.

I agree this should be clarified. The appropriate interpretation should be when the current playback position reaches the frame corresponding to the cue point, but digital media has quantized frames, while the cue points are floating point numbers. Triggering all cue point callbacks between the last current playback position and the current one (including during seeks) would be one option, and do what you want as long as you aren't seeking backward. I'd be more in favor of triggering any cue point callbacks that lie between the current playback position and the current playback position of the next frame (audio frame for and video frame for I guess). That means more bookkeeping to implement your system, but is less surprising in other cases.

> If video
> playback freezes for a second, and so misses a cue point, is that
> considered to have been "reached"?

As I read it, cue points are relative to the current playback position, which does not advance if the stream buffer underruns, but it would if playback restarts after a gap, as might happen if the connection drops, or in an RTP stream. My proposal above would need to be amended to handle that case, and the decoder dropping frames...finding the right language here is hard.

> In the current spec, all that is
> provided for is controls to turn closed captions on or off. What
> would be much better is a way to enable the video element to send
> caption events, which include the text of the current caption, and
> can be used to display those captions in a way that fits the design
> of the content better.

I really like this idea. It would also be nice if, for example, the closed caption text were available through the DOM so it could be presented elsewhere, searched locally, and so on. But what about things like album art, which might be embedded in an audio stream? Should that be accessible? Should a video element expose a set of known cue points embedded in the file?

A more abstract interface is necessary than just 'caption events'. Here are some use cases worth considering:

* A media file has embedded textual metadata like title, author, copyright license, that the designer would like to access for associated display elsewhere in the page, or to alter the displayed user interface based on the metadata. This is pretty essential for parity with flash-based internet radio players.

* A media file has embedded non-textual metadata like an album cover image, that the designer would like to access for display elsewhere in the page.

* The designer wants to access closed captioned or subtitle text through the DOM as it becomes available for display elsewhere in the page.

* There are points in the media file where the embedded metadata changes. These points cannot be retrieved without scanning the file, whi
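Ralph's use cases (caption text available to the page, and notification when embedded metadata changes) suggest exposing timed text to script as change notifications instead of fixed on-screen rendering. This is a minimal sketch built around a hypothetical `TimedTextNotifier` that is fed already-parsed timed text. Nothing in it corresponds to an interface in the draft, and it leaves open how the text is extracted from the media file. In a page, `update` would be driven by position updates such as "timeupdate".

```typescript
// Hypothetical: timed text (captions, subtitles, or metadata values) delivered
// to script as change notifications, so the page decides where to show it.
interface TimedText {
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
  text: string;
}

class TimedTextNotifier {
  private items: TimedText[];
  private onChange: (text: string | null) => void;
  private current: string | null = null;

  constructor(items: TimedText[], onChange: (text: string | null) => void) {
    this.items = items;
    this.onChange = onChange;
  }

  // Text active at time t, or null. If items overlap, the one that started
  // most recently wins (an assumption).
  textAt(t: number): string | null {
    let best: TimedText | null = null;
    for (const item of this.items) {
      if (item.start <= t && t < item.end && (best === null || item.start >= best.start)) {
        best = item;
      }
    }
    return best === null ? null : best.text;
  }

  // Call on every position update; notifies only when the text changes.
  update(t: number): void {
    const text = this.textAt(t);
    if (text !== this.current) {
      this.current = text;
      this.onChange(text);
    }
  }
}

// Usage: render captions into some page element instead of over the video.
const rendered: (string | null)[] = [];
const captions = new TimedTextNotifier(
  [
    { start: 0, end: 2, text: "Hello." },
    { start: 2.5, end: 4, text: "Welcome back." }
  ],
  text => { rendered.push(text); }
);
captions.update(0.5); // "Hello."
captions.update(1.0); // unchanged: no notification
captions.update(2.2); // between captions: null
captions.update(3.0); // "Welcome back."
```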