Re: [whatwg] cue points in media elements

2007-10-26 Thread Ian Hickson
On Wed, 24 Oct 2007, Dave Singer wrote:
> Caution:  cross-posted to whatwg and htmlwg;  be careful with 
> follow-ups!

Actually, please don't cross-post new threads to both groups. As mentioned 
earlier this week, I only cross-post when the messages I'm replying to 
were sent to both groups as a convenience to both groups so they can see 
what progress is being made on issues that were discussed there; as a 
general rule it's better to not cross-post. Thanks!

> We've been looking into both semantic and implementation considerations 
> of cue points.  We wonder whether cue ranges might not make more sense.


I also changed the way that cue points (er, ranges) are removed, which I 
think will make it easier to handle swapping in sets of subtitles or the 
like. Comments welcome.

Ian Hickson   U+1047E)\._.,--,'``.fL   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Cue points in media elements

2007-10-18 Thread Ian Hickson
On Sun, 29 Apr 2007, Brian Campbell wrote:
> The problem is that the callbacks execute "when the current playback 
> position of a media element reaches" the cue point. It seems unclear to 
> me what "reaching" a particular time means. If video playback freezes 
> for a second, and so misses a cue point, is that considered to have been 
> "reached"? Is there any way that you can guarantee that a cue point will 
> be executed as long as video has passed a particular cue point? With a 
> lot of bookkeeping and the "timeupdate" event along with the cue points, 
> you may be able to keep track of the current time in the movie well 
> enough to deal with the user skipping forward, pausing, and the video 
> stalling and restarting due to running out of buffer. This doesn't 
> address, as far as I can tell, issues like the thread displaying the 
> video pausing for whatever reason and so skipping forward after it 
> resumes, which may cause cue points to be lost, and which isn't 
> specified to send a "timeupdate" event.

I've defined what "reaching" a particular time means. I have explicitly 
made it invoke the times that might get skipped due to missing frames 
during normal playback. I have also made it _not_ fire the callbacks for 
times in between the old and new positions when seeking.

> Basically, what is necessary is a way to specify that a cue point should 
> always be fired as long as playback has passed a certain time, not just 
> if it "reaches" a particular time. This would prevent us from having to 
> do a lot of bookkeeping to make sure that cue points haven't been 
> missed, and make everything simpler and less fragile.

You can use the "timeupdate" event for this -- it fires whenever a cue 
point is hit, and whenever the timeline is seeked (even implicitly by the 
looping algorithm).

> For now, we are focusing on captioning for the deaf. We have voiceovers 
> on some screens with no associated video, video that appears in various 
> places on the screen, and the occasional sound effects. Because there is 
> not a consistent video location, nor is there even a frame for 
> voiceovers to appear in, we don't display the captions directly over the 
> video, but instead send events to the current screen, which is 
> responsible for catching the events and displaying them in a location 
> appropriate for that screen, usually a standard location. In the current 
> spec, all that is provided for is controls to turn closed captions on or 
> off. What would be much better is a way to enable the video element to 
> send caption events, which include the text of the current caption, and 
> can be used to display those captions in a way that fits the design of 
> the content better.

I've added this to the list for version 2 features. I'm interested in 
seeing what the requirements are for captions before we go ahead and spec 
them in too much detail. Implementation feedback will be helpful here.

Thanks for your feedback!

On Mon, 30 Apr 2007, Ralph Giles wrote:
> I'd be more in favor of triggering any cue point callbacks that lie 
> between the current playback position and the current playback position 
> of the next frame (audio frame for <audio> and video frame for <video>
> I guess). That means more bookkeeping to implement your system, but is 
> less surprising in other cases.

Could you elaborate on this? Right now the system triggers cue points up 
to the current displayed frame, and some cue points between the current 
frame and the next frame, if the gap between the frames is long enough 
that the time updates more often than the framerate.

> As I read it, cue points are relative to the current playback position, 
> which does not advance if the stream buffer underruns, but it would if 
> playback restarts after a gap, as might happen if the connection drops, 
> or in an RTP stream. My proposal above would need to be amended to 
> handle that case, and the decoder dropping frames...finding the right 
> language here is hard.

Does the new text work for this?

> A more abstract interface is necessary than just 'caption events'. Here 
> are some use cases worth considering:
> * A media file has embedded textual metadata like title, author, 
> copyright license, that the designer would like to access for associated 
> display elsewhere in the page, or to alter the displayed user interface 
> based on the metadata. This is pretty essential for parity with 
> flash-based internet radio players.
> * The designer wants to access closed captioned or subtitle text 
> through the DOM as it becomes available for display elsewhere in the 
> page.
> * There are points in the media file where the embedded metadata 
> changes. These points cannot be retrieved without scanning the file, 
> which is expensive over the network, and may not be possible in general 
> if the stream is a live feed. Nevertheless, the designer wants to be 
> notified when the associated metadata changes so other elements can be 

Re: [whatwg] Cue points in media elements

2007-05-02 Thread Kevin Calhoun

On May 2, 2007, at 11:01 AM, Dave Singer wrote:

At 17:04  -0400 1/05/07, Brian Campbell wrote:

On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

I believe that a cue point is "reached" if its time is traversed  
during playback.

What does "traversed" mean in terms of (a) seeking across the cue  
point (b) playing in reverse (rewinding) and (c) the media stalling  
and restarting at a later point in the stream?

I would say that playing (at any rate and in any direction) is a  
continuous function, and therefore cue points are triggered, when  
playing, whenever two samples of the time straddle the cue point  
(where straddle includes one of the samples being at the cue point).

Seeking is discontinuous, and therefore cue points are triggered  
only if a seek results in landing on the cue point, if not playing.   
If playing, then the usual rules apply.

A discontinuous jump will result in a timeupdate notification, which  
among other things is supposed to enable scripts to issue  
notifications of interesting times that are traversed not during  
playback but while seeking.

Re: [whatwg] Cue points in media elements

2007-05-02 Thread Dave Singer

At 17:04  -0400 1/05/07, Brian Campbell wrote:

On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

I believe that a cue point is "reached" if its time is traversed 
during playback.

What does "traversed" mean in terms of (a) seeking across the cue 
point (b) playing in reverse (rewinding) and (c) the media stalling 
and restarting at a later point in the stream?

I would say that playing (at any rate and in any direction) is a 
continuous function, and therefore cue points are triggered, when 
playing, whenever two samples of the time straddle the cue point 
(where straddle includes one of the samples being at the cue point).

Seeking is discontinuous, and therefore cue points are triggered only 
if a seek results in landing on the cue point, if not playing.  If 
playing, then the usual rules apply.

Frame dropping, stalling, and so on, are aspects of the playback 
behavior and nothing to do with the logical model of cues laid on a 
time axis.
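As a rough illustration of the straddle rule above, here is a minimal
Python sketch. The function name cues_triggered and the playing flag are
hypothetical. It also assumes that the earlier of the two time samples is
excluded, so a cue sitting exactly on a sample is not reported twice by
consecutive calls; the thread does not settle that detail.

```python
from typing import Iterable, List


def cues_triggered(prev: float, cur: float, cues: Iterable[float],
                   playing: bool = True) -> List[float]:
    """Return the cue times triggered between two samples of the media time.

    prev and cur are consecutive samples of the playback position (seconds).
    """
    if not playing:
        # Seeking while not playing is discontinuous: only a cue that the
        # seek lands on exactly is triggered.
        return [c for c in cues if c == cur]
    if cur >= prev:
        # Forward playback: cues in (prev, cur], reported in time order.
        return sorted(c for c in cues if prev < c <= cur)
    # Reverse playback: cues in [cur, prev), reported latest first.
    return sorted((c for c in cues if cur <= c < prev), reverse=True)
```

Under these assumptions, a dropped frame or a stall only widens the gap
between prev and cur, so the cues in between are still reported on the
next sample.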

David Singer
Apple Computer/QuickTime

Re: [whatwg] Cue points in media elements

2007-05-01 Thread ddailey
Hearing about cue points in media elements. Just sorta reminds me of 
keyTimes in SMIL.

I know SMIL seems funky to some people, but I do really love it! It is so 
way cool! So far as I know it doesn't do quite what you're talking about 
here, but it does similar stuff including non-linear distortions of timing 
elements and the like.

It's declarative (though I don't think it's Turing complete -- wager of 
virtual beans proposed) and its syntax is worthy of emulation in that 
classical "ontology recapitulates philology" sort of sense. It is so much a 
W3C standard that it has six or eight or twelve standards devoted to it.

David Dailey
(who is trying to learn how not to re-invent wheels)

Damn bastard mutant wheels keep popping up around me like unwanted 
copyrighted utterances in a world where intellectual landfills are charged 
by the bit!

-- anonymous

- Original Message - 
From: "Brian Campbell" <[EMAIL PROTECTED]>

To: "Ralph Giles" <[EMAIL PROTECTED]>
Sent: Tuesday, May 01, 2007 4:57 PM
Subject: Re: [whatwg] Cue points in media elements

On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:

Thanks for adding to the discussion. We're very interested in
implementing support for presentations as well, so it's good
to hear from someone with experience.

Thanks for responding, I'm glad to hear your input.

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:

in our language, you might see something like this:

  (movie "" :name 'movie)
  (wait @movie (tc 2 3))
  (show @bullet-1)
  (wait @movie)
  (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that  media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly.

How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone,  or do
they not seek backward with the media playback?

We don't expose arbitrary seeking controls to our users; just play/ pause, 
skip forward & back one card (which resets all state to a  known value) 
and skip past the current video/audio (which just causes  all waits on 
that media element to return instantly).

Is the essential
component of your system that all the shows be called in sequence
to build up a display state, or that the last state trigger before the
current playback point have been triggered?

The former.

Isn't this slow if a bunch
of intermediate animations are triggered by a seek?

Yes, though this is more a bug in our animation API (which could be 
taught to skip directly to the end of an animation when associated 
video/audio ends, but that just hasn't been done yet).

Actually, that brings up another point, which is a bit more  speculative. 
It may be nice to have a way to register a callback that  will be called 
at animation rates (at least 15 frames/second or so)  that is called with 
the current play time of a media element. This  would allow you to keep 
animations in sync with video, even if the  video might stall briefly, or 
seek forward or backward for whatever  reason. We haven't implemented this 
in our current system (as I said,  it still has the bug that animations 
still take their full time to  play even when you skip video), but it may 
be helpful for this sort  of thing.

Does your system support live streaming as well? That complicates the
design some when the presentation media updates appear dynamically.

No, we only support progressive download.

Anyway I think you could implement your system with the currently
proposed interface by checking the current playback position and
clearing a separate list of waits inside your timeupdate callback.

I agree, it would be possible, but from my current reading of the  spec it 
sounds like some cue points might be missed until quite a bit  later 
(since timeupdate isn't guaranteed to be called every time  anything 
discontinuous happens with the media). In general, having to  do extra 
bookkeeping to keep track of the state of the media may be  fragile, so 
stronger guarantees about when cue points are fired is  better than trying 
to keep track of what's going on with timeupdate  events.

I agree this should be clarified. The appropriate interpretation  should
be when the current playback position reaches the frame  corresponding to
the cue point, but digital media has quantized frames, while the cue
points are floating point numbers. Triggering all cue point callbacks
between the last current playback position and the current one
(including during seeks) would be one option, and do what you want as
long as you aren't seeking backward. I'd be more in fav

Re: [whatwg] Cue points in media elements

2007-05-01 Thread Brian Campbell

On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

I believe that a cue point is "reached" if its time is traversed  
during playback.

What does "traversed" mean in terms of (a) seeking across the cue  
point (b) playing in reverse (rewinding) and (c) the media stalling  
and restarting at a later point in the stream?

Re: [whatwg] Cue points in media elements

2007-05-01 Thread Brian Campbell

On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:

Thanks for adding to the discussion. We're very interested in
implementing support for presentations as well, so it's good
to hear from someone with experience.

Thanks for responding, I'm glad to hear your input.

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:

in our language, you might see something like this:

  (movie "" :name 'movie)
  (wait @movie (tc 2 3))
  (show @bullet-1)
  (wait @movie)
  (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that  media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly.

How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone, or do
they not seek backward with the media playback?

We don't expose arbitrary seeking controls to our users; just play/ 
pause, skip forward & back one card (which resets all state to a  
known value) and skip past the current video/audio (which just causes  
all waits on that media element to return instantly).

Is the essential
component of your system that all the shows be called in sequence
to build up a display state, or that the last state trigger before the
current playback point have been triggered?

The former.

Isn't this slow if a bunch
of intermediate animations are triggered by a seek?

Yes, though this is more a bug in our animation API (which could be  
taught to skip directly to the end of an animation when associated  
video/audio ends, but that just hasn't been done yet).

Actually, that brings up another point, which is a bit more  
speculative. It may be nice to have a way to register a callback that  
will be called at animation rates (at least 15 frames/second or so)  
that is called with the current play time of a media element. This  
would allow you to keep animations in sync with video, even if the  
video might stall briefly, or seek forward or backward for whatever  
reason. We haven't implemented this in our current system (as I said,  
it still has the bug that animations still take their full time to  
play even when you skip video), but it may be helpful for this sort  
of thing.
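One way to picture such a callback: if it is handed the current play
time, an animation can derive its progress from that time rather than
from a wall clock. A minimal Python sketch, with hypothetical names and
assuming all times are in seconds of media time:

```python
def animation_progress(media_time: float, start: float,
                       duration: float) -> float:
    """Map the media's current play time to animation progress in [0, 1].

    start is the media time at which the animation begins and duration is
    how long it runs; both are illustrative parameters.
    """
    if duration <= 0:
        # Degenerate animation: it is either not started or already done.
        return 1.0 if media_time >= start else 0.0
    # Clamp so that stalls, skips and backward seeks all stay in range.
    return min(1.0, max(0.0, (media_time - start) / duration))
```

Because progress depends only on the play time, a stall simply holds the
animation still, and skipping past the end of the media jumps it straight
to its final state.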

Does your system support live streaming as well? That complicates the
design some when the presentation media updates appear dynamically.

No, we only support progressive download.

Anyway I think you could implement your system with the currently
proposed interface by checking the current playback position and
clearing a separate list of waits inside your timeupdate callback.

I agree, it would be possible, but from my current reading of the  
spec it sounds like some cue points might be missed until quite a bit  
later (since timeupdate isn't guaranteed to be called every time  
anything discontinuous happens with the media). In general, having to  
do extra bookkeeping to keep track of the state of the media may be  
fragile, so stronger guarantees about when cue points are fired is  
better than trying to keep track of what's going on with timeupdate
events.

I agree this should be clarified. The appropriate interpretation should
be when the current playback position reaches the frame corresponding to
the cue point, but digital media has quantized frames, while the cue
points are floating point numbers. Triggering all cue point callbacks
between the last current playback position and the current one
(including during seeks) would be one option, and do what you want as
long as you aren't seeking backward. I'd be more in favor of triggering
any cue point callbacks that lie between the current playback position
and the current playback position of the next frame (audio frame for
<audio> and video frame for <video> I guess). That means more
bookkeeping to implement your system, but is less surprising in other
cases.

Sure, that would probably work. As I said, bookkeeping is generally a  
problem because it might get out of sync, but with stronger  
guarantees about when cue points are triggered, I think it could work.

  If video
playback freezes for a second, and so misses a cue point, is that
considered to have been "reached"?

As I read it, cue points are relative to the current playback position,
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right
language here is hard.

Yes, it's a tricky little problem. Our current system stays out of  
trouble because it makes quite a few simplifying assumptions (video  
is played forward only, progressive download, not streaming, etc).  
Obviously, in order to support a more general API, you're going to

Re: [whatwg] Cue points in media elements

2007-05-01 Thread Kevin Calhoun

On Apr 30, 2007, at 4:15 PM, Ralph Giles wrote:

[On Apr 29, 2007, at 12:14 AM, Brian Campbell wrote:]

 If video
playback freezes for a second, and so misses a cue point, is that
considered to have been "reached"?

As I read it, cue points are relative to the current playback position,
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right
language here is hard.

I believe that a cue point is "reached" if its time is traversed  
during playback.

- Kevin

Re: [whatwg] Cue points in media elements

2007-05-01 Thread Billy Wong

On 4/29/07, Brian Campbell <[EMAIL PROTECTED]> wrote:

For the sort of content that we produce, cue points are incredibly
important. Most of our content consists of a video or voiceover
playing while bullet points appear, animations play, and graphics are
revealed, all in sync with the video. We have a very simple system
for doing cue points, that is extremely easy for the content authors
to write and is robust for paused media, media that is skipped to the
end, etc. We simply have a blocking call, WAIT, that waits until a
specific point or the end of a specified media element. For instance,
in our language, you might see something like this:

   (movie "" :name 'movie)
   (wait @movie (tc 2 3))
   (show @bullet-1)
   (wait @movie)
   (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that  media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly. If the user pauses the media clip, all
WAITs on the media clip will block until it is playing again.

This is a nice system, but I can't see how even as simple a system as
this could be implemented given the current specification of cue
points. The problem is that the callbacks execute "when the current
playback position of a media element reaches" the cue point. It seems
unclear to me what "reaching" a particular time means. If video
playback freezes for a second, and so misses a cue point, is that
considered to have been "reached"? Is there any way that you can
guarantee that a cue point will be executed as long as video has
passed a particular cue point? With a lot of bookkeeping and the
"timeupdate" event along with the cue points, you may be able to keep
track of the current time in the movie well enough to deal with the
user skipping forward, pausing, and the video stalling and restarting
due to running out of buffer. This doesn't address, as far as I can
tell, issues like the thread displaying the video pausing for
whatever reason and so skipping forward after it resumes, which may
cause cue points to be lost, and which isn't specified to send a
"timeupdate" event.

Basically, what is necessary is a way to specify that a cue point
should always be fired as long as playback has passed a certain time,
not just if it "reaches" a particular time. This would prevent us
from having to do a lot of bookkeeping to make sure that cue points
haven't been missed, and make everything simpler and less fragile.

To capture this kind of situation, with flexibility in mind, I think
the concept of "cue points" could be changed to "cue periods"...

Method names:
addEnterCuePeriod(time1, time2, callback)
removeEnterCuePeriod(time1, time2, callback)
addLeaveCuePeriod(time1, time2, callback)
removeLeaveCuePeriod(time1, time2, callback)

The callback passed to addEnterCuePeriod is invoked once when the video
enters the period of time bounded by time1 and time2.  How the video gets
to a frame between time1 and time2 doesn't matter; i.e. the callback may
be invoked by a normally playing video reaching time1, by a video being
fast-forwarded or rewound into the period between time1 and time2, or by
a seek directly to a time between time1 and time2.

The LeaveCuePeriod mechanism is similar, except that the callback is
invoked when the video leaves the specified cue period.  (Or should this
pair of methods be left out?)

With these four methods, one can achieve not only the "bullet point"
effect, but also make video captions appear and disappear.
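To make the proposal concrete, here is a minimal Python sketch of how
cue periods might behave. The class CuePeriods, its snake_case method
names and the update() hook (which a player would call with each new
playback position) are all assumptions for illustration, not part of any
spec. Only the old and new positions are compared, so it does not matter
whether the position changed through playback or through a seek.

```python
from typing import Callable, List, Optional, Tuple

Callback = Callable[[], None]
Period = Tuple[float, float, Callback]


class CuePeriods:
    """Hypothetical model of the enter/leave cue period proposal."""

    def __init__(self) -> None:
        self._enter: List[Period] = []
        self._leave: List[Period] = []
        self._last: Optional[float] = None  # previous playback position

    def add_enter_cue_period(self, t1: float, t2: float, cb: Callback) -> None:
        self._enter.append((t1, t2, cb))

    def remove_enter_cue_period(self, t1: float, t2: float, cb: Callback) -> None:
        if (t1, t2, cb) in self._enter:
            self._enter.remove((t1, t2, cb))

    def add_leave_cue_period(self, t1: float, t2: float, cb: Callback) -> None:
        self._leave.append((t1, t2, cb))

    def remove_leave_cue_period(self, t1: float, t2: float, cb: Callback) -> None:
        if (t1, t2, cb) in self._leave:
            self._leave.remove((t1, t2, cb))

    def update(self, now: float) -> None:
        """Fire callbacks for periods entered or left since the last call."""
        last = self._last
        for t1, t2, cb in list(self._enter):
            was_inside = last is not None and t1 <= last <= t2
            if t1 <= now <= t2 and not was_inside:
                cb()
        for t1, t2, cb in list(self._leave):
            was_inside = last is not None and t1 <= last <= t2
            if was_inside and not t1 <= now <= t2:
                cb()
        self._last = now
```

One limitation of this sketch: if the position jumps from before time1
to after time2 between two update() calls, neither callback fires. That
is probably fine for captions, but it would not by itself satisfy the
"always fire once passed" requirement.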

Hope this helps.


Re: [whatwg] Cue points in media elements

2007-04-30 Thread Ralph Giles
Thanks for adding to the discussion. We're very interested in 
implementing support for presentations as well, so it's good
to hear from someone with experience. 

Since we work on streaming media formats, I always assumed things would 
have to be broken up by the server and the various components streamed 
separately to a browser, and I hadn't noticed the cue point support 
until you pointed it out.

Some comments and questions below...

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:

> in our language, you might see something like this:
>   (movie "" :name 'movie)
>   (wait @movie (tc 2 3))
>   (show @bullet-1)
>   (wait @movie)
>   (show @bullet-2)
> If the user skips to the end of the media clip, that simply causes  
> all WAITs on that  media clip to return instantly. If they skip  
> forward in the media clip, without ending it, all WAITs before that  
> point will return instantly.

How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone, or do 
they not seek backward with the media playback? Is the essential 
component of your system that all the shows be called in sequence 
to build up a display state, or that the last state trigger before the 
current playback point have been triggered? Isn't this slow if a bunch 
of intermediate animations are triggered by a seek?

Does your system support live streaming as well? That complicates the 
design some when the presentation media updates appear dynamically.

Anyway I think you could implement your system with the currently 
proposed interface by checking the current playback position and 
clearing a separate list of waits inside your timeupdate callback.
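Roughly, the bookkeeping could look like the following Python sketch.
WaitList and its methods are hypothetical names; on_timeupdate() stands
in for whatever handler the page attaches to the timeupdate event, and
on_ended() for the end of the clip.

```python
import heapq
from typing import Callable, List, Tuple


class WaitList:
    """Pending waits keyed by media time, released from a timeupdate handler."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so waits at equal times keep their order

    def wait_until(self, time: float, resume: Callable[[], None]) -> None:
        """Register resume() to run once playback is at or past time."""
        heapq.heappush(self._heap, (time, self._seq, resume))
        self._seq += 1

    def on_timeupdate(self, current_time: float) -> int:
        """Release every wait at or before current_time; return how many ran."""
        released = 0
        while self._heap and self._heap[0][0] <= current_time:
            _, _, resume = heapq.heappop(self._heap)
            resume()
            released += 1
        return released

    def on_ended(self) -> int:
        """Skipping to the end of the clip releases all remaining waits."""
        return self.on_timeupdate(float("inf"))
```

This only ever compares against the latest position, so a forward skip
releases every earlier wait in order. It does nothing on a backward seek,
and it is only as timely as the timeupdate notifications it is fed.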

> This is a nice system, but I can't see how even as simple a system as  
> this could be implemented given the current specification of cue  
> points. The problem is that the callbacks execute "when the current  
> playback position of a media element reaches" the cue point. It seems  
> unclear to me what "reaching" a particular time means.

I agree this should be clarified. The appropriate interpretation should 
be when the current playback position reaches the frame corresponding to 
the cue point, but digital media has quantized frames, while the cue
points are floating point numbers. Triggering all cue point callbacks 
between the last current playback position and the current one 
(including during seeks) would be one option, and do what you want as 
long as you aren't seeking backward. I'd be more in favor of triggering
any cue point callbacks that lie between the current playback position 
and the current playback position of the next frame (audio frame for 
<audio> and video frame for <video> I guess). That means more
bookkeeping to implement your system, but is less surprising in other
cases.

>   If video  
> playback freezes for a second, and so misses a cue point, is that  
> considered to have been "reached"?

As I read it, cue points are relative to the current playback position, 
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right 
language here is hard.

> In the current spec, all that is  
> provided for is controls to turn closed captions on or off. What  
> would be much better is a way to enable the video element to send  
> caption events, which include the text of the current caption, and  
> can be used to display those captions in a way that fits the design  
> of the content better.

I really like this idea. It would also be nice if, for example, the 
closed caption text were available through the DOM so it could be
presented elsewhere, searched locally, and so on. But what about things 
like album art, which might be embedded in an audio stream? Should that 
be accessible? Should a video element expose a set of known cue points 
embedded in the file? 

A more abstract interface is necessary than just 'caption events'. Here 
are some use cases worth considering:

* A media file has embedded textual metadata like title, author, 
copyright license, that the designer would like to access for associated 
display elsewhere in the page, or to alter the displayed user interface
based on the metadata. This is pretty essential for parity with 
flash-based internet radio players.

* A media file has embedded non-textual metadata like an album cover 
image, that the designer would like to access for display elsewhere in
the page.

* The designer wants to access closed captioned or subtitle text 
through the DOM as it becomes available for display elsewhere in the
page.

* There are points in the media file where the embedded metadata 
changes. These points cannot be retrieved without scanning the file, 