Re: [whatwg] cue points in media elements

2007-10-26 Thread Ian Hickson
On Wed, 24 Oct 2007, Dave Singer wrote:
 
 Caution:  cross-posted to whatwg and htmlwg;  be careful with 
 follow-ups!

Actually, please don't cross-post new threads to both groups. As mentioned
earlier this week, I only cross-post when the messages I'm replying to
were sent to both groups, as a convenience so that both groups can see
what progress is being made on issues that were discussed there. As a
general rule, it's better not to cross-post. Thanks!


 We've been looking into both semantic and implementation considerations 
 of cue points.  We wonder whether cue ranges might not make more sense.

Done.

I also changed the way that cue points (er, ranges) are removed, which I 
think will make it easier to handle swapping in sets of subtitles or the 
like. Comments welcome.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.


[whatwg] cue points in media elements

2007-10-23 Thread Dave Singer

Caution:  cross-posted to whatwg and htmlwg;  be careful with follow-ups!

* * * * *


We've been looking into both semantic and implementation 
considerations of cue points.  We wonder whether cue ranges might not 
make more sense.


Cues might often be used to establish appropriate parallel state. 
For example, cues could be used to show 'chapter names', or to 
provide commentary in an HTML pane on the display.  Under these 
circumstances, the question arises as to what the right behavior is 
when seeking.  Should any of the cue-points preceding the seek point 
be activated (in order to establish the right context), and if so, 
how many?  Should any of the cue-points after the previous play point
be activated to tear down any state at that point?


There is also an implementation question.  What should happen if 
cue-points are more dense than the playback software can process in 
real-time?  In video, this would cause catch-up techniques (e.g. 
frame-dropping).  But dropping cue-points is problematic.  If it's 
permitted, any cue-points that depend on previous ones having also 
fired (when playing linearly) cannot assume that they have, in fact, 
fired.  They have to re-establish state without any regard for 
context, which may complicate them.  (Though it's true that to an 
extent they have to do this anyway, if seeking can happen).  Worse, 
if the event to set a parallel state (e.g. a parental warning on a 
blue passage) is executed, and the event to remove it is not, the 
resulting display may be misleading or semantically incorrect or 
inconsistent.


These questions seem to resolve much better with cue ranges.  For a 
cue range, events are executed on either both entry and exit, or 
neither, much in the way that mouse events are generated for cursor 
movement, giving either both mouseEnter and mouseExit or neither. 
Similarly, fast mouse movements might tunnel right across a region 
with neither an entry nor exit event.  Formally, the logical 
definition of a cue range event would be that the time is 
periodically sampled (as densely as possible).  At each sampling 
instant, a cue event is dispatched:
 * for every range for which the previous sampling instant was in 
that range, and the current sampling instant is not;
 * for every range for which the previous sampling instant was not in 
that range, and the current sampling instant is.
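A minimal Python sketch of the sampling definition above. It assumes each range is the half-open interval [start, end) (the convention suggested further down) and that exit events are dispatched before entry events at the same instant; both choices, and every name here, are illustrative rather than part of any spec.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # (start, end), treated as the half-open [start, end)
Event = Tuple[str, Range]


def contains(r: Range, t: Optional[float]) -> bool:
    # None stands for "no previous sample yet", which is outside every range.
    return t is not None and r[0] <= t < r[1]


def cue_events(ranges: List[Range], prev: Optional[float], curr: float) -> List[Event]:
    """Events due at sampling instant `curr`, given the previous instant `prev`."""
    exits = [("exit", r) for r in ranges if contains(r, prev) and not contains(r, curr)]
    enters = [("enter", r) for r in ranges if contains(r, curr) and not contains(r, prev)]
    return exits + enters  # tear down old state before setting up new state


def run(ranges: List[Range], samples: List[float]) -> List[Tuple[float, str, Range]]:
    """Apply cue_events to each pair of consecutive samples (playback or seek alike)."""
    log = []
    prev: Optional[float] = None
    for curr in samples:
        for kind, r in cue_events(ranges, prev, curr):
            log.append((curr, kind, r))
        prev = curr
    return log


ranges = [(1.0, 2.0), (3.0, 3.1)]
print(run(ranges, [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.5]))
```

With these samples, (1.0, 2.0) gets an entry at 1.0 and an exit at 2.0, while the short range (3.0, 3.1) falls between the samples 2.5 and 3.5 and is tunneled over. Because a seek is just another pair of samples, `cue_events(ranges, 1.5, 3.05)` yields an exit for (1.0, 2.0) followed by an entry for (3.0, 3.1).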


Note that
* this formal definition is amenable to optimization, by looking
ahead to the 'next interesting time' when a cue range starts or ends,
when playing.
* for any range for which you get an entry, you are assured you will 
get an exit eventually.
* you are not guaranteed to get the events *at* their defined times; 
they might be 'late', though the system should be implemented in such 
a way as to minimize lateness.
* short ranges might experience no sampling instant within them, and 
might be skipped, posting no events, though this also should be 
avoided if possible by implementations.
* on a seek, you will get exit events for the time seeked from, if 
appropriate, as well as entry events for the time seeked to.
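The look-ahead optimization in the first note might be sketched as follows, assuming range membership can only change at a start or end time. A player could then schedule its next check for that time instead of sampling densely; the helper names are hypothetical.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


def boundaries(ranges: Iterable[Tuple[float, float]]) -> List[float]:
    """Sorted, de-duplicated list of every range start and end time."""
    return sorted({t for r in ranges for t in r})


def next_interesting_time(bounds: List[float], t: float) -> Optional[float]:
    """Earliest boundary strictly after t (forward playback), or None if none remain."""
    i = bisect.bisect_right(bounds, t)
    return bounds[i] if i < len(bounds) else None


bounds = boundaries([(1.0, 2.0), (1.5, 4.0)])
print(bounds)                              # [1.0, 1.5, 2.0, 4.0]
print(next_interesting_time(bounds, 1.0))  # 1.5
```

Between two consecutive boundaries no range can be entered or exited, so skipping the instants in between loses no events. The player still evaluates membership at whatever position it actually reaches when it wakes up, which is why events may arrive 'late' but are not lost.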


I would suggest that the cue-range interval includes its start time 
but excludes its end-time.  Therefore seeking to the exact start time 
of a cue-range, even before playback is started, fires its entry 
event (if we were previously outside the range), whereas seeking to 
the end-time of a range, even before playback started, fires its exit 
event (if we were previously inside it).


If reverse playback is started after such seeks, then you immediately
get another event (exit or entry), but I think that's OK as
reverse playback is unusual.  I guess the algorithm could be
sensitive to the sign of the default playback rate, but that seems 
both excessively complicated, and also raises questions of what 
happens if the sign is changed while paused.


If a cue-range end time is the same as its start time, one merely 
gets two events (enter and exit) dispatched at the same time, or 
nothing at all (if it gets 'tunneled over').


Does this ease both the semantic and implementation considerations?

--
David Singer
Apple/QuickTime


Re: [whatwg] Cue points in media elements

2007-10-18 Thread Ian Hickson
On Sun, 29 Apr 2007, Brian Campbell wrote:
 
 The problem is that the callbacks execute when the current playback 
 position of a media element reaches the cue point. It seems unclear to 
 me what reaching a particular time means. If video playback freezes 
 for a second, and so misses a cue point, is that considered to have been 
 reached? Is there any way that you can guarantee that a cue point will 
 be executed as long as video has passed a particular cue point? With a 
 lot of bookkeeping and the timeupdate event along with the cue points, 
 you may be able to keep track of the current time in the movie well 
 enough to deal with the user skipping forward, pausing, and the video 
 stalling and restarting due to running out of buffer. This doesn't 
 address, as far as I can tell, issues like the thread displaying the 
 video pausing for whatever reason and so skipping forward after it 
 resumes, which may cause cue points to be lost, and which isn't 
 specified to send a timeupdate event.

I've defined what reaching a particular time means. I have explicitly
made it invoke the callbacks for times that might get skipped due to
missing frames during normal playback. I have also made it _not_ fire
the callbacks for times in between the old and new positions when seeking.


 Basically, what is necessary is a way to specify that a cue point should 
 always be fired as long as playback has passed a certain time, not just 
 if it reaches a particular time. This would prevent us from having to 
 do a lot of bookkeeping to make sure that cue points haven't been 
 missed, and make everything simpler and less fragile.

You can use the timeupdate event for this -- it fires whenever a cue 
point is hit, and whenever the timeline is seeked (even implicitly by the 
looping algorithm).


 For now, we are focusing on captioning for the deaf. We have voiceovers 
 on some screens with no associated video, video that appears in various 
 places on the screen, and the occasional sound effects. Because there is 
 not a consistent video location, nor is there even a frame for 
 voiceovers to appear in, we don't display the captions directly over the 
 video, but instead send events to the current screen, which is 
 responsible for catching the events and displaying them in a location 
 appropriate for that screen, usually a standard location. In the current 
 spec, all that is provided for is controls to turn closed captions on or 
 off. What would be much better is a way to enable the video element to 
 send caption events, which include the text of the current caption, and 
 can be used to display those captions in a way that fits the design of 
 the content better.

I've added this to the list for version 2 features. I'm interested in 
seeing what the requirements are for captions before we go ahead and spec 
them in too much detail. Implementation feedback will be helpful here.

Thanks for your feedback!


On Mon, 30 Apr 2007, Ralph Giles wrote:
 
 I'd be more in favor of triggering any cue point callbacks that lie 
 between the current playback position and the current playback position 
 of the next frame (audio frame for <audio> and video frame for <video>
 I guess). That means more bookkeeping to implement your system, but is
 less surprising in other cases.

Could you elaborate on this? Right now the system triggers cue points up 
to the current displayed frame, and some cue points between the current 
frame and the next frame, if the gap between the frames is long enough 
that the time updates more often than the framerate.


 As I read it, cue points are relative to the current playback position, 
 which does not advance if the stream buffer underruns, but it would if 
 playback restarts after a gap, as might happen if the connection drops, 
 or in an RTP stream. My proposal above would need to be amended to 
 handle that case, and the decoder dropping frames...finding the right 
 language here is hard.

Does the new text work for this?


 A more abstract interface is necessary than just 'caption events'. Here 
 are some use cases worth considering:
 
 * A media file has embedded textual metadata like title, author, 
 copyright license, that the designer would like to access for associated 
 display elsewhere in the page, or to alter the displayed user interface 
 based on the metadata. This is pretty essential for parity with 
 flash-based internet radio players.
 
 * The designer wants to access closed captioned or subtitle text 
 through the DOM as it becomes available for display elsewhere in the 
 page.
 
 * There are points in the media file where the embedded metadata 
 changes. These points cannot be retrieved without scanning the file, 
 which is expensive over the network, and may not be possible in general 
 if the stream is a live feed. Nevertheless, the designer wants to be 
 notified when the associated metadata changes so other elements can be 
 updated. This is in fact the normal case for http streaming 

Re: [whatwg] Cue points in media elements

2007-05-02 Thread Dave Singer

At 17:04  -0400 1/05/07, Brian Campbell wrote:

On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

I believe that a cue point is reached if its time is traversed 
during playback.


What does traversed mean in terms of (a) seeking across the cue 
point (b) playing in reverse (rewinding) and (c) the media stalling
and restarting at a later point in the stream?


I would say that playing (at any rate and in any direction) is a 
continuous function, and therefore cue points are triggered, when 
playing, whenever two samples of the time straddle the cue point
(where straddle includes one of the samples being at the cue point).


Seeking is discontinuous, and therefore cue points are triggered only 
if a seek results in landing on the cue point, if not playing.  If 
playing, then the usual rules apply.
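One way to read these two rules in Python, assuming cue points are plain times in seconds. To avoid firing a cue twice when a sample lands exactly on it, this sketch counts the cue against the later of the two samples only; that tie-break is an assumption of the sketch, not part of the proposal.

```python
from typing import List


def cues_crossed_while_playing(prev: float, curr: float, cues: List[float]) -> List[float]:
    """Cue points straddled by two successive time samples, in the order crossed.

    Works for any rate and either direction. A cue exactly at `curr` counts; one
    exactly at `prev` does not, since it was counted when `prev` was the later sample.
    """
    if curr >= prev:
        return sorted(c for c in cues if prev < c <= curr)
    return sorted((c for c in cues if curr <= c < prev), reverse=True)


def cues_hit_by_seek(target: float, cues: List[float]) -> List[float]:
    """While not playing, a seek triggers only the cue points it lands on exactly."""
    return [c for c in cues if c == target]
```

For cues at 1.0, 2.0 and 3.0, playing from 0.5 to 2.0 fires 1.0 then 2.0, playing backward from 3.5 to 1.5 fires 3.0 then 2.0, and a paused seek to 2.5 fires nothing.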


Frame dropping, stalling, and so on, are aspects of the playback 
behavior and nothing to do with the logical model of cues laid on a 
time axis.

--
David Singer
Apple Computer/QuickTime


Re: [whatwg] Cue points in media elements

2007-05-01 Thread Billy Wong

On 4/29/07, Brian Campbell wrote:


For the sort of content that we produce, cue points are incredibly
important. Most of our content consists of a video or voiceover
playing while bullet points appear, animations play, and graphics are
revealed, all in sync with the video. We have a very simple system
for doing cue points, that is extremely easy for the content authors
to write and is robust for paused media, media that is skipped to the
end, etc. We simply have a blocking call, WAIT, that waits until a
specific point or the end of a specified media element. For instance,
in our language, you might see something like this:

   (movie Foo.mov :name 'movie)
   (wait @movie (tc 2 3))
   (show @bullet-1)
   (wait @movie)
   (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly. If the user pauses the media clip, all
WAITs on the media clip will block until it is playing again.
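The WAIT semantics described above map fairly naturally onto Python generators. This is only a sketch, assuming a script yields the media time it waits for (or None for the end of the clip) and that the host calls `update()` whenever the clip's position changes; `WaitRunner` and `lesson` are hypothetical names, and 2.0 is an arbitrary time standing in for the (tc 2 3) timecode.

```python
from typing import Generator, List, Optional

# A script yields the media time (seconds) it waits for, or None to wait for the end.
Script = Generator[Optional[float], None, None]
_DONE = object()  # sentinel: the script has run to completion


class WaitRunner:
    """Drives one script of WAIT calls against one media element's position."""

    def __init__(self, script: Script) -> None:
        self._script = script
        self._waiting_for = self._resume()  # run up to the first WAIT

    def _resume(self) -> object:
        try:
            return next(self._script)
        except StopIteration:
            return _DONE

    def update(self, position: float, ended: bool = False) -> None:
        """Call on every position change: playback, skipping forward, or reaching the end."""
        # Release every WAIT that is already satisfied, in order, so skipping ahead
        # runs the intervening steps at once. While paused no update arrives, so
        # the script simply stays blocked.
        while self._waiting_for is not _DONE:
            target = self._waiting_for
            if ended or (target is not None and position >= target):
                self._waiting_for = self._resume()
            else:
                break

    @property
    def finished(self) -> bool:
        return self._waiting_for is _DONE


def lesson(shown: List[str]) -> Script:
    yield 2.0                 # stands in for (wait @movie (tc 2 3))
    shown.append("bullet-1")  # (show @bullet-1)
    yield None                # (wait @movie): wait for the end of the clip
    shown.append("bullet-2")  # (show @bullet-2)
```

Skipping straight to the end, for example `WaitRunner(lesson(shown)).update(0.0, ended=True)`, releases both WAITs immediately and shows both bullets in order.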

This is a nice system, but I can't see how even as simple a system as
this could be implemented given the current specification of cue
points. The problem is that the callbacks execute when the current
playback position of a media element reaches the cue point. It seems
unclear to me what reaching a particular time means. If video
playback freezes for a second, and so misses a cue point, is that
considered to have been reached? Is there any way that you can
guarantee that a cue point will be executed as long as video has
passed a particular cue point? With a lot of bookkeeping and the
timeupdate event along with the cue points, you may be able to keep
track of the current time in the movie well enough to deal with the
user skipping forward, pausing, and the video stalling and restarting
due to running out of buffer. This doesn't address, as far as I can
tell, issues like the thread displaying the video pausing for
whatever reason and so skipping forward after it resumes, which may
cause cue points to be lost, and which isn't specified to send a
timeupdate event.

Basically, what is necessary is a way to specify that a cue point
should always be fired as long as playback has passed a certain time,
not just if it reaches a particular time. This would prevent us
from having to do a lot of bookkeeping to make sure that cue points
haven't been missed, and make everything simpler and less fragile.



In order to capture this kind of situation, with flexibility in mind, I
think the concept of cue points may be changed to cue periods...

Method names:
addEnterCuePeriod(time1, time2, callback)
removeEnterCuePeriod(time1, time2, callback)
addLeaveCuePeriod(time1, time2, callback)
removeLeaveCuePeriod(time1, time2, callback)

The callback function registered with addEnterCuePeriod will be invoked once
when the video enters the period of time bounded by time1 and time2.  How the
video gets to a frame between time1 and time2 doesn't matter.  i.e.  the
callback function may be invoked by a normally playing video reaching time1,
a video being fast-forwarded / rewound into the period between time1 and
time2, or the video seeking directly to a particular time between time1 and
time2.

The mechanism of LeaveCuePeriod is similar, while this time the callback is
invoked when the video leaves the specified cue period.  (Or should this pair
of methods be left out?)
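A minimal sketch of how the four proposed methods might behave, written in Python with snake_case versions of the names above. It assumes the period is the closed interval [time1, time2] and that the host reports each new position through a hypothetical `move_to()` call. With this position-based check, a jump that skips the whole period fires neither callback, a case the proposal leaves open.

```python
from typing import Callable, List, Optional, Tuple

Callback = Callable[[], None]
Entry = Tuple[float, float, Callback]


class CuePeriods:
    def __init__(self) -> None:
        self._enter: List[Entry] = []
        self._leave: List[Entry] = []
        self._position: Optional[float] = None  # no position reported yet

    def add_enter_cue_period(self, time1: float, time2: float, callback: Callback) -> None:
        self._enter.append((time1, time2, callback))

    def remove_enter_cue_period(self, time1: float, time2: float, callback: Callback) -> None:
        self._enter.remove((time1, time2, callback))  # ValueError if never added

    def add_leave_cue_period(self, time1: float, time2: float, callback: Callback) -> None:
        self._leave.append((time1, time2, callback))

    def remove_leave_cue_period(self, time1: float, time2: float, callback: Callback) -> None:
        self._leave.remove((time1, time2, callback))

    @staticmethod
    def _inside(t: Optional[float], time1: float, time2: float) -> bool:
        return t is not None and time1 <= t <= time2

    def move_to(self, t: float) -> None:
        """Report a new position, however it was reached (play, wind, or seek)."""
        prev, self._position = self._position, t
        # Leave callbacks run first so an old caption goes before a new one appears.
        for time1, time2, cb in list(self._leave):
            if self._inside(prev, time1, time2) and not self._inside(t, time1, time2):
                cb()
        for time1, time2, cb in list(self._enter):
            if self._inside(t, time1, time2) and not self._inside(prev, time1, time2):
                cb()
```

Registering a "show caption" enter callback and a "hide caption" leave callback on (1.0, 2.0), then reporting the positions 0.0, 1.5, 1.8, 5.0, 1.2, produces show, hide, show: the last one comes from seeking straight back into the period.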

With these four methods, one can not only achieve the bullet point effect,
but also video captions appearance and disappearance.

Hope this helps.

郁


Re: [whatwg] Cue points in media elements

2007-05-01 Thread Kevin Calhoun


On Apr 30, 2007, at 4:15 PM, Ralph Giles wrote:


[On Apr 29, 2007, at 12:14 AM, Brian Campbell wrote:]

 If video
playback freezes for a second, and so misses a cue point, is that
considered to have been reached?


As I read it, cue points are relative to the current playback position,
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right
language here is hard.


I believe that a cue point is reached if its time is traversed  
during playback.


- Kevin


Re: [whatwg] Cue points in media elements

2007-05-01 Thread Brian Campbell

On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:


Thanks for adding to the discussion. We're very interested in
implementing support for presentations as well, so it's good
to hear from someone with experience.


Thanks for responding, I'm glad to hear your input.


On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:


in our language, you might see something like this:

  (movie Foo.mov :name 'movie)
  (wait @movie (tc 2 3))
  (show @bullet-1)
  (wait @movie)
  (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly.


How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone, or do
they not seek backward with the media playback?


We don't expose arbitrary seeking controls to our users; just play/pause,
skip forward and back one card (which resets all state to a known value)
and skip past the current video/audio (which just causes all waits on
that media element to return instantly).



Is the essential
component of your system that all the shows be called in sequence
to build up a display state, or that the last state trigger before the
current playback point have been triggered?


The former.


Isn't this slow if a bunch
of intermediate animations are triggered by a seek?


Yes, though this is more a bug in our animation API (which could be  
taught to skip directly to the end of an animation when associated  
video/audio ends, but that just hasn't been done yet).


Actually, that brings up another point, which is a bit more  
speculative. It may be nice to have a way to register a callback that  
will be called at animation rates (at least 15 frames/second or so)  
that is called with the current play time of a media element. This  
would allow you to keep animations in sync with video, even if the  
video might stall briefly, or seek forward or backward for whatever  
reason. We haven't implemented this in our current system (as I said,  
it still has the bug that animations still take their full time to  
play even when you skip video), but it may be helpful for this sort  
of thing.
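The key point of this suggestion is that animation state is derived from the media element's current time rather than from elapsed wall-clock time, so a stall or a seek automatically brings it back into step. A minimal Python sketch of that calculation, assuming some host callback (hypothetical here) delivers the current play time roughly 15 or more times per second:

```python
from typing import Dict, Tuple


def animation_progress(media_time: float, start: float, duration: float) -> float:
    """Fraction (0.0 to 1.0) of an animation that should be visible at media_time.

    The animation is assumed to run from `start` for `duration` seconds of media time.
    """
    if duration <= 0:
        # Zero-length animation: jump straight to its end state once start is reached.
        return 1.0 if media_time >= start else 0.0
    return min(1.0, max(0.0, (media_time - start) / duration))


def on_media_tick(media_time: float, animations: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Recompute every animation's progress from the media clock alone."""
    return {name: animation_progress(media_time, s, d) for name, (s, d) in animations.items()}
```

For an animation running from 2.0 s for 4.0 s, a tick at 3.0 s gives 0.25. A tick after the video has been skipped to 10.0 s gives 1.0, so the animation snaps to its end state instead of taking its full time.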



Does your system support live streaming as well? That complicates the
design some when the presentation media updates appear dynamically.


No, we only support progressive download.


Anyway I think you could implement your system with the currently
proposed interface by checking the current playback position and
clearing a separate list of waits inside your timeupdate callback.


I agree, it would be possible, but from my current reading of the  
spec it sounds like some cue points might be missed until quite a bit  
later (since timeupdate isn't guaranteed to be called every time  
anything discontinuous happens with the media). In general, having to
do extra bookkeeping to keep track of the state of the media may be
fragile, so stronger guarantees about when cue points are fired are
better than trying to keep track of what's going on with timeupdate
events.


I agree this should be clarified. The appropriate interpretation should
be when the current playback position reaches the frame corresponding to
the cue point, but digital media has quantized frames, while the cue
points are floating point numbers. Triggering all cue point callbacks
between the last current playback position and the current one
(including during seeks) would be one option, and do what you want as
long as you aren't seeking backward. I'd be more in favor of triggering
any cue point callbacks that lie between the current playback position
and the current playback position of the next frame (audio frame for
<audio> and video frame for <video> I guess). That means more
bookkeeping to implement your system, but is less surprising in other
cases.


Sure, that would probably work. As I said, bookkeeping is generally a  
problem because it might get out of sync, but with stronger  
guarantees about when cue points are triggered, I think it could work.



  If video
playback freezes for a second, and so misses a cue point, is that
considered to have been reached?


As I read it, cue points are relative to the current playback position,
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right
language here is hard.


Yes, it's a tricky little problem. Our current system stays out of  
trouble because it makes quite a few simplifying assumptions (video  
is played forward only, progressive download, not streaming, etc).  
Obviously, in order to support a more general API, you're 

Re: [whatwg] Cue points in media elements

2007-05-01 Thread Brian Campbell

On May 1, 2007, at 1:05 PM, Kevin Calhoun wrote:

I believe that a cue point is reached if its time is traversed  
during playback.


What does traversed mean in terms of (a) seeking across the cue  
point (b) playing in reverse (rewinding) and (c) the media stalling
and restarting at a later point in the stream?


Re: [whatwg] Cue points in media elements

2007-05-01 Thread ddailey
Hearing about cue points in media elements. Just sorta reminds me of 
keyTimes in SMIL.


I know SMIL seems funky to some people, but I do really love it! It is so 
way cool! So far as I know it doesn't do quite what you're talking about 
here, but it does similar stuff including non-linear distortions of timing 
elements and the like.


It's declarative (though I don't think it's Turing complete -- wager of 
virtual beans proposed) and its syntax is worthy of emulation in that 
classical ontology recapitulates philology sort of sense. It is so much a 
W3C standard that it has six or eight or twelve standards devoted to it.


David Dailey
(who is trying to learn how not to re-invent wheels)
http://srufaculty.sru.edu/david.dailey/copyright/dailey_on_copyright.htm

Damn bastard mutant wheels keep popping up around me like unwanted 
copyrighted utterances in a world where intellectual landfills are charged 
by the bit!

-- anonymous

- Original Message - 
From: Brian Campbell
To: Ralph Giles
Sent: Tuesday, May 01, 2007 4:57 PM
Subject: Re: [whatwg] Cue points in media elements


Re: [whatwg] Cue points in media elements

2007-04-30 Thread Ralph Giles
Thanks for adding to the discussion. We're very interested in 
implementing support for presentations as well, so it's good
to hear from someone with experience. 

Since we work on streaming media formats, I always assumed things would 
have to be broken up by the server and the various components streamed 
separately to a browser, and I hadn't noticed the cue point support 
until you pointed it out.

Some comments and questions below...

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:

 in our language, you might see something like this:
 
   (movie Foo.mov :name 'movie)
   (wait @movie (tc 2 3))
   (show @bullet-1)
   (wait @movie)
   (show @bullet-2)
 
 If the user skips to the end of the media clip, that simply causes
 all WAITs on that media clip to return instantly. If they skip
 forward in the media clip, without ending it, all WAITs before that  
 point will return instantly.

How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone, or do 
they not seek backward with the media playback? Is the essential 
component of your system that all the shows be called in sequence 
to build up a display state, or that the last state trigger before the 
current playback point have been triggered? Isn't this slow if a bunch 
of intermediate animations are triggered by a seek?

Does your system support live streaming as well? That complicates the 
design some when the presentation media updates appear dynamically.

Anyway I think you could implement your system with the currently 
proposed interface by checking the current playback position and 
clearing a separate list of waits inside your timeupdate callback.
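This suggestion might look roughly like the following in Python: keep waits in a priority queue keyed by media time, and release every one at or before the reported position whenever a timeupdate arrives. `PendingWaits` and `on_timeupdate` are hypothetical names, and, as Brian notes in his reply, a sketch like this is only as prompt as the timeupdate events that drive it.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class PendingWaits:
    """Waits keyed by media time, released from a timeupdate handler."""

    def __init__(self) -> None:
        # Entries are (time, sequence, callback); the sequence number keeps equal
        # times in insertion order and stops heapq from comparing callbacks.
        self._heap: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()

    def add(self, time: float, callback: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (time, next(self._seq), callback))

    def on_timeupdate(self, current_time: float) -> None:
        """Run, earliest first, every wait whose time is at or before current_time."""
        while self._heap and self._heap[0][0] <= current_time:
            _, _, callback = heapq.heappop(self._heap)
            callback()

    def __len__(self) -> int:
        return len(self._heap)
```

Because every due wait is released on each update, a forward skip that produces a single timeupdate still runs all the skipped waits in time order.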

 This is a nice system, but I can't see how even as simple a system as  
 this could be implemented given the current specification of cue  
 points. The problem is that the callbacks execute when the current  
 playback position of a media element reaches the cue point. It seems  
 unclear to me what reaching a particular time means.

I agree this should be clarified. The appropriate interpretation should 
be when the current playback position reaches the frame corresponding to 
the cue point, but digital media has quantized frames, while the cue 
points are floating point numbers. Triggering all cue point callbacks 
between the last current playback position and the current one 
(including during seeks) would be one option, and do what you want as 
long as you aren't seeking backward. I'd be more in favor of triggering
any cue point callbacks that lie between the current playback position 
and the current playback position of the next frame (audio frame for 
<audio> and video frame for <video> I guess). That means more 
bookkeeping to implement your system, but is less surprising in other 
cases.
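A small Python sketch of the second option, assuming a fixed frame duration: when a frame is presented at `position`, fire the cue points that fall before the next frame is due. The half-open window is an assumption so that a cue exactly on a frame boundary belongs to one frame only.

```python
from typing import List


def cues_for_frame(position: float, frame_duration: float, cues: List[float]) -> List[float]:
    """Cue points in [position, position + frame_duration), in time order."""
    next_frame = position + frame_duration
    return sorted(c for c in cues if position <= c < next_frame)


cues = [0.25, 0.5, 0.75]
print(cues_for_frame(0.0, 0.5, cues))  # [0.25]
print(cues_for_frame(0.5, 0.5, cues))  # [0.5, 0.75]
```

Unlike firing everything between the previous and current positions, this window never reaches back across a seek. After a jump forward only the cues near the new position fire, which is where the extra bookkeeping for a WAIT-style system comes in.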

   If video  
 playback freezes for a second, and so misses a cue point, is that  
 considered to have been reached?

As I read it, cue points are relative to the current playback position, 
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right 
language here is hard.

 In the current spec, all that is  
 provided for is controls to turn closed captions on or off. What  
 would be much better is a way to enable the video element to send  
 caption events, which include the text of the current caption, and  
 can be used to display those captions in a way that fits the design  
 of the content better.

I really like this idea. It would also be nice if, for example, the 
closed caption text were available through the DOM so it could be
presented elsewhere, searched locally, and so on. But what about things 
like album art, which might be embedded in an audio stream? Should that 
be accessible? Should a video element expose a set of known cue points 
embedded in the file? 

A more abstract interface is necessary than just 'caption events'. Here 
are some use cases worth considering:

* A media file has embedded textual metadata like title, author, 
copyright license, that the designer would like to access for associated 
display elsewhere in the page, or to alter the displayed user interface
based on the metadata. This is pretty essential for parity with 
flash-based internet radio players.

* A media file has embedded non-textual metadata like an album cover 
image, that the designer would like to access for display elsewhere in
the page.

* The designer wants to access closed captioned or subtitle text 
through the DOM as it becomes available for display elsewhere in the 
page.

* There are points in the media file where the embedded metadata 
changes. These points cannot be retrieved without scanning the file, 
which is expensive over 

[whatwg] Cue points in media elements

2007-04-29 Thread Brian Campbell
I'm a developer of a custom engine for interactive multimedia, and
I've recently noticed the work WHATWG has been doing on adding
<video> and <audio> elements to HTML. I'm very glad to see these
being proposed for addition to HTML, because if they (and several
other features) are done right, it means that there may be a chance
for us to stop using a custom engine, and use an off-the-shelf HTML
engine, putting our development focus on our authoring tools instead.
My hope is that eventually, if these features get enough penetration,
we will be able to put our content up on the web directly, rather than
having to distribute the runtime software with it.


I've taken a look at the current specification for media elements,  
and on the whole, it looks like it would meet our needs. We are  
currently using VP3, and a combination of MP3 and Vorbis audio, for  
our codecs, so having Ogg Theora (based on VP3) and Ogg Vorbis as a  
baseline would be completely fine with us, and much preferable to the  
patent issues and licensing fees we'd need to deal with if we used  
MPEG4.


For the sort of content that we produce, cue points are incredibly
important. Most of our content consists of a video or voiceover
playing while bullet points appear, animations play, and graphics are
revealed, all in sync with the video. We have a very simple system
for cue points that is extremely easy for content authors to write
and is robust to paused media, media that is skipped to the end, and
so on. We simply have a blocking call, WAIT, that waits until a
specified media element reaches a specific point, or its end. For
instance, in our language, you might see something like this:


  (movie Foo.mov :name 'movie)
  (wait @movie (tc 2 3))
  (show @bullet-1)
  (wait @movie)
  (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly. If the user pauses the media clip, all
WAITs on the media clip will block until it is playing again.
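For illustration only, the WAIT semantics just described map naturally onto promises in a browser. This is a minimal sketch assuming a hypothetical MediaClock object that the player notifies of every position change (normal playback, recovery from a stall, or a user seek); MediaClock, wait and setTime are invented names, not part of the HTML spec or any browser API.

```typescript
// Hypothetical model of one media clip's clock; not the HTML <video> API.
type Waiter = { time: number; resolve: () => void };

class MediaClock {
  private waiters: Waiter[] = [];
  private time = 0;
  private ended = false;

  constructor(readonly duration: number) {}

  // Resolves once playback has reached or passed `time`, or the clip ends.
  // Omitting `time` waits for the end of the clip, like (wait @movie).
  wait(time: number = Infinity): Promise<void> {
    return new Promise((resolve) => {
      if (this.ended || this.time >= time) {
        resolve(); // already passed: return instantly
      } else {
        this.waiters.push({ time, resolve });
      }
    });
  }

  // Called by the player whenever the position changes, whether from
  // playback, a stall recovery, or a seek. A paused clip never calls
  // this, so pending waits simply stay blocked.
  setTime(t: number): void {
    this.time = Math.min(t, this.duration);
    if (this.time >= this.duration) this.ended = true;
    const pending: Waiter[] = [];
    for (const w of this.waiters) {
      if (this.ended || this.time >= w.time) w.resolve();
      else pending.push(w);
    }
    this.waiters = pending;
  }
}
```

A script like the one above would then read `await movie.wait(t); show("bullet-1"); await movie.wait(); show("bullet-2");`, with `t` standing for the (tc 2 3) timecode converted to seconds (conversion not shown). Skipping forward or to the end releases every earlier wait in one step, because the check is "at or past", not "exactly at".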


This is a nice system, but I can't see how even as simple a system as  
this could be implemented given the current specification of cue  
points. The problem is that the callbacks execute when the current  
playback position of a media element reaches the cue point. It is
unclear to me what reaching a particular time means. If video
playback freezes for a second, and so misses a cue point, is that  
considered to have been reached? Is there any way that you can  
guarantee that a cue point will be executed as long as video has  
passed a particular cue point? With a lot of bookkeeping and the  
timeupdate event along with the cue points, you may be able to keep  
track of the current time in the movie well enough to deal with the  
user skipping forward, pausing, and the video stalling and restarting  
due to running out of buffer. This doesn't address, as far as I can  
tell, issues like the thread displaying the video pausing for  
whatever reason and so skipping forward after it resumes, which may  
cause cue points to be lost, and which isn't specified to send a  
timeupdate event.


Basically, what is necessary is a way to specify that a cue point  
should always be fired as long as playback has passed a certain time,  
not just if it reaches a particular time. This would prevent us  
from having to do a lot of bookkeeping to make sure that cue points  
haven't been missed, and make everything simpler and less fragile.
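The "passed, not just reached" rule can be sketched as a tracker that remembers the last known playback position and, on each update, fires every cue in the half-open interval (last, now]. This is a hypothetical TypeScript illustration assuming the page can call update() with the current position whenever it learns it (for example from timeupdate handlers and after seeks); CueTracker and its methods are invented names, and re-arming cues after a backward seek is an assumed policy that the message does not specify.

```typescript
// Hypothetical cue tracker; names are illustrative, not from any spec.
interface Cue {
  time: number; // seconds into the media
  callback: () => void;
}

class CueTracker {
  private cues: Cue[] = [];
  private last = -Infinity; // position seen at the previous update

  add(time: number, callback: () => void): void {
    this.cues.push({ time, callback });
    this.cues.sort((a, b) => a.time - b.time); // keep cues in time order
  }

  // Every cue in (last, now] fires exactly once, in time order, no matter
  // how far playback jumped since the previous call (stall, skip, etc.).
  update(now: number): void {
    if (now < this.last) {
      // Backward seek (assumed policy): cues after the new position
      // will fire again when playback passes them.
      this.last = now;
      return;
    }
    for (const cue of this.cues) {
      if (cue.time > this.last && cue.time <= now) cue.callback();
    }
    this.last = now;
  }
}
```

Because only the interval since the last update matters, a missed or late timeupdate cannot lose a cue; it merely delays it until the next update.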


We're also greatly interested in making our content accessible, to  
meet Section 508 requirements. For now, we are focusing on captioning  
for the deaf. We have voiceovers on some screens with no associated
video, video that appears in various places on the screen, and
occasional sound effects. Because there is no consistent video
location, and not even a frame for voiceovers to appear in, we
don't display the captions directly over the video; instead we send
events to the current screen, which is responsible for catching the
events and displaying them in a location appropriate for that screen,
usually a standard location. The current spec provides only a
control to turn closed captions on or off. What
would be much better is a way to enable the video element to send  
caption events, which include the text of the current caption, and  
can be used to display those captions in a way that fits the design  
of the content better.
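A minimal sketch of such caption events, assuming captions are available to script as (start, end, text) ranges: a hypothetical CaptionDispatcher computes the active caption for the current position and notifies registered screens only when the text changes, with null meaning the caption should be cleared. CaptionDispatcher, onCaption and update are illustrative names, not part of the spec.

```typescript
// Hypothetical caption range; times in seconds, end exclusive.
interface Caption {
  start: number;
  end: number;
  text: string;
}

type CaptionListener = (text: string | null) => void;

class CaptionDispatcher {
  private listeners: CaptionListener[] = [];
  private current: string | null = null;

  constructor(private readonly captions: Caption[]) {}

  // A screen registers here to receive caption text; null means "clear".
  onCaption(listener: CaptionListener): void {
    this.listeners.push(listener);
  }

  // Call whenever the playback position changes. Emits only when the
  // active caption text differs from the last text sent.
  update(now: number): void {
    const active = this.captions.find((c) => c.start <= now && now < c.end);
    const text = active ? active.text : null;
    if (text === this.current) return;
    this.current = text;
    for (const listener of this.listeners) listener(text);
  }
}
```

Each screen decides where to draw the text it receives. Because the dispatcher evaluates ranges rather than points, a seek into the middle of a caption still produces the right text.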


I hope these comments make sense; let me know if you have any  
questions or suggestions.


Thanks,
Brian Campbell
Interactive Media Lab, Dartmouth College
http://iml.dartmouth.edu