Hi Silvia,

Thank you for your reply. Your comments are very helpful for understanding how WebVTT can and cannot be used. See my comments below.

On 7/26/2012 12:44 AM, Silvia Pfeiffer wrote:
On Wed, Jul 25, 2012 at 11:45 PM, Cyril Concolato
<cyril.concol...@telecom-paristech.fr> wrote:
Right now it is fully defined how data in a TextTrack (of the defined
kinds) is displayed on top of the video. As this is as yet unclear for
SVG resources,
I wouldn't say it's unclear; I'd say it still needs to be specified ;) though
it probably doesn't require much specification. I was thinking that we
could use the CSS box of the video element to position the SVG, as if the
SVG were put in a div.
Let's work on this basis and see where we get. There are also
positioning issues
What do you mean here by "positioning issues"? SVG handles positioning within its viewBox, and what I propose is to define the size and position of this viewBox in the parent coordinate system, i.e. relative to the video. I don't see what else is needed. Or do you mean: when SVG is transported in a cue, how are the cue settings used?

etc. so it's not as simple as just putting the SVG
in a cue.
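This is a minimal sketch of the "CSS box of the video element" idea. It assumes the overlay `<svg>` and the `<video>` are siblings inside a wrapper with `position: relative`, with the video at the wrapper's top-left corner. Both helper names are hypothetical.

- The first helper computes where the picture actually sits inside the element box under the default `object-fit: contain` letterboxing of `<video>`.
- The second helper gives the SVG the video's box and a viewBox in video pixels, so that SVG user units coincide with video pixels.

```javascript
// Hypothetical helper: rectangle (CSS pixels, relative to the <video> box) covered by
// the picture, assuming object-fit: contain, i.e. uniform scaling, centred.
function videoContentRect(boxWidth, boxHeight, videoWidth, videoHeight) {
  if (!(videoWidth > 0 && videoHeight > 0)) {
    // Intrinsic size unknown (e.g. before loadedmetadata): fall back to the whole box.
    return { x: 0, y: 0, width: boxWidth, height: boxHeight };
  }
  const scale = Math.min(boxWidth / videoWidth, boxHeight / videoHeight);
  const width = videoWidth * scale;
  const height = videoHeight * scale;
  return { x: (boxWidth - width) / 2, y: (boxHeight - height) / 2, width, height };
}

// Hypothetical helper: cover the video box with the overlay. With the SVG default
// preserveAspectRatio ("xMidYMid meet"), a viewBox of the video's intrinsic size is
// letterboxed exactly like the picture, i.e. it lands on videoContentRect(...).
function alignSvgOverlay(video, svg) {
  svg.setAttribute('viewBox', `0 0 ${video.videoWidth} ${video.videoHeight}`);
  Object.assign(svg.style, {
    position: 'absolute',
    left: '0px',
    top: '0px',
    width: `${video.clientWidth}px`,
    height: `${video.clientHeight}px`,
    pointerEvents: 'none', // let clicks through to the native controls below
  });
}
```

You would call `alignSvgOverlay(video, svg)` on `loadedmetadata` and again whenever the video is resized. `pointer-events: none` lets clicks reach the native controls, but it also makes the overlay itself non-interactive.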


I would suggest using the @metadata track kind for now
and providing the SVG as markup in a TextTrackCue (either from WebVTT
cues
I've tried this option but I'm facing several problems (Tested with Chrome
Version 22.0.1216.0 canary).

The first problem is how to embed SVG in a cue. Should the '<', '>' and
other characters be escaped or not? According to Anne's validator,
So, I assume you created WebVTT files. (You don't have to - you can
directly use the TextTrack API.)

Anne's validator validates the WebVTT rules for caption and subtitle
kinds. For "metadata" kinds, there should be no parsing of the cues in
browsers.
Reading the spec again, I see that the parsing rules for "WebVTT metadata text" are different indeed. My mistake.

A validator can only decide whether to parse the cues
according to "captions"/"subtitles", or "chapters", or "metadata"
rules if the WebVTT file has such an indicator. I've asked for such
information to be included in WebVTT, but we don't currently have such
markup/metadata.
Do you mean that you would like some signaling in the WebVTT file (for instance in the header) to indicate the type of the cue payload? I think that would be interesting. Alternatively, it would be useful to have a type selector in the validator.


they
should be.
Actually, for @kind=metadata you don't need to escape '<' or '>'.
Yes, I had missed that.


But if I use them, then the parsing of the escaped string returns
'empty document'
(http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG-escaped.html).
Which parsing? Anne's validator? Have you tried Chrome directly?
Here I meant parsing with a DOMParser object in JavaScript, in Chrome.

http://perso.telecom-paristech.fr/~concolat/html5_tests/svg-escaped.vtt
does look very ugly.
Indeed, but as you said, for the @kind=metadata it is not needed.


However, if I don't escape them, the parsing doesn't fail and returns an SVG
document
(http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG.html).
cue.text is the SVG code? That's what we want, right?
(http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt looks
much nicer)
Yes, cue.text seems to be the best option when using WebVTT.
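For reference, parsing a metadata cue's raw `cue.text` as SVG can be sketched as below. `DOMParser.parseFromString(text, 'image/svg+xml')` is the standard call. It reports XML errors through a `<parsererror>` element in the returned document rather than by throwing.

The helper names are hypothetical. The unescaping branch is an assumption and only covers payloads that were escaped as a whole (as in svg-escaped.vtt). Unescaping raw markup would corrupt legitimate entities such as `&lt;` inside an SVG `<text>` element.

```javascript
// Hypothetical helper: if the whole payload was escaped (it starts with "&lt;"),
// undo one level of escaping; raw markup is returned untouched.
function cuePayload(text) {
  const trimmed = text.trimStart();
  if (!trimmed.startsWith('&lt;')) return text;
  // &amp; is replaced last so that "&amp;lt;" becomes "&lt;", not "<".
  return trimmed.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}

// Hypothetical helper: parse a metadata cue's text as a standalone SVG document,
// or return null on an XML error. `parser` defaults to the browser's DOMParser and
// is a parameter only so the function can be exercised outside a browser.
function parseCueAsSvg(cueText, parser = new DOMParser()) {
  const doc = parser.parseFromString(cuePayload(cueText), 'image/svg+xml');
  return doc.getElementsByTagName('parsererror').length > 0 ? null : doc;
}
```

In a `cuechange` handler, `parseCueAsSvg(track.activeCues[0].text)` would yield a document. Its root can then be inserted into the page with `document.importNode(doc.documentElement, true)`.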

In any case, I think embedding the SVG in WebVTT does not really make sense.
Why not?
I should have said that forcing SVG to be embedded in WebVTT does not make sense, since the two overlap (timing, positioning...), and embedding adds overhead and limitations (see below).

Another problem is one of design. SVG has a timing model (similar to
TTML), WebVTT another. For instance, SVG can express things like repetitions
of animations that WebVTT cannot. Are you saying that TTML should be carried
in a WebVTT file?
TTML in WebVTT probably doesn't make sense. But SVG's timing model can
be applied within the timeframe of a cue, so that does make sense.
Maybe, yes. It might make sense if your cue has a long duration; otherwise the overhead of loading an SVG document for each cue might be too high. But in general, since you can give an SVG document a frame-based structure (see this cartoon, for instance: http://perso.telecom-paristech.fr/~concolat/SVG/flash10.svg), I don't see what value WebVTT adds as a carrier for SVG.

How would you specify this with TTML? It would run into the same
problems, wouldn't it?
I think so; the problems would be similar. But again, TTML can also express frame-based animations, so why add the WebVTT layer?
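Applying SVG's timing model within a cue's timeframe essentially means mapping media time onto the SVG document's own timeline. This is a minimal sketch, assuming the SVG's time 0 should coincide with the cue's start. The helper names are hypothetical. `pauseAnimations()` and `setCurrentTime()` are standard `SVGSVGElement` methods.

```javascript
// Hypothetical helper: map a media time (seconds) to the document time of an SVG
// carried by `cue`, assuming the SVG timeline starts at cue.startTime. Times outside
// the cue are clamped to its bounds.
function svgTimeForCue(cue, mediaTime) {
  const duration = cue.endTime - cue.startTime;
  const local = mediaTime - cue.startTime;
  return Math.min(Math.max(local, 0), duration);
}

// Hypothetical helper: seek the SVG's own animation timeline instead of letting it
// run freely, so that repetitions, animateTransform, etc. stay aligned with the media.
function seekSvgForCue(svgRoot, cue, mediaTime) {
  svgRoot.pauseAnimations();
  svgRoot.setCurrentTime(svgTimeForCue(cue, mediaTime));
}
```

Calling `seekSvgForCue` from a `timeupdate` or `cuechange` handler inherits the update-rate limits discussed later in this thread.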

Similarly, in terms of design, embedding SVG in cues requires repeating a
lot of SVG content at each cue (see
http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt), as this
approach requires parsing an entire document at each cue. You could probably
envisage overlapping cues but that would require a lot of overhead.
Leveraging the progressive loading of SVG cannot be done this way either.
In general, I think it would make sense to leverage the browsers' support
for SVG and not stack different technologies.
Sure, it should use existing SVG support. I'm not so sure I agree with
not stacking - that depends.
The only possible added value (I think) of carrying SVG in WebVTT would be using the cue settings to position the SVG relative to the video, like line positioning. But I'm not sure, and that should probably be part of CSS and applicable without WebVTT. Do you have other examples of the value of carrying SVG in WebVTT?
What would your preferred markup for
http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt be ?
How would you avoid the duplication?
For instance, you would want to be able to construct the SVG document progressively, i.e. to have a single document that you modify by adding more data. One way to do this would be to have the first cue contain the beginning of the document and the following cues contain more data. But since modifying the document after it has loaded is tricky, this would require concatenating all previous cue texts and parsing the result as a new document (ugly!). I'd like the parsing step to be done under the hood by the browser, as it usually is.
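The "concatenate all previous cue texts and re-parse" workaround can be sketched as follows. It assumes a hypothetical convention:

- The first cue carries the opening `<svg ...>` tag.
- Later cues carry only fragments.
- No cue carries the closing tag, which the helper appends.

The sketch also makes the cost visible: every call rebuilds the whole source, which then has to be re-parsed in full.

```javascript
// Hypothetical helper: cues are in start-time order (as in a TextTrackCueList), the
// first one opens the <svg> root and no cue closes it. Returns the document source
// as it stands at mediaTime, or '' if no cue has started yet.
function accumulatedSvgSource(cues, mediaTime, closingTag = '</svg>') {
  let source = '';
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (cue.startTime > mediaTime) break; // later cues have not started yet
    source += cue.text;
  }
  return source === '' ? '' : source + closingTag;
}
```

Pass all cues (`track.cues`), not `track.activeCues`, since earlier fragments may belong to cues that have already ended. The result would then go to `new DOMParser().parseFromString(..., 'image/svg+xml')`.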

Another problem is that I don't know whether it's possible to display the SVG
content in a layer between the video and the UI controls. Currently, I
display the SVG on top of the video element, so the UI controls cannot
receive clicks. Having to embed my own UI controls for that is a
bit of a pain. And, semantically, the spec says that 'metadata' tracks are
"Not displayed by the user agent.", so I think this might be a bit
confusing for users/authors.
All publishers that want the same controls in all browsers make their
own controls anyway. If you make a library for SVG display on top of a
video, you can also make one for the controls (or use one of the many
existing ones).
That's an option, but that shouldn't be the only one.

The third problem is performance. In my example, the blue line (in
SVG), when synchronized with the video, should be aligned with the moving
(white-gray) edge of the pie. As you can see, this is not the case. Only 4-5
cuechange events seem to be processed properly. I noticed the same problem
with 'timeupdate' events. Also, I've noticed that even though my WebVTT file
is designed to have only one active cue at a time, some cuechange
events report 2 active cues. This might be an implementation issue, or a problem of
reentrant code (the cuechange callback being called before the previous call
has finished). In general, though, I'm not sure it's a good idea to go through
the JavaScript engine for this, given the processing overhead.
TextTrack support is still very new. I agree that its update frequency
should be higher than that of timeupdate events. Your example is
indeed pushing the boundaries. Basically you are asking it to draw a
clock hand in sync with a video that is updating its clock pie
every video frame. TextTrack was built for relatively "rare" events
along the timeline of a video - certainly not for something that needs
an update with every video frame. Going through WebVTT makes this
particularly slow.
If you try my example here (http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG.html), you'll see that changing the playback speed (even to 0.1) does not guarantee synchronization either. By the time the JS has processed the content, it's already too late. It might be an implementation issue, but it's symptomatic of the stacking. That's why I think we should leverage native parsing, synchronization and rendering of SVG (not through JS). The clock might be a (not so) extreme case, but I don't think I'm trying to do anything very fancy here. I'm just trying to reproduce what existing technologies (proprietary or not) do, using existing web standards.
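For the clock example, one alternative avoids parsing a new document per cue:

- Author the hand's rotation once as a native SVG animation (for instance an `animateTransform`).
- On every animation frame, drive the SVG document clock from the video clock.

This is a swapped-in technique, not what the linked test page does. It still goes through JavaScript once per frame, so at best it narrows the synchronization gap described above. The function name and the injectable `schedule` parameter are hypothetical. `requestAnimationFrame`, `pauseAnimations()` and `setCurrentTime()` are standard.

```javascript
// Hypothetical helper: freeze the SVG's own clock and re-seek it to the video's
// currentTime once per animation frame. Returns a function that stops the loop.
// `schedule` defaults to requestAnimationFrame and is a parameter only for testing.
function lockSvgToVideo(video, svg, schedule = (cb) => requestAnimationFrame(cb)) {
  let stopped = false;
  svg.pauseAnimations(); // SVG time now advances only through setCurrentTime()
  function tick() {
    if (stopped) return;
    svg.setCurrentTime(video.currentTime); // picks up seeks and playbackRate changes
    schedule(tick);
  }
  tick();
  return () => { stopped = true; };
}
```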


or from JavaScript calls to addTextTrack()).
Can you elaborate on this one? However, I suspect it'll have the same
processing overhead.
I'm not sure. Having to repeatedly parse WebVTT cues and draw the SVG
image makes this particularly slow. Have you tried painting the SVG
just once on the video and using TextTrackCues just to change the
transform value using JavaScript? Upon a cuechange event, you re-draw
the SVG.
I could give it a try if I have some time but I'm not really sure I understand what you're suggesting. Do you mean using addCue? Could you give an example? Are you suggesting something similar to the example in the spec with

var sounds = sfx.addTextTrack('metadata');
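Silvia's suggestion is to draw the SVG once and let metadata cues carry only the parameter that changes. It could look roughly like the sketch below, which rests on these assumptions:

- Each cue's text is a hypothetical payload holding a rotation angle in degrees.
- The hand is an existing SVG element.
- Cues are created with the `VTTCue(startTime, endTime, text)` constructor of current browsers.
- The helper names are hypothetical.

```javascript
// Hypothetical helper: fill a metadata track with one cue per `step` seconds, each
// carrying the hand's angle at the start of that interval. `makeCue` is injected so
// the same code works with `new VTTCue(...)` in a browser.
function addAngleCues(track, makeCue, duration, step, degreesPerSecond) {
  if (!(step > 0)) throw new RangeError('step must be positive');
  for (let i = 0; i * step < duration; i++) {
    const start = i * step;
    const end = Math.min(start + step, duration);
    const angle = (start * degreesPerSecond) % 360;
    track.addCue(makeCue(start, end, String(angle)));
  }
}

// Hypothetical helper: cuechange handler. The SVG stays in the page; only the
// hand's transform attribute changes.
function makeCueChangeHandler(track, hand, cx, cy) {
  return () => {
    const active = track.activeCues;
    if (!active || active.length === 0) return;
    // activeCues is in start-time order; if two cues are active, the later one wins.
    const angle = parseFloat(active[active.length - 1].text);
    if (!Number.isNaN(angle)) {
      hand.setAttribute('transform', `rotate(${angle} ${cx} ${cy})`);
    }
  };
}

// Browser usage (not run here):
//   const track = video.addTextTrack('metadata');
//   track.mode = 'hidden'; // cues fire cuechange but are not rendered
//   addAngleCues(track, (s, e, t) => new VTTCue(s, e, t), video.duration, 0.5, 6);
//   track.oncuechange = makeCueChangeHandler(track, document.getElementById('hand'), 100, 100);
```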

for
instance reusing the viewport/viewbox negotiation phase. There would also be
a need to make a more generic Track API or to replace the TextTrack API by
the SVG API when the track is of kind 'graphics'.
I don't understand this requirement. What API needs are there aside
from the synchronization? Trying to replicate SVG APIs through the
TextTrack API seems like a repetition of the API and thus fragile.
Sorry for the confusion here. I didn't mean to replicate the SVG APIs here
but I just meant that the TextTrack API is very specific to 'pure' text
tracks (and even to WebVTT text tracks). You might want to expose the SVG
API when SVG content is used for the overlay to control it.
Can you give an example? How do you think that should look?
I was thinking of having something like the following. Please excuse any IDL mistakes. Also note that this is not really a proposal; I haven't thought through all the consequences, it's just to give an idea.

enum TextTrackMode { "disabled", "hidden", "showing" };

interface Track : EventTarget {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString inBandMetadataTrackDispatchType;

           attribute TextTrackMode mode;
};

interface TextTrack : Track {
  readonly attribute TextTrackCueList? cues;
  readonly attribute TextTrackCueList? activeCues;

  void addCue(TextTrackCue cue);
  void removeCue(TextTrackCue cue);

           attribute EventHandler oncuechange;
};

interface GraphicsDocumentTrack : Track {
           attribute Document trackDocument;
};

The basic Track interface would be almost the same as VideoTrack or AudioTrack. The GraphicsDocumentTrack interface would be used for tracks that have an underlying document (TTML, SVG, SMIL?, HTML?...) with a visual representation, not necessarily based on cues. For these documents, you are not interested in cues or cue changes (and it might not make sense to define cues). You're only interested in:
- the browser automatically dispatching the track content to the parser (no need to use a JS DOMParser);
- the browser rendering the underlying document in (native) synchronization, i.e. the timeline of the underlying document should be locked to the timeline of the video (or audio). There is no need to monitor cue changes to render the right SVG.
You could distinguish a TextTrack from a GraphicsDocumentTrack by MIME type or by inBandMetadataTrackDispatchType (not sure...). When the track carries SVG, the trackDocument object could be an SVGDocument. This would allow the SVG to be controlled as if it were embedded in the HTML, except that the synchronization is done by the browser. What do you think?

I hope this is clear,
Cyril

--
Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France
http://concolato.wp.mines-telecom.fr/
