Re: [whatwg] WebVTT feedback (was Re: Video feedback)
OK, I'll just keep adding feedback to this thread. This is feedback from the WebKit team about implementing WebVTT support.

1. White space between cue settings

It seems that where we have specified how to parse the cue settings, we only allow a single white space character as separator between subsequent cue settings: http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#parse-the-webvtt-settings Thus, something like this is allowed: D:vertical A:middle but not the same settings separated by more than one white space character. I think we need to add a "skip white space" step in step three.

2. Cue order parsing

While the syntax spec says "The time represented by this WebVTT timestamp must be greater than or equal to the start time offsets of all previous cues in the file.", there is no step in the parser that will ascertain that cues that come out of time order are dropped on the floor. Do we need to include such a requirement before step 40 of the parser?

Cheers, Silvia.
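The two points above can be illustrated with a minimal sketch. It is not the spec's algorithm: the function names, the dictionary result and the (start, end, text) tuple layout are assumptions made for illustration, and dropping out-of-order cues is only one possible answer to the question raised in point 2.

```python
def split_cue_settings(settings):
    """Split a cue settings string on runs of white space.

    Hypothetical helper: str.split() with no argument treats any run of
    white space as one separator, which is the "skip white space"
    behaviour suggested in point 1.
    """
    result = {}
    for token in settings.split():
        name, sep, value = token.partition(":")
        # Ignore tokens that are not of the form NAME:VALUE.
        if sep and name and value:
            result[name] = value
    return result


def drop_out_of_order(cues):
    """Keep only cues whose start time is >= all earlier kept start times.

    `cues` is an assumed list of (start_seconds, end_seconds, text) tuples.
    """
    kept = []
    latest_start = 0.0
    for start, end, text in cues:
        if start < latest_start:
            continue  # out of order: dropped on the floor
        latest_start = start
        kept.append((start, end, text))
    return kept
```

With this sketch, `split_cue_settings("D:vertical A:middle")` and the same string with several spaces or a tab between the settings give the same result.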
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 27, 2011 at 6:07 PM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, Jun 27, 2011 at 5:34 PM, Anne van Kesteren ann...@opera.com wrote: On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Note that where his implementation differs from the spec, he has made a note. There are only two such notes. I'd like to see these addressed, too. Could you please post these to the list so that we not all have to read those documents? Good point. :-) (Just search for differs). Here they are - with some additional descriptions: 1. [text track cue] size: this document differs from specs in that way that [text track cue] is as width (for horizontal, height for vertical) as the widest (for horizontal, highest for vertical) [text track cue line] within What Ronny says there is that in his implementation the default display size of the cue (i.e. the dark box that the cue is displayed in) is only as wide as the longest line in the cue (or high where we're dealing with vertical direction). Currently, the spec puts as a default S:100%. I personally also prefer this smaller default cue width because it covers less content of the video. 2. Cue voice tag: this differs from specs in the way that opened v voice tags should be closed with /v Ronny's point is that the v element is expected to be closed, because it makes it easier to parse. So, instead of: 00:01:07.395 -- 00:01:10.246 v John DoHey! v Jane DoeHello! he expects: 00:01:07.395 -- 00:01:10.246 v John DoHey!/v v Jane DoeHello!/v I think the same is true for his implementation of the c class tags. Cheers, Silvia. Adding one more to the list of things I've come across with the WebVTT spec. I am right now trying to figure out how vertical growing left cues (i.e. cues with a cue rendering setting of D:vertical) are rendered. If nothing else is set on the cue, my expectation would be that the cue would be rendered on the right side of the video viewport, since it's growing to the left. 
As I follow through the algorithm at http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#webvtt-cue-text-rendering-rules , I find that the default settings are:

* the text track cue line position default is auto,
* the snap-to-lines flag is true by default,
* block flow is left to right

and in step 9 we get: "If the text track cue writing direction is vertical growing left, and the text track cue snap-to-lines flag is set, let x-position be zero." I think this is incorrect and should be "..., let x-position be 100" so as to allow the text boxes to flow onto the video viewport from the right boundary, rather than off its left border.

Note that step 9 for vertical growing right is correct: "If the text track cue writing direction is vertical growing right, and the text track cue snap-to-lines flag is set, let x-position be zero." and the text should grow from the left boundary of the video viewport to the right.

Cheers, Silvia.
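A minimal sketch of the step 9 behaviour described above, comparing the drafted value with the suggested one. The function name and the `proposed` flag are hypothetical and only cover the two vertical cases quoted in this message.

```python
def x_position(writing_direction, snap_to_lines, proposed=False):
    """Return x-position as a percentage of the viewport width.

    Only the two snap-to-lines cases quoted from step 9 are modelled;
    anything else returns None because this sketch does not cover it.
    """
    if not snap_to_lines:
        return None
    if writing_direction == "vertical growing right":
        return 0  # same in the draft and in the suggestion
    if writing_direction == "vertical growing left":
        # The draft says zero; the suggestion in this message is 100.
        return 100 if proposed else 0
    return None
```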
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
Hi Marc,

On Wed, Jul 20, 2011 at 10:06 AM, Marc 'Tafouk' w...@millie.uk.to wrote:

Hello folks, I've been following the latest developments on the WebVTT specification and am making an attempt to write an out-of-browser parser, using Anna Cavender's proposed patches to WebKit.

Cool! Is this a new video player app or going into, say, VLC or something similar?

First, I filed a request on the bugtracker http://www.w3.org/Bugs/Public/show_bug.cgi?id=13292 regarding the end-of-file marker that's mentioned in the current draft http://www.whatwg.org/specs/web-apps/current-work/#webvtt-cue-text-parsing-rules

I replied. IIUC, it's just the EOF state that is meant, not an actual character.

I have another question about self-closing tags in cue text. It seems they're not supported at all.

None of the tags that we have mean anything if they self-close (and the timestamp is implicitly closing).

The U+002F SOLIDUS character (/) is only handled in the WebVTT tag state. Test case 1-a): WEBVTT 00:00.000 --> 00:02.000 Initial <b/> test U+0062 (b) triggers WebVTT start tag state; U+002F is then handled as "Anything else" and is appended to result (tagname = b/).

Yes. The next character is then a > and causes the next loop to return an end tag. Then, end tags are parsed and it's not in the list that we expect, so this happens: "Otherwise, ignore the token." Thus, <b/> is ignored.

Test case 1-b): WEBVTT 00:00.000 --> 00:02.000 Initial <b /> test U+0062 (b) triggers WebVTT start tag state; U+0020 (space) triggers WebVTT start tag annotation state; U+002F is handled as "Anything else" and is appended to buffer (annotation = /).

Once > is reached, this leads to a start tag b with an annotation of /. From how I read it, the annotation string gets ignored.

I am aware those may be moot atm because there is no void element AFAIK, and the current tags make no sense when immediately closed.

They still have to parse correctly. But I think from analysing the spec they actually do.
I also found a slight issue when following the parser specs: there is no validation of the class attribute. http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#attach-a-webvtt-internal-node-object says to attach the list of classes to the element.

Right now, all characters are allowed for class names bar space, tab, "." and ">". It might indeed be an idea to restrict these characters to those allowed for class names in HTML.

Test case 2): WEBVTT 00:00.000 --> 00:02.000 Second <c.. [my annotation]> test classes is a list of 10 empty strings.

While possibly a bit of unneeded overhead, in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#webvtt-cue-text-dom-construction-rules when mapping to HTML happens, they just create additional spaces in the class attribute, so are not harmful.

Cheers, Silvia.
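The empty class names in test case 2 can be reproduced with a short sketch. It assumes a tag name of "c" followed by ten dots, which would be consistent with the "list of 10 empty strings" mentioned above; the function names are hypothetical, and the real parser builds the class list character by character rather than by splitting.

```python
def split_tag_name(tag_name):
    """Split a start tag name such as 'c.loud.big' into (name, classes)."""
    name, *classes = tag_name.split(".")
    return name, classes


def class_attribute(classes, drop_empty=False):
    """Join class names into an HTML class attribute value.

    With drop_empty=False the empty names survive as extra spaces, which
    is harmless but wasteful; drop_empty=True is one possible validation.
    """
    if drop_empty:
        classes = [c for c in classes if c]
    return " ".join(classes)
```

For example, `split_tag_name("c.loud.big")` gives `("c", ["loud", "big"])`, whereas ten consecutive dots give ten empty class names that turn into nothing but spaces in the attribute.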
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Note that where his implementation differs from the spec, he has made a note. There are only two such notes. I'd like to see these addressed, too. Could you please post these to the list so that we not all have to read those documents? Thanks, -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 27, 2011 at 5:34 PM, Anne van Kesteren ann...@opera.com wrote: On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Note that where his implementation differs from the spec, he has made a note. There are only two such notes. I'd like to see these addressed, too. Could you please post these to the list so that we not all have to read those documents? Good point. :-) (Just search for differs). Here they are - with some additional descriptions: 1. [text track cue] size: this document differs from specs in that way that [text track cue] is as width (for horizontal, height for vertical) as the widest (for horizontal, highest for vertical) [text track cue line] within What Ronny says there is that in his implementation the default display size of the cue (i.e. the dark box that the cue is displayed in) is only as wide as the longest line in the cue (or high where we're dealing with vertical direction). Currently, the spec puts as a default S:100%. I personally also prefer this smaller default cue width because it covers less content of the video. 2. Cue voice tag: this differs from specs in the way that opened v voice tags should be closed with /v Ronny's point is that the v element is expected to be closed, because it makes it easier to parse. So, instead of: 00:01:07.395 -- 00:01:10.246 v John DoHey! v Jane DoeHello! he expects: 00:01:07.395 -- 00:01:10.246 v John DoHey!/v v Jane DoeHello!/v I think the same is true for his implementation of the c class tags. Cheers, Silvia.
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
Hi all,

While we're on the topic of providing feedback on WebVTT, I want to add some things that have cropped up while trying to implement the spec line by line. http://www.whatwg.org/specs/web-apps/current-work/webvtt.html

1. Text Track cue size

In the parsing section for cues, step 27, the default cue size is set to 100. This means that every cue that has no explicit size setting (S:) will occupy the full width of the video viewport (height if vertical rendering), even if the displayed text is only short, such as [music]. I believe that is not the best default means of rendering subtitles and captions, because more of the video's pixels are obstructed than is necessary by the cue background box with its dark grey background rgba(0,0,0,0.8). Instead, it would make a lot more sense to just have a background box cover the screen real estate that the text needs, i.e. put the background color only on the Text boxes themselves. This is how YouTube does it. Alternatively, we could have the background box just cover the bounding box of all the Text boxes inside it, which will make it a rectangular display of each caption cue, but bound to the width of the longest text line.

2. Text Track default cue line position

In the parsing section for cues, step 25, the default line position for cues is 'auto' and the default snap-to-lines flag is true. For cues that have no explicit line position setting (L:), this means that the cue ends up getting a y-position of 0 (see Section 2 with the WebVTT cue text rendering rules, step 10, substep 9, first case). The y-position in turn leads in substep 10 to setting the top property to y-position vh, which is 0 percent of the video's height. top:0 means that the cue is now placed by default at the top of the video viewport. Instead, it would make a lot more sense to have it rendered by default at the bottom of the video viewport, since that is how captions and subtitles in the past have by default been rendered.
Thus, I would suggest that an auto line position be mapped to a y-position of 100 in Section 2, step 10, substep 9, first case.

3. Calculation of Text Track cue line position

Assuming we've set L:100% on a cue, then according to Section 2, step 10, substep 9, second case we arrive at a y-position of 100, leading to the setting of top to 100% of the video's height. This means that the cue will disappear beyond the bottom of the video viewport. Is that intended? Also, shouldn't the caption text box be vertically centered on the L position, rather than having its top edge placed there?

4. Calculation of Text Track cue text position

Similarly to the vertical line positioning, I wonder whether there is a problem with the horizontal T: text positioning. When we specify T:25% on an A:middle cue box, the box is moved half its size to the left of the T position, i.e. it ends up at -12.5% of the video viewport's width. Is that intended? Should there be a way to limit how far a box can be moved off the video viewport? Should it continue to be visible when moved off the video viewport?

Cheers, Silvia. (and thanks to Ronny for helping to surface some of this)
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Wed, 08 Jun 2011 02:54:45 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Hi Philip, all, On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, 3 Jan 2011, Philip J盲genstedt wrote: Silvia, is your mail client a bit funny with character encodings? (The UTF-8 representation of U+00E4 is the same as the GBK representation of U+76F2.) I'm using GMAIL, so if there is anything wrong, you'll have to report it to Google. ;-) Checking back, I actually received your name in Ian's email with that funny encoding. I'm not sure it's gmail's fault for interpreting it in this way or whether there was some information in email headers lost during delivery or what else. * The bad cue handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to bad cue on line 2 and skip the whole cue. 1 2 00:00:00.000 -- 00:00:01.000 Bla This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it is simply ignored and that we use some form of in-cue markup for styling hooks. The IDs are useful for referencing cues from script, so I haven't removed them. I've also left the parsing as is for when neither the first nor second line is a timing line, since that gives us a lot of headroom for future extensions (we can do anything so long as the second line doesn't start with a timestamp and -- and another timestamp). In the case of feeding future extensions to current parsers, it's way better fallback behavior to simply ignore the unrecognized second line than to discard the entire cue. 
The current behavior seems unnecessarily strict and makes the parser more complicated than it needs to be. My preference is just ignore anything preceding the timing line, but even if we must have IDs it can still be made simpler and more robust than what is currently spec'ed. If we just ignore content until we hit a line that happens to look like a timing line, then we are much more constrained in what we can do in the future. For example, we couldn't introduce a comment block syntax, since any comment containing a timing line wouldn't be ignored. On the other hand if we keep the syntax as it is now, we can introduce a comment block just by having its first line include a -- but not have it match the timestamp syntax, e.g. by having it be -- COMMENT or some such. Looking at the parser more closely, I don't really see how doing anything more complex than skipping the block entirely would be simpler than what we have now, anyway. Yes, I think that can work. The pattern of a line with -- without time markers is currently ignored, so we can introduce something with it for special content like comments, style and default. This seems to have been Ian's assumption, but it's not what the spec says. Follow the steps in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0 32. If line contains the three-character substring -- (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to the step labeled timings below. 40. Timings: Collect WebVTT cue timings and settings from line, using cue for the results. If that fails, jump to the step labeled bad cue. 54. Bad cue: Discard cue. (Followed by a loop to skip until the next empty line.) The effect is that that any line containing -- that is not a timing line causes everything up to the next newline to be ignored. Yes, that's what I expect. 
Therefore we can create such cues in the file format right now and the browsers as they currently work will ignore such content. In future, they can be extended to actually do something sensible with it. Isn't that what is currently ignored means? It doesn't break the parser - the parser just skips over it. Am I missing something? OK, I guess we're talking about slightly different things. It is possible to add a syntax to comment out entire cues using something with --, so if that's all we want, that's fine. * Voice synthesis of e.g. mixed English/French captions. Given that this would only be useful to be people who know both languages, it seem not worth complicating the format for. Agreed on all fronts. I disagree with the third case. Many people speak more than one language and even if they don't speak the language that is in use in a cue, it is still bad to render it in using the wrong language model, in particular if it is rendered by a screen reader. We really need a
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Wed, Jun 8, 2011 at 6:39 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 02:54:45 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Hi Philip, all, On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, 3 Jan 2011, Philip J盲genstedt wrote: Silvia, is your mail client a bit funny with character encodings? (The UTF-8 representation of U+00E4 is the same as the GBK representation of U+76F2.) I'm using GMAIL, so if there is anything wrong, you'll have to report it to Google. ;-) Checking back, I actually received your name in Ian's email with that funny encoding. I'm not sure it's gmail's fault for interpreting it in this way or whether there was some information in email headers lost during delivery or what else. * The bad cue handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to bad cue on line 2 and skip the whole cue. 1 2 00:00:00.000 -- 00:00:01.000 Bla This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it is simply ignored and that we use some form of in-cue markup for styling hooks. The IDs are useful for referencing cues from script, so I haven't removed them. I've also left the parsing as is for when neither the first nor second line is a timing line, since that gives us a lot of headroom for future extensions (we can do anything so long as the second line doesn't start with a timestamp and -- and another timestamp). 
In the case of feeding future extensions to current parsers, it's way better fallback behavior to simply ignore the unrecognized second line than to discard the entire cue. The current behavior seems unnecessarily strict and makes the parser more complicated than it needs to be. My preference is just ignore anything preceding the timing line, but even if we must have IDs it can still be made simpler and more robust than what is currently spec'ed. If we just ignore content until we hit a line that happens to look like a timing line, then we are much more constrained in what we can do in the future. For example, we couldn't introduce a comment block syntax, since any comment containing a timing line wouldn't be ignored. On the other hand if we keep the syntax as it is now, we can introduce a comment block just by having its first line include a -- but not have it match the timestamp syntax, e.g. by having it be -- COMMENT or some such. Looking at the parser more closely, I don't really see how doing anything more complex than skipping the block entirely would be simpler than what we have now, anyway. Yes, I think that can work. The pattern of a line with -- without time markers is currently ignored, so we can introduce something with it for special content like comments, style and default. This seems to have been Ian's assumption, but it's not what the spec says. Follow the steps in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0 32. If line contains the three-character substring -- (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to the step labeled timings below. 40. Timings: Collect WebVTT cue timings and settings from line, using cue for the results. If that fails, jump to the step labeled bad cue. 54. Bad cue: Discard cue. (Followed by a loop to skip until the next empty line.) 
The effect is that that any line containing -- that is not a timing line causes everything up to the next newline to be ignored. Yes, that's what I expect. Therefore we can create such cues in the file format right now and the browsers as they currently work will ignore such content. In future, they can be extended to actually do something sensible with it. Isn't that what is currently ignored means? It doesn't break the parser - the parser just skips over it. Am I missing something? OK, I guess we're talking about slightly different things. It is possible to add a syntax to comment out entire cues using something with --, so if that's all we want, that's fine. * Voice synthesis of e.g. mixed English/French captions. Given that this would only be useful to be people who know both languages, it seem not worth complicating the format for. Agreed on all fronts. I disagree with the third case. Many people speak more than one language and even if they don't speak the language that is in use in a cue, it is still bad to render it in using
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, 3 Jan 2011, Philip J盲genstedt wrote: Silvia, is your mail client a bit funny with character encodings? (The UTF-8 representation of U+00E4 is the same as the GBK representation of U+76F2.) * The bad cue handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to bad cue on line 2 and skip the whole cue. 1 2 00:00:00.000 -- 00:00:01.000 Bla This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it is simply ignored and that we use some form of in-cue markup for styling hooks. The IDs are useful for referencing cues from script, so I haven't removed them. I've also left the parsing as is for when neither the first nor second line is a timing line, since that gives us a lot of headroom for future extensions (we can do anything so long as the second line doesn't start with a timestamp and -- and another timestamp). In the case of feeding future extensions to current parsers, it's way better fallback behavior to simply ignore the unrecognized second line than to discard the entire cue. The current behavior seems unnecessarily strict and makes the parser more complicated than it needs to be. My preference is just ignore anything preceding the timing line, but even if we must have IDs it can still be made simpler and more robust than what is currently spec'ed. If we just ignore content until we hit a line that happens to look like a timing line, then we are much more constrained in what we can do in the future. 
For example, we couldn't introduce a comment block syntax, since any comment containing a timing line wouldn't be ignored. On the other hand if we keep the syntax as it is now, we can introduce a comment block just by having its first line include a --> but not have it match the timestamp syntax, e.g. by having it be --> COMMENT or some such. Looking at the parser more closely, I don't really see how doing anything more complex than skipping the block entirely would be simpler than what we have now, anyway.

Yes, I think that can work. The pattern of a line with --> without time markers is currently ignored, so we can introduce something with it for special content like comments, style and default.

This seems to have been Ian's assumption, but it's not what the spec says. Follow the steps in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0

32. If line contains the three-character substring --> (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to the step labeled timings below.
40. Timings: Collect WebVTT cue timings and settings from line, using cue for the results. If that fails, jump to the step labeled bad cue.
54. Bad cue: Discard cue. (Followed by a loop to skip until the next empty line.)

The effect is that any line containing --> that is not a timing line causes everything up to the next newline to be ignored.

* underline: EBU STL, CEA-608 and CEA-708 support underlining of characters.

I've added support for 'text-decoration'.

And for <u>. I am happy now, thanks. :-)

Huh. For those who are surprised, this was added in http://html5.org/r/6004 at the same time as <u> was made conforming for HTML. See http://www.w3.org/Bugs/Public/show_bug.cgi?id=10838

* Voice synthesis of e.g. mixed English/French captions. Given that this would only be useful to people who know both languages, it seems not worth complicating the format for.

Agreed on all fronts.

I disagree with the third case.
Many people speak more than one language and even if they don't speak the language that is in use in a cue, it is still bad to render it in using the wrong language model, in particular if it is rendered by a screen reader. We really need a mechanism to attach a language marker to a cue segment. It's not needed for the rendering of French vs English, is it? It is theoretically useful for CJK, but as I've said before it seems to be more common to transliterate the foreign script in these cases. Do you have any examples of real-world subtitles/captions that would benefit from more fine-grained language information? This kind of information would indeed be useful. Note that I'm not so much worried about captions and subtitles here, but rather worried about audio descriptions as rendered from cue text descriptions. When would one want these descriptions to be multi-language? -- Philip Jägenstedt Core Developer Opera Software
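The block-skipping behaviour discussed in this message (steps 32, 40 and 54) can be approximated with the sketch below. It is a simplification under stated assumptions: the `TIMING` pattern and the `parse_blocks` name are hypothetical, the WEBVTT header and cue settings are not handled, and the input is assumed to be already split into lines.

```python
import re

# Simplified, assumed timing-line pattern: [hh:]mm:ss.ttt --> [hh:]mm:ss.ttt
TIMING = re.compile(
    r"^\s*((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def parse_blocks(lines):
    """Return a list of (id, start, end, text) tuples for well-formed cues."""
    cues = []
    i, n = 0, len(lines)
    while i < n:
        if lines[i].strip() == "":
            i += 1  # skip blank lines between blocks
            continue
        cue_id = None
        line = lines[i]
        if "-->" not in line:
            # First line has no arrow: treat it as the cue id, and
            # require the next line to be the timing line.
            cue_id = line
            i += 1
            if i >= n or lines[i].strip() == "":
                continue
            line = lines[i]
        match = TIMING.match(line)
        if match is None:
            # Bad cue: discard it and skip until the next empty line.
            while i < n and lines[i].strip() != "":
                i += 1
            continue
        i += 1
        text = []
        while i < n and lines[i].strip() != "":
            text.append(lines[i])
            i += 1
        cues.append((cue_id, match.group(1), match.group(2), "\n".join(text)))
    return cues
```

In this sketch a block whose second line is not a timing line is discarded as a whole, and a block starting with a line such as `--> COMMENT` is skipped even if it contains a valid timing line further down, which is the extension headroom described above.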
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
Hi Philip, all, On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, 3 Jan 2011, Philip J盲genstedt wrote: Silvia, is your mail client a bit funny with character encodings? (The UTF-8 representation of U+00E4 is the same as the GBK representation of U+76F2.) I'm using GMAIL, so if there is anything wrong, you'll have to report it to Google. ;-) Checking back, I actually received your name in Ian's email with that funny encoding. I'm not sure it's gmail's fault for interpreting it in this way or whether there was some information in email headers lost during delivery or what else. * The bad cue handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to bad cue on line 2 and skip the whole cue. 1 2 00:00:00.000 -- 00:00:01.000 Bla This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it is simply ignored and that we use some form of in-cue markup for styling hooks. The IDs are useful for referencing cues from script, so I haven't removed them. I've also left the parsing as is for when neither the first nor second line is a timing line, since that gives us a lot of headroom for future extensions (we can do anything so long as the second line doesn't start with a timestamp and -- and another timestamp). In the case of feeding future extensions to current parsers, it's way better fallback behavior to simply ignore the unrecognized second line than to discard the entire cue. 
The current behavior seems unnecessarily strict and makes the parser more complicated than it needs to be. My preference is just ignore anything preceding the timing line, but even if we must have IDs it can still be made simpler and more robust than what is currently spec'ed. If we just ignore content until we hit a line that happens to look like a timing line, then we are much more constrained in what we can do in the future. For example, we couldn't introduce a comment block syntax, since any comment containing a timing line wouldn't be ignored. On the other hand if we keep the syntax as it is now, we can introduce a comment block just by having its first line include a -- but not have it match the timestamp syntax, e.g. by having it be -- COMMENT or some such. Looking at the parser more closely, I don't really see how doing anything more complex than skipping the block entirely would be simpler than what we have now, anyway. Yes, I think that can work. The pattern of a line with -- without time markers is currently ignored, so we can introduce something with it for special content like comments, style and default. This seems to have been Ian's assumption, but it's not what the spec says. Follow the steps in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0 32. If line contains the three-character substring -- (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to the step labeled timings below. 40. Timings: Collect WebVTT cue timings and settings from line, using cue for the results. If that fails, jump to the step labeled bad cue. 54. Bad cue: Discard cue. (Followed by a loop to skip until the next empty line.) The effect is that that any line containing -- that is not a timing line causes everything up to the next newline to be ignored. Yes, that's what I expect. 
Therefore we can create such cues in the file format right now and the browsers as they currently work will ignore such content. In future, they can be extended to actually do something sensible with it. Isn't that what is currently ignored means? It doesn't break the parser - the parser just skips over it. Am I missing something? (And yes: I'd actually like to include these specs now rather than later, so we can extend the parsing algo right now. But I am not fussed about timing. It's good to understand how we will exend the format.) * Voice synthesis of e.g. mixed English/French captions. Given that this would only be useful to be people who know both languages, it seem not worth complicating the format for. Agreed on all fronts. I disagree with the third case. Many people speak more than one language and even if they don't speak the language that is in use in a cue, it is still bad to render it in using the wrong language model, in particular if it is rendered by a screen reader. We really need a mechanism to attach a language marker to a cue segment. It's not needed for the
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: If we introduced the scrolling behaviour that I described above where cues that are rendered into the same location as a previous still active cue push that previous cue up, we get this behaviour covered too. I don't think so. This is a scene with two simultaneous conversations; in order to help make the subtitles readable, they were authored to keep one conversation pair always on top, and the other always on the bottom. Having captions move while they're already displayed wouldn't do this. (I think it'd make it unreadable, actually, by adding motion into the mix.) Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. -- Glenn Maynard
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 5:30 PM, Glenn Maynard gl...@zewt.org wrote: On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: If we introduced the scrolling behaviour that I described above where cues that are rendered into the same location as a previous still active cue push that previous cue up, we get this behaviour covered too. I don't think so. This is a scene with two simultaneous conversations; in order to help make the subtitles readable, they were authored to keep one conversation pair always on top, and the other always on the bottom. Having captions move while they're already displayed wouldn't do this. (I think it'd make it unreadable, actually, by adding motion into the mix.) If you use explicit L: placement, we could turn off the scrolling behaviour. I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true. 
Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow, for example, explicit placement. Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Regards, Silvia.
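To make the proposal above a bit more concrete, here is a rough sketch of how the two behaviours could coexist. It is only an illustration under assumptions, not anything taken from the spec: the `Cue` class, its `line` field (standing in for an explicit L: setting) and the `layout` function are hypothetical names, and the viewport is simplified to a fixed number of text rows. Cues without an explicit line share the default bottom position, where a newer cue pushes older still-active cues up; cues with an explicit line stay where the author put them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    start: float
    end: float
    text: str
    line: Optional[int] = None  # hypothetical stand-in for an explicit L: setting


def layout(cues, now, rows=15):
    """Return {row: text} for the cues active at time `now`.

    Row 0 is the top of the viewport and rows - 1 is the bottom.
    """
    active = [c for c in cues if c.start <= now < c.end]
    placed = {}

    # Explicitly placed cues never move: scrolling is off for them.
    for cue in active:
        if cue.line is not None:
            placed[cue.line] = cue.text

    # Remaining cues share the default bottom position. The newest cue
    # takes the lowest free row and pushes older active cues upwards.
    auto = sorted((c for c in active if c.line is None),
                  key=lambda c: c.start, reverse=True)
    row = rows - 1
    for cue in auto:
        while row in placed:
            row -= 1
        placed[row] = cue.text
        row -= 1
    return placed


cues = [
    Cue(0.0, 4.0, "first line"),
    Cue(2.0, 6.0, "second line"),
    Cue(0.0, 6.0, "pinned to top", line=0),
]
print(layout(cues, 1.0))  # "first line" alone on the bottom row
print(layout(cues, 3.0))  # "second line" on the bottom row, "first line" pushed up
```

In this sketch the pinned cue keeps row 0 throughout, which is the property the two-conversation scene would need, while the unpinned cues scroll the way live captions do.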
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. It's definitely an uncommonly complex scene to caption. I raised it wondering about the quality level the format can manage in the harder cases. (To be clear, this isn't something any of the popular ad hoc formats can do, either--this was achieved with brittle rendering-specific hacks.) I've never seen the scrolling behavior in subtitles, though. I think they're only common in live captions. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. The scene is jumping all over the place--at one point one pair is directly *above* the other pair in the frame. There's no left/right speaker correspondence. FWIW, I find it intuitive to read. Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true. Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow, for example, explicit placement. 
Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Sure (most of the time), but if the contracts you refer to require that their content rendered with WebVTT look the same as they did in their original format, that won't always be possible. -- Glenn Maynard
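The wrapping point can be illustrated with a small sketch. This is not how any browser lays out cues; it just uses Python's standard `textwrap` module, with the column count as an assumed stand-in for how many characters fit on a line at a given font size. The `wrap_for_width` helper and the sample caption are made up for the example.

```python
import textwrap

# Sample caption text, invented for this illustration.
caption = "I don't know what degree of sameness they expect from the rendering."


def wrap_for_width(text, columns):
    """Wrap `text` to at most `columns` characters per line."""
    return textwrap.wrap(text, width=columns)


# A larger user-chosen font means fewer characters fit per line.
narrow = wrap_for_width(caption, 32)  # stands in for a large font
wide = wrap_for_width(caption, 40)    # stands in for a smaller font

for label, lines in (("large font", narrow), ("small font", wide)):
    print(label)
    for line in lines:
        print("  " + line)
```

The same text ends up with a different number of lines and different break points in the two cases, so an authored layout that depends on exact line breaks cannot be guaranteed once the user overrides the font.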
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 6:04 PM, Glenn Maynard gl...@zewt.org wrote: On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. It's definitely an uncommonly complex scene to caption. I raised it wondering about the quality level the format can manage in the harder cases. (To be clear, this isn't something any of the popular ad hoc formats can do, either--this was achieved with brittle rendering-specific hacks.) I've never seen the scrolling behavior in subtitles, though. I think they're only common in live captions. Agreed, mostly live, but also when playing back content that had live captions. Also, we do want to support live captions here, so I think they are relevant. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. The scene is jumping all over the place--at one point one pair is directly *above* the other pair in the frame. There's no left/right speaker correspondence. FWIW, I find it intuitive to read. Even when they are above each other, you could place one on the left at the top next to the first speaker and one on the right at the bottom next to the other speaker. I guess intuitions can be different. :-) Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. 
I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true. Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow, for example, explicit placement. Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Sure (most of the time), but if the contracts you refer to require that their content rendered with WebVTT look the same as they did in their original format, that won't always be possible. Most things are actually possible - I've tried to give this a shot with CEA-608 captions here: http://www.w3.org/WAI/PF/HTML/wiki/Media_608_WebVTT_Conversion . Feedback very welcome! Cheers, Silvia.