Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-08-08 Thread Silvia Pfeiffer
OK, I'll just keep adding feedback to this thread. This is feedback
from the WebKit team about implementing WebVTT support.


1. White space between cue settings

It seems that where we have specified how to parse the cue settings,
we only allow a single white space as separator between subsequent cue
settings:
http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#parse-the-webvtt-settings

Thus, something like this is allowed: D:vertical A:middle
but not something like this (several spaces between the settings):
D:vertical   A:middle

I think we need to add a skip white space in step three.
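
A minimal Python sketch of what "skip white space" would mean here, assuming settings are separated by runs of spaces or tabs and each setting has the form name:value. The function name split_cue_settings is hypothetical and not part of the spec; the sketch only illustrates the tolerant splitting, not the full settings parser.

```python
import re

def split_cue_settings(settings: str) -> dict:
    """Split a cue settings string into a {name: value} dict.

    Runs of spaces/tabs between settings are treated as one separator,
    which is the behaviour suggested above (an assumption, not spec text).
    """
    result = {}
    for token in re.split(r"[ \t]+", settings.strip(" \t")):
        name, sep, value = token.partition(":")
        if not sep or not name or not value:
            continue  # ignore malformed or empty tokens in this sketch
        result[name] = value
    return result
```

With this splitting, "D:vertical A:middle" and "D:vertical   A:middle" yield the same two settings.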


2. cue order parsing

While the syntax spec says "The time represented by this WebVTT
timestamp must be greater than or equal to the start time offsets of
all previous cues in the file.", there is no step in the parser that
ensures that cues that come out of time order are dropped on the
floor. Do we need to include such a requirement before step 40 of the
parser?
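
To make the question concrete, here is a rough Python sketch of such a check. It assumes cues are (start, end, text) tuples in file order and that a cue is dropped when its start time is earlier than the start time of the last cue that was kept; whether dropped cues should also count as "previous cues" is an open point this sketch does not settle. The name drop_out_of_order is made up for illustration.

```python
def drop_out_of_order(cues):
    """Keep only cues whose start time is >= the start of every kept cue before it.

    cues: iterable of (start_seconds, end_seconds, text) in file order.
    """
    kept = []
    latest_start = float("-inf")
    for start, end, text in cues:
        if start < latest_start:
            continue  # out of order: dropped on the floor
        kept.append((start, end, text))
        latest_start = start
    return kept
```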


Cheers,
Silvia.


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-08-06 Thread Silvia Pfeiffer
On Mon, Jun 27, 2011 at 6:07 PM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 On Mon, Jun 27, 2011 at 5:34 PM, Anne van Kesteren ann...@opera.com wrote:
 On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 Note that where his implementation differs from the spec, he has made
 a note. There are only two such notes. I'd like to see these
 addressed, too.

 Could you please post these to the list so that we not all have to read
 those documents?

 Good point. :-) (Just search for differs). Here they are - with some
 additional descriptions:

 1. [text track cue] size:
 this document differs from specs in that way that [text track cue] is
 as width (for horizontal, height for vertical) as the widest (for
 horizontal, highest for vertical) [text track cue line] within

 What Ronny says there is that in his implementation the default
 display size of the cue (i.e. the dark box that the cue is displayed
 in) is only as wide as the longest line in the cue (or high where
 we're dealing with vertical direction). Currently, the spec puts as a
 default S:100%.

 I personally also prefer this smaller default cue width because it
 covers less content of the video.


 2. Cue voice tag:
 this differs from specs in the way that opened <v> voice tags should
 be closed with </v>

 Ronny's point is that the <v> element is expected to be closed,
 because it makes it easier to parse. So, instead of:

 00:01:07.395 --> 00:01:10.246
 <v John Doe>Hey!
 <v Jane Doe>Hello!

 he expects:

 00:01:07.395 --> 00:01:10.246
 <v John Doe>Hey!</v>
 <v Jane Doe>Hello!</v>

 I think the same is true for his implementation of the <c> class tags.


 Cheers,
 Silvia.



Adding one more to the list of things I've come across with the WebVTT spec.

I am right now trying to figure out how vertical  growing left cues
(i.e. cues with a cue rendering setting of D:vertical) are rendered.

If nothing else is set on the cue, my expectation would be that the
cue would be rendered on the right side of the video viewport, since
it's growing to the left.

As I follow through the algorithm at
http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#webvtt-cue-text-rendering-rules
, I find that the default settings are:
* the text track cue line position default is auto,
* the snap-to-lines flag is true by default,
* block flow is left to right
and in step 9 we get:
"If the text track cue writing direction is vertical growing left, and
the text track cue snap-to-lines flag is set, let x-position be zero."

I think this is incorrect and should be "..., let x-position be 100"
so as to allow the text boxes to flow onto the video viewport from the
right boundary, rather than off its left border.


Note that step 9 for vertical growing right is correct:
"If the text track cue writing direction is vertical growing right,
and the text track cue snap-to-lines flag is set, let x-position be
zero."
and the text should grow from the left boundary of the video viewport
to the right.

Cheers,
Silvia.


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-07-19 Thread Silvia Pfeiffer
Hi Marc,

On Wed, Jul 20, 2011 at 10:06 AM, Marc 'Tafouk' w...@millie.uk.to wrote:

 Hello folks,

 I've been following the latest developments on the WebVTT specification and
 am making an attempt to write an out-of-browser parser, using Anna
 Cavender's proposed patches to WebKit.

Cool! Is this a new video player app or going into, say, VLC or
something similar?


 First, I filed a request on the bugtracker
 http://www.w3.org/Bugs/Public/show_bug.cgi?id=13292 regarding the
 end-of-file marker that's mentioned in the current draft
 http://www.whatwg.org/specs/web-apps/current-work/#webvtt-cue-text-parsing-rules

I replied. IIUC, it's just the EOF state that is meant, not an actual character.


 I have another question about self-closing tags in cue text. It seems
 they're not supported at all.

None of the tags that we have mean anything if they self-close (and
the timestamp is implicitly closing).


 The U+002F SOLIDUS character (/) is only handled in the WebVTT tag state.

 Test case 1-a):
   WEBVTT

   00:00.000 --> 00:02.000
   Initial <b/> test

 U+0062 (b) triggers WebVTT start tag state; U+002F is then handled as
 "Anything else" and is appended to result (tagname = "b/").

Yes. The next character is then a ">" and causes the next loop
iteration to return an end tag. Then, end tags are parsed and it's not
in the list that we expect, so this happens: "Otherwise, ignore the
token." Thus, <b/> is ignored.


 Test case 1-b):
   WEBVTT

   00:00.000 --> 00:02.000
   Initial <b /> test


 U+0062 (b) triggers WebVTT start tag state; U+0020 (space) triggers
 WebVTT start tag annotation state; U+002F is handled as "Anything else"
 and is appended to buffer (annotation = "/").

Once ">" is reached, this leads to a start tag <b> with an annotation
of "/". From how I read it, the annotation string gets ignored.


 I am aware those may be moot atm because there is no void element AFAIK,
 and the current tags make no sense when immediately closed.

They still have to parse correctly. But I think from analysing the
spec they actually do.


 I also found a slight issue when following the parser specs : there is no
 validation of the class attribute.

http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#attach-a-webvtt-internal-node-object
says to attach the list of classes to the element. Right now, all
characters are allowed for class names bar space, tab, "." and ">". It
might indeed be an idea to restrict these characters to those allowed
for class names in HTML.


 Test case 2):
   WEBVTT

   00:00.000 --> 00:02.000
   Second c.. [my annotation] test

 classes is a list of 10 empty strings.

While possibly a bit of unneeded overhead, in
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#webvtt-cue-text-dom-construction-rules
when mapping to HTML happens, they just create an additional space in
the class attribute, so are not harmful.
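
The three test cases above can be traced with a small, simplified tokenizer for a single tag. This is only a sketch of the start tag, class, annotation and end tag states as they are described in this thread: timestamp tags and EOF handling are left out, and tokenize_tag and its return shape are hypothetical names for illustration. The input "<c.. x>" is an invented example with two dots, not Marc's exact test string.

```python
def tokenize_tag(tag: str):
    """Tokenize one cue-text tag such as '<b />' (simplified sketch).

    Returns (kind, name, classes, annotation).
    """
    if not tag.startswith("<"):
        raise ValueError("expected a string starting with '<'")
    kind, state = "start", "tag"
    name, buffer, classes = "", "", []
    for c in tag[1:]:
        if c == ">":
            break  # every state ends the token on '>'
        if state == "tag":
            if c in " \t":
                state = "annotation"
            elif c == ".":
                state = "class"
            elif c == "/":
                kind, state = "end", "end"
            else:
                name, state = c, "start"
        elif state == "start":
            if c in " \t":
                state = "annotation"
            elif c == ".":
                state = "class"
            else:
                name += c  # "anything else", including '/'
        elif state == "class":
            if c in " \t":
                classes.append(buffer)
                buffer, state = "", "annotation"
            elif c == ".":
                classes.append(buffer)
                buffer = ""
            else:
                buffer += c
        elif state == "annotation":
            buffer += c
        else:  # end tag state
            name += c
    if state == "class":
        classes.append(buffer)  # flush the class being collected
        buffer = ""
    annotation = buffer if state == "annotation" else ""
    return kind, name, classes, annotation
```

In this sketch "<b/>" comes back as a start tag named "b/", which matches no known tag name and would be ignored, "<b />" is a start tag b with the annotation "/", and every "." that is not followed by a class name contributes one empty string to the class list.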


Cheers,
Silvia.


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-27 Thread Anne van Kesteren
On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:

Note that where his implementation differs from the spec, he has made
a note. There are only two such notes. I'd like to see these
addressed, too.


Could you please post these to the list so that we not all have to read  
those documents?


Thanks,


--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-27 Thread Silvia Pfeiffer
On Mon, Jun 27, 2011 at 5:34 PM, Anne van Kesteren ann...@opera.com wrote:
 On Mon, 27 Jun 2011 09:32:04 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 Note that where his implementation differs from the spec, he has made
 a note. There are only two such notes. I'd like to see these
 addressed, too.

 Could you please post these to the list so that we not all have to read
 those documents?

Good point. :-) (Just search for "differs"). Here they are - with some
additional descriptions:

1. [text track cue] size:
this document differs from specs in that way that [text track cue] is
as width (for horizontal, height for vertical) as the widest (for
horizontal, highest for vertical) [text track cue line] within

What Ronny says there is that in his implementation the default
display size of the cue (i.e. the dark box that the cue is displayed
in) is only as wide as the longest line in the cue (or high where
we're dealing with vertical direction). Currently, the spec puts as a
default S:100%.

I personally also prefer this smaller default cue width because it
covers less content of the video.


2. Cue voice tag:
this differs from specs in the way that opened <v> voice tags should
be closed with </v>

Ronny's point is that the <v> element is expected to be closed,
because it makes it easier to parse. So, instead of:

00:01:07.395 --> 00:01:10.246
<v John Doe>Hey!
<v Jane Doe>Hello!

he expects:

00:01:07.395 --> 00:01:10.246
<v John Doe>Hey!</v>
<v Jane Doe>Hello!</v>

I think the same is true for his implementation of the <c> class tags.


Cheers,
Silvia.


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-09 Thread Silvia Pfeiffer
Hi all,

While we're on the topic of providing feedback on WebVTT, I want to
add some things that have crept up while trying to implement the spec
line by line.

http://www.whatwg.org/specs/web-apps/current-work/webvtt.html

1. Text Track cue size

In the parsing section for cues, step 27, the default cue size is set
to 100. This means that every cue that has no explicit size setting
(S:) will occupy the full width of the video viewport (height if
vertical rendering), even if the displayed text is only short, such as
[music]. I believe that is not the best default means of rendering
subtitles and captions, because more of the video's pixels are
obstructed than is necessary by the cue background box with its dark
grey background rgba(0,0,0,0.8).

Instead, it would make a lot more sense to just have a background box
cover the screen estate that the text needs, i.e. put the background
color only on the Text boxes themselves. This is how YouTube do it.

Alternatively, we could have the background box just cover the
bounding box of all the Text boxes inside it, which will make it a
rectangular display of each caption cue, but bound to the width of the
longest text line length.


2. Text Track default cue line position

In the parsing section for cues, step 25, the default line position
for cues is 'auto' and the default snap-to-lines flag is true. For
cues that have no explicit line position setting (L:), this means
that the cue ends up with a y-position of 0 (see Section 2 with the
WebVTT cue text rendering rules, step 10, substep 9, first case). The
y-position in turn leads in substep 10 to setting the 'top' property
to "y-position vh", which is 0 percent of the video's height. top:0
means that the cue is now placed by default at the top of the video
viewport.

Instead, it would make a lot more sense to have it rendered by default
at the bottom of the video viewport, since that is how captions and
subtitles in the past have by default been rendered.

Thus, I would suggest that an auto line position be mapped to a
y-position of 100 in Section 2, step 10, substep 9, first case.


3. Calculation of Text Track cue line position

Assuming we've set a L:100% on a cue, then according to Section 2,
step 10, substep 9, second case we arrive at a y-position of 100,
leading to the setting of top to 100% of the video's height. This
means that the cue will disappear beyond the bottom of the video
viewport. Is that intended?

Also, shouldn't the caption text box be centered vertically on the L
position, rather than having its top edge placed there?


4. Calculation of Text Track cue text position

As with the vertical line positioning, I wonder whether there
is a problem with the horizontal T: text positioning. When we
specify T:25% on an A:middle cue box, the box is moved half its size
to the left of the T position, i.e. it ends up at -12.5% of the video
viewport's width. Is that intended? Should there be a way to limit how
far a box can be moved off the video viewport? Should it continue to
be visible when moved off the video viewport?
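
The off-viewport concern in points 3 and 4 comes down to simple interval arithmetic. The following Python sketch assumes a cue box described by an offset and a size, both in percent of the viewport along one axis; the helper name visible_fraction and the box sizes used with it are hypothetical and only serve to illustrate the question.

```python
def visible_fraction(offset_pct: float, size_pct: float) -> float:
    """Fraction of a box [offset, offset + size] that lies inside [0, 100]."""
    if size_pct <= 0:
        raise ValueError("size_pct must be positive")
    start = max(0.0, offset_pct)
    end = min(100.0, offset_pct + size_pct)
    return max(0.0, end - start) / size_pct
```

A box whose top edge sits at 100% of the video's height is entirely outside the viewport regardless of its height, and a box that starts at a negative offset is only partly visible.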


Cheers,
Silvia.
(and thanks to Ronny for helping to surface some of this)


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-08 Thread Philip Jägenstedt
On Wed, 08 Jun 2011 02:54:45 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:



Hi Philip, all,

On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com  
wrote:

On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:


On Mon, 3 Jan 2011, Philip J盲genstedt wrote:


Silvia, is your mail client a bit funny with character encodings? (The
UTF-8 representation of U+00E4 is the same as the GBK representation of
U+76F2.)


I'm using GMAIL, so if there is anything wrong, you'll have to report
it to Google. ;-)
Checking back, I actually received your name in Ian's email with that
funny encoding. I'm not sure it's gmail's fault for interpreting it in
this way or whether there was some information in email headers lost
during delivery or what else.



  * The bad cue handling is stricter than it should be. After
  collecting an id, the next line must be a timestamp line. Otherwise,
  we skip everything until a blank line, so in the following the
  parser would jump to bad cue on line 2 and skip the whole cue.

  1
  2
  00:00:00.000 --> 00:00:01.000
  Bla

  This doesn't match what most existing SRT parsers do, as they simply
  look for timing lines and ignore everything else. If we really need
  to collect the id instead of ignoring it like everyone else, this
  should be more robust, so that a valid timing line always begins a
  new cue. Personally, I'd prefer if it is simply ignored and that we
  use some form of in-cue markup for styling hooks.

 The IDs are useful for referencing cues from script, so I haven't
 removed them. I've also left the parsing as is for when neither the
 first nor second line is a timing line, since that gives us a lot of
 headroom for future extensions (we can do anything so long as the
 second line doesn't start with a timestamp and "-->" and another
 timestamp).

In the case of feeding future extensions to current parsers, it's way
better fallback behavior to simply ignore the unrecognized second line
than to discard the entire cue. The current behavior seems unnecessarily
strict and makes the parser more complicated than it needs to be. My
preference is just ignore anything preceding the timing line, but even
if we must have IDs it can still be made simpler and more robust than
what is currently spec'ed.


If we just ignore content until we hit a line that happens to look like a
timing line, then we are much more constrained in what we can do in the
future. For example, we couldn't introduce a comment block syntax, since
any comment containing a timing line wouldn't be ignored. On the other
hand if we keep the syntax as it is now, we can introduce a comment block
just by having its first line include a "-->" but not have it match the
timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

Looking at the parser more closely, I don't really see how doing anything
more complex than skipping the block entirely would be simpler than what
we have now, anyway.


Yes, I think that can work. The pattern of a line with "-->" without
time markers is currently ignored, so we can introduce something with
it for special content like comments, style and default.


This seems to have been Ian's assumption, but it's not what the spec says.
Follow the steps in
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0

32. If line contains the three-character substring "-->" (U+002D
HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to
the step labeled timings below.

40. Timings: Collect WebVTT cue timings and settings from line, using cue
for the results. If that fails, jump to the step labeled bad cue.

54. Bad cue: Discard cue.

(Followed by a loop to skip until the next empty line.)

The effect is that any line containing "-->" that is not a timing line
causes everything up to the next newline to be ignored.



Yes, that's what I expect. Therefore we can create such cues in the
file format right now and the browsers as they currently work will
ignore such content. In future, they can be extended to actually do
something sensible with it. Isn't that what "is currently ignored"
means? It doesn't break the parser - the parser just skips over it. Am
I missing something?


OK, I guess we're talking about slightly different things. It is possible
to add a syntax to comment out entire cues using something with "-->", so
if that's all we want, that's fine.


* Voice synthesis of e.g. mixed English/French captions. Given that this
would only be useful to be people who know both languages, it seem not
worth complicating the format for.


Agreed on all fronts.


I disagree with the third case. Many people speak more than one
language and even if they don't speak the language that is in use in a
cue, it is still bad to render it in using the wrong language model,
in particular if it is rendered by a screen reader. We really need a

Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-08 Thread Silvia Pfeiffer
On Wed, Jun 8, 2011 at 6:39 PM, Philip Jägenstedt phil...@opera.com wrote:
 On Wed, 08 Jun 2011 02:54:45 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 Hi Philip, all,

 On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com
 wrote:

 On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 On Mon, 3 Jan 2011, Philip J盲genstedt wrote:

 Silvia, is your mail client a bit funny with character encodings? (The
 UTF-8 representation of U+00E4 is the same as the GBK representation of
 U+76F2.)

 I'm using GMAIL, so if there is anything wrong, you'll have to report
 it to Google. ;-)
 Checking back, I actually received your name in Ian's email with that
 funny encoding. I'm not sure it's gmail's fault for interpreting it in
 this way or whether there was some information in email headers lost
 during delivery or what else.


   * The bad cue handling is stricter than it should be. After
   collecting an id, the next line must be a timestamp line. Otherwise,
   we skip everything until a blank line, so in the following the
   parser would jump to bad cue on line 2 and skip the whole cue.

   1
   2
   00:00:00.000 --> 00:00:01.000
   Bla

   This doesn't match what most existing SRT parsers do, as they simply
   look for timing lines and ignore everything else. If we really need
   to collect the id instead of ignoring it like everyone else, this
   should be more robust, so that a valid timing line always begins a
   new cue. Personally, I'd prefer if it is simply ignored and that we
   use some form of in-cue markup for styling hooks.

  The IDs are useful for referencing cues from script, so I haven't
  removed them. I've also left the parsing as is for when neither the
  first nor second line is a timing line, since that gives us a lot of
  headroom for future extensions (we can do anything so long as the
  second line doesn't start with a timestamp and "-->" and another
  timestamp).

 In the case of feeding future extensions to current parsers, it's way
 better fallback behavior to simply ignore the unrecognized second line
 than to discard the entire cue. The current behavior seems unnecessarily
 strict and makes the parser more complicated than it needs to be. My
 preference is just ignore anything preceding the timing line, but even
 if we must have IDs it can still be made simpler and more robust than
 what is currently spec'ed.

 If we just ignore content until we hit a line that happens to look like a
 timing line, then we are much more constrained in what we can do in the
 future. For example, we couldn't introduce a comment block syntax, since
 any comment containing a timing line wouldn't be ignored. On the other
 hand if we keep the syntax as it is now, we can introduce a comment block
 just by having its first line include a "-->" but not have it match the
 timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

 Looking at the parser more closely, I don't really see how doing anything
 more complex than skipping the block entirely would be simpler than what
 we have now, anyway.

 Yes, I think that can work. The pattern of a line with "-->" without
 time markers is currently ignored, so we can introduce something with
 it for special content like comments, style and default.

 This seems to have been Ian's assumption, but it's not what the spec says.
 Follow the steps in
 http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0

 32. If line contains the three-character substring "-->" (U+002D
 HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to
 the step labeled timings below.

 40. Timings: Collect WebVTT cue timings and settings from line, using cue
 for the results. If that fails, jump to the step labeled bad cue.

 54. Bad cue: Discard cue.

 (Followed by a loop to skip until the next empty line.)

 The effect is that any line containing "-->" that is not a timing line
 causes everything up to the next newline to be ignored.


 Yes, that's what I expect. Therefore we can create such cues in the
 file format right now and the browsers as they currently work will
 ignore such content. In future, they can be extended to actually do
 something sensible with it. Isn't that what "is currently ignored"
 means? It doesn't break the parser - the parser just skips over it. Am
 I missing something?

 OK, I guess we're talking about slightly different things. It is possible to
 add a syntax to comment out entire cues using something with "-->", so if
 that's all we want, that's fine.

 * Voice synthesis of e.g. mixed English/French captions. Given that this
 would only be useful to be people who know both languages, it seem not
 worth complicating the format for.

 Agreed on all fronts.

 I disagree with the third case. Many people speak more than one
 language and even if they don't speak the language that is in use in a
 cue, it is still bad to render it in using 

Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-07 Thread Philip Jägenstedt
On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:



On Mon, 3 Jan 2011, Philip J盲genstedt wrote:


Silvia, is your mail client a bit funny with character encodings? (The  
UTF-8 representation of U+00E4 is the same as the GBK representation of  
U+76F2.)



  * The bad cue handling is stricter than it should be. After
  collecting an id, the next line must be a timestamp line. Otherwise,
  we skip everything until a blank line, so in the following the
  parser would jump to bad cue on line 2 and skip the whole cue.

  1
  2
  00:00:00.000 --> 00:00:01.000
  Bla

  This doesn't match what most existing SRT parsers do, as they simply
  look for timing lines and ignore everything else. If we really need
  to collect the id instead of ignoring it like everyone else, this
  should be more robust, so that a valid timing line always begins a
  new cue. Personally, I'd prefer if it is simply ignored and that we
  use some form of in-cue markup for styling hooks.

 The IDs are useful for referencing cues from script, so I haven't
 removed them. I've also left the parsing as is for when neither the
 first nor second line is a timing line, since that gives us a lot of
 headroom for future extensions (we can do anything so long as the
 second line doesn't start with a timestamp and "-->" and another
 timestamp).

In the case of feeding future extensions to current parsers, it's way
better fallback behavior to simply ignore the unrecognized second line
than to discard the entire cue. The current behavior seems unnecessarily
strict and makes the parser more complicated than it needs to be. My
preference is just ignore anything preceding the timing line, but even
if we must have IDs it can still be made simpler and more robust than
what is currently spec'ed.


If we just ignore content until we hit a line that happens to look like a
timing line, then we are much more constrained in what we can do in the
future. For example, we couldn't introduce a comment block syntax, since
any comment containing a timing line wouldn't be ignored. On the other
hand if we keep the syntax as it is now, we can introduce a comment block
just by having its first line include a "-->" but not have it match the
timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

Looking at the parser more closely, I don't really see how doing anything
more complex than skipping the block entirely would be simpler than what
we have now, anyway.


Yes, I think that can work. The pattern of a line with "-->" without
time markers is currently ignored, so we can introduce something with
it for special content like comments, style and default.


This seems to have been Ian's assumption, but it's not what the spec says.
Follow the steps in
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0


32. If line contains the three-character substring "-->" (U+002D
HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to
the step labeled timings below.


40. Timings: Collect WebVTT cue timings and settings from line, using cue
for the results. If that fails, jump to the step labeled bad cue.


54. Bad cue: Discard cue.

(Followed by a loop to skip until the next empty line.)

The effect is that any line containing "-->" that is not a timing
line causes everything up to the next newline to be ignored.
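
The effect of the quoted steps can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the spec's algorithm: it splits the file into blocks at empty lines, looks for "-->" only in the first two lines of a block, and uses a regular expression as a stand-in for "collect WebVTT cue timings and settings". The names parse_blocks and TIMING_LINE are invented for the example.

```python
import re

# Simplified stand-in for a timestamp: optional hours, then mm:ss.ttt
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING_LINE = re.compile(
    r"^[ \t]*" + TIMESTAMP + r"[ \t]+-->[ \t]+" + TIMESTAMP + r"(?:[ \t]+.*)?$"
)

def parse_blocks(text: str):
    """Return the cues that survive the 'bad cue' handling described above."""
    cues = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        if lines[0].startswith("WEBVTT"):
            continue  # header block, not a cue
        if "-->" in lines[0]:
            cue_id, timing_index = "", 0
        elif len(lines) > 1 and "-->" in lines[1]:
            cue_id, timing_index = lines[0], 1
        else:
            continue  # bad cue: neither of the first two lines has "-->"
        if TIMING_LINE.match(lines[timing_index]) is None:
            continue  # bad cue: "-->" present but not a timing line
        cues.append({
            "id": cue_id,
            "timing": lines[timing_index].strip(),
            "text": "\n".join(lines[timing_index + 1:]),
        })
    return cues
```

Under these assumptions the "1 / 2 / timing line" example is discarded as a whole, and so is a block that starts with "--> COMMENT", even if a later line of that block looks like a timing line.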





* underline: EBU STL, CEA-608 and CEA-708 support underlining of
characters.


I've added support for 'text-decoration'.


And for <u>. I am happy now, thanks. :-)


Huh. For those who are surprised, this was added in
http://html5.org/r/6004 at the same time as <u> was made conforming for
HTML. See http://www.w3.org/Bugs/Public/show_bug.cgi?id=10838




* Voice synthesis of e.g. mixed English/French captions. Given that this
would only be useful to be people who know both languages, it seem not
worth complicating the format for.


Agreed on all fronts.


I disagree with the third case. Many people speak more than one
language and even if they don't speak the language that is in use in a
cue, it is still bad to render it in using the wrong language model,
in particular if it is rendered by a screen reader. We really need a
mechanism to attach a language marker to a cue segment.


It's not needed for the rendering of French vs English, is it? It is  
theoretically useful for CJK, but as I've said before it seems to be more  
common to transliterate the foreign script in these cases.



Do you have any examples of real-world subtitles/captions that would
benefit from more fine-grained language information?


This kind of information would indeed be useful.


Note that I'm not so much worried about captions and subtitles here,
but rather worried about audio descriptions as rendered from cue text
descriptions.


When would one want these descriptions to be multi-language?

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-07 Thread Silvia Pfeiffer
Hi Philip, all,

On Tue, Jun 7, 2011 at 8:12 PM, Philip Jägenstedt phil...@opera.com wrote:
 On Sat, 04 Jun 2011 17:05:55 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 On Mon, 3 Jan 2011, Philip J盲genstedt wrote:

 Silvia, is your mail client a bit funny with character encodings? (The UTF-8
 representation of U+00E4 is the same as the GBK representation of U+76F2.)

I'm using GMAIL, so if there is anything wrong, you'll have to report
it to Google. ;-)
Checking back, I actually received your name in Ian's email with that
funny encoding. I'm not sure it's gmail's fault for interpreting it in
this way or whether there was some information in email headers lost
during delivery or what else.


   * The bad cue handling is stricter than it should be. After
   collecting an id, the next line must be a timestamp line. Otherwise,
   we skip everything until a blank line, so in the following the
   parser would jump to bad cue on line 2 and skip the whole cue.
  
   1
   2
   00:00:00.000 --> 00:00:01.000
   Bla
  
   This doesn't match what most existing SRT parsers do, as they simply
   look for timing lines and ignore everything else. If we really need
   to collect the id instead of ignoring it like everyone else, this
   should be more robust, so that a valid timing line always begins a
   new cue. Personally, I'd prefer if it is simply ignored and that we
   use some form of in-cue markup for styling hooks.
 
  The IDs are useful for referencing cues from script, so I haven't
  removed them. I've also left the parsing as is for when neither the
  first nor second line is a timing line, since that gives us a lot of
  headroom for future extensions (we can do anything so long as the
  second line doesn't start with a timestamp and "-->" and another
  timestamp).

 In the case of feeding future extensions to current parsers, it's way
 better fallback behavior to simply ignore the unrecognized second line
 than to discard the entire cue. The current behavior seems unnecessarily
 strict and makes the parser more complicated than it needs to be. My
 preference is just ignore anything preceding the timing line, but even
 if we must have IDs it can still be made simpler and more robust than
 what is currently spec'ed.

 If we just ignore content until we hit a line that happens to look like a
 timing line, then we are much more constrained in what we can do in the
 future. For example, we couldn't introduce a comment block syntax, since
 any comment containing a timing line wouldn't be ignored. On the other
 hand if we keep the syntax as it is now, we can introduce a comment block
 just by having its first line include a "-->" but not have it match the
 timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

 Looking at the parser more closely, I don't really see how doing anything
 more complex than skipping the block entirely would be simpler than what
 we have now, anyway.

 Yes, I think that can work. The pattern of a line with "-->" without
 time markers is currently ignored, so we can introduce something with
 it for special content like comments, style and default.

 This seems to have been Ian's assumption, but it's not what the spec says.
 Follow the steps in
 http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#parsing-0

 32. If line contains the three-character substring "-->" (U+002D
 HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to
 the step labeled timings below.

 40. Timings: Collect WebVTT cue timings and settings from line, using cue
 for the results. If that fails, jump to the step labeled bad cue.

 54. Bad cue: Discard cue.

 (Followed by a loop to skip until the next empty line.)

 The effect is that any line containing "-->" that is not a timing line
 causes everything up to the next newline to be ignored.


Yes, that's what I expect. Therefore we can create such cues in the
file format right now and the browsers as they currently work will
ignore such content. In future, they can be extended to actually do
something sensible with it. Isn't that what "is currently ignored"
means? It doesn't break the parser - the parser just skips over it. Am
I missing something?
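
To make the skipping behaviour concrete, here is a minimal Python sketch
of steps 32, 40 and 54 as quoted above. It is not the spec's algorithm:
the timestamp pattern is simplified, the input is assumed to be the file
body after the WEBVTT header, and the names parse_blocks and _TIMING are
made up for illustration.

```python
import re

# Simplified timestamp, [hh:]mm:ss.ttt (an assumption, not the full spec grammar).
_TS = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
_TIMING = re.compile(rf"^\s*({_TS})\s+-->\s+({_TS})(.*)$")


def parse_blocks(body):
    """Split a WebVTT body into blocks; return (cues, discarded_blocks)."""
    cues, discarded = [], []
    for block in re.split(r"\n\s*\n", body.strip()):
        lines = block.split("\n")
        cue_id = ""
        # Step 32: a first line containing "-->" goes straight to "timings";
        # otherwise the first line is treated as the cue identifier.
        if "-->" not in lines[0]:
            cue_id = lines.pop(0)
        # Step 40: try to collect timings and settings from the line.
        match = _TIMING.match(lines[0]) if lines else None
        if match is None:
            # Step 54: bad cue. Everything up to the next empty line is dropped,
            # even if a later line in the block looks like a timing line.
            discarded.append(block)
            continue
        cues.append({
            "id": cue_id,
            "start": match.group(1),
            "end": match.group(2),
            "settings": match.group(3).strip(),
            "text": "\n".join(lines[1:]),
        })
    return cues, discarded
```

With this reading, a block whose first line is "--> COMMENT" is dropped as
a whole, including any timing-like line inside it, which is what would let
such blocks be given a meaning later.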

(And yes: I'd actually like to include these specs now rather than
later, so we can extend the parsing algo right now. But I am not
fussed about timing. It's good to understand how we will extend the
format.)


 * Voice synthesis of e.g. mixed English/French captions. Given that this
 would only be useful to people who know both languages, it seems not
 worth complicating the format for.

 Agreed on all fronts.

 I disagree with the third case. Many people speak more than one
 language and even if they don't speak the language that is in use in a
 cue, it is still bad to render it using the wrong language model,
 in particular if it is rendered by a screen reader. We really need a
 mechanism to attach a language marker to a cue segment.

 It's not needed for the 

[whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-06 Thread Silvia Pfeiffer
Hi Ian, all,

I am very excited by the possibilities that Ian outlined for WebVTT
and how we can add V2 features.

I have some comments on the discussion below, but first I'd like to
point people to a piece of work that Ronny Mennerich from
LeanbackPlayer has recently undertaken (with a little of my help).
Ronny has created this Web page:
http://leanbackplayer.com/other/webvtt.html . It summarizes the WebVTT
file format and provides visual clarifications on how the cue settings
work.

I would like to point out that Ronny has done the drawings according
to how we understand the WebVTT / HTML spec, so would appreciate
somebody checking if it's correct.

I would also like to point out the issues that Ronny lists on the
bottom of that page and that we need to resolve. I've copied them here
for discussion and added some more detail:

* A:[start|middle|end]
 -- If the [subtitle box] and also the [subtitle text] are aligned by
the designer within a CSS (file), which setting dominates: CSS or cue
setting, for both [subtitle box] and [subtitle text]?

 -- As it is text alignment, for me it is alignment of text within the
[subtitle text] element only, but not also alignment/positioning of
[subtitle text] element in relation to the [subtitle box]! However,
Silvia reckons the anchoring of the box changes with the alignment, so
that it is possible to actually middle align the [subtitle box] with
A:middle. We wonder which understanding is correct.


* T:[number]%
 -- If the [subtitle box] and also the [subtitle text] are aligned by
the designer within a CSS (file), which setting dominates: CSS or cue
setting, for both [subtitle box] and [subtitle text]?

-- What about it if T is used together with A:[start|middle|end]?


* S:[number]
 -- If using S:[number] without % (percentage) it is not clear
whether px or em is the unit for the text size.

 -- If using em as unit it has to be clarified how to set and
calculate the text size value! Because [number] only allows integers,
not real values, we cannot write S:1.2, so we need a note for it,
e.g. if S:120 is an example value, then the text size has to be
text-size: (120/100)em;
If using px as unit it is easy, no calculation needed, [number]
could be the new text size! If e.g. S:12 is an example value, then the
text size has to be text-size: 12px;

* cue voice tag
 -- Why are we not using a voice name declaration like in the cue class
tags, with a dot separator, like <v.VoiceName>voice text</v>, and
without spaces (e.g. <v VoiceName>)? This could avoid errors by .vtt
file writers and would also be much clearer to implement.
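
To make the two readings of S:[number] above concrete, here is a small
Python sketch. It only illustrates the open question and is not anything
the spec defines: the function name size_to_css and the bare_unit
parameter are hypothetical, and it returns just a CSS length value rather
than a full declaration.

```python
def size_to_css(setting, bare_unit="px"):
    """Map an S: cue setting to a CSS length under one of the two readings.

    bare_unit="px": S:12  -> "12px"  (the number is used directly)
    bare_unit="em": S:120 -> "1.2em" (the number is divided by 100)
    A value with a trailing % is passed through as a percentage.
    """
    name, _, value = setting.partition(":")
    if name != "S" or not value:
        raise ValueError(f"not a size setting: {setting!r}")
    if value.endswith("%"):
        return f"{int(value[:-1])}%"
    number = int(value)  # the setting only allows integers, so S:1.2 is rejected
    if bare_unit == "em":
        return f"{number / 100:g}em"
    if bare_unit == "px":
        return f"{number}px"
    raise ValueError(f"unsupported unit: {bare_unit!r}")
```

For example, size_to_css("S:120", "em") gives "1.2em" and
size_to_css("S:12") gives "12px", so the same file renders very
differently depending on which unit an implementation assumes.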

Please keep Ronny in the CC when you answer, because he is not
subscribed to the list.


Now my feedback on the WebVTT that Ian's Video feedback email provided:

On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 3 Jan 2011, Philip Jägenstedt wrote:
 
  + I've added a magic string that is required on the format to make it
    recognisable in environments with no or unreliable type labeling.

 Is there a reason it's WEBVTT FILE instead of just WEBVTT? FILE
 seems redundant and like unnecessary typing to me.

 It seemed more likely that non-WebVTT files would start with a line that
 said just WEBVTT than a line that said just WEBVTT FILE. But I guess
 WEBVTT FILE FORMAT is just as likely and it'll be caught.

 I've changed it to just WEBVTT; there may be existing implementations
 that only accept WEBVTT FILE so for now I recommend that authors still
 use the longer header.

I'll tweet the changes to help spread the news. I like it this short. :-)


  On Wed, 8 Sep 2010, Philip Jägenstedt wrote:
  
   In the discussion on public-html-a11y trackgroup was suggested to
   group together mutually exclusive tracks, so that enabling one
   automatically disables the others in the same trackgroup.
  
   I guess it's up to the UA how to enable and disable tracks now,
   but the only option is making them all mutually exclusive (as
   existing players do) or a weird kind of context menu where it's
   possible to enable and disable tracks completely independently.
   Neither option is great, but as a user I would almost certainly
   prefer all tracks being mutually exclusive and requiring scripts to
   enable several at once.
 
  It's not clear to me what the use case is for having multiple groups
  of mutually exclusive tracks.
 
  The intent of the spec as written was that a browser would by default
  just have a list of all the subtitle and caption tracks (the latter
  with suitable icons next to them, e.g. the [CC] icon in US locales),
  and the user would pick one (or none) from the list. One could easily
  imagine a UA allowing the user to enable multiple tracks by having the
  user ctrl-click a menu item, though, or some similar solution, much
  like with the commonly seen select box UI.

 In the vast majority of cases, all tracks are intended to be mutually
 exclusive, such as English+English HoH or subtitles in different
 languages. No media 

Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-06 Thread Glenn Maynard
On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 If we introduced the scrolling behaviour that I described above where
 cues that are rendered into the same location as a previous still
 active cue push that previous cue up, we get this behaviour covered
 too.


I don't think so.  This is a scene with two simultaneous conversations; in
order to help make the subtitles readable, they were authored to keep one
conversation pair always on top, and the other always on the bottom.  Having
captions move while they're already displayed wouldn't do this.  (I think
it'd make it unreadable, actually, by adding motion into the mix.)

Eventually, we will want to get rid of the legacy format and just
 deliver WebVTT, but they still need to display as though they came
 from the original broadcast caption format for contractual reasons.


I don't know what degree of sameness they expect, but as users can always
override their font (implying different wrapping results, etc.), you'll
never be able to guarantee that it'll look identical to the output of a more
fixed format.  If captions have editing like the above, it could even result
in a visible drop in quality.

-- 
Glenn Maynard


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-06 Thread Silvia Pfeiffer
On Mon, Jun 6, 2011 at 5:30 PM, Glenn Maynard gl...@zewt.org wrote:
 On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer silviapfeiff...@gmail.com
 wrote:

 If we introduced the scrolling behaviour that I described above where
 cues that are rendered into the same location as a previous still
 active cue push that previous cue up, we get this behaviour covered
 too.

 I don't think so.  This is a scene with two simultaneous conversations; in
 order to help make the subtitles readable, they were authored to keep one
 conversation pair always on top, and the other always on the bottom.  Having
 captions move while they're already displayed wouldn't do this.  (I think
 it'd make it unreadable, actually, by adding motion into the mix.)

If you use explicit L: placement, we could turn off the scrolling behaviour.

I don't think your example is a typical one. In my (unmeasured)
experience, the scrolling behaviour is much more typical.

In fact, that example of yours is really really confusing to me. I
would much prefer if the text wasn't displayed on top of each other,
but at different locations on the screen - one to the right one to the
left, preferably underneath the people that speak. That is a better
experience. I believe that example of yours only looks that way
because somebody had to work around the problem that the subtitle
authoring format didn't allow for such explicit placement.


 Eventually, we will want to get rid of the legacy format and just
 deliver WebVTT, but they still need to display as though they came
 from the original broadcast caption format for contractual reasons.

 I don't know what degree of sameness they expect, but as users can always
 override their font (implying different wrapping results, etc.), you'll
 never be able to guarantee that it'll look identical to the output of a more
 fixed format.  If captions have editing like the above, it could even result
 in a visible drop in quality.

I think the opposite is true. Right now, people work around some of
the ways in which they really would like to render their captions
because the formats don't allow for example explicit placement.
Therefore we get poor quality captions right now. With the features
available, we should see better captions, not worse.

Regards,
Silvia.


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-06 Thread Glenn Maynard
On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 I don't think your example is a typical one. In my (unmeasured)
 experience, the scrolling behaviour is much more typical.


It's definitely an uncommonly complex scene to caption.  I raised it
wondering about the quality level the format can manage in the harder
cases.  (To be clear, this isn't something any of the popular ad hoc formats
can do, either--this was achieved with brittle rendering-specific hacks.)

I've never seen the scrolling behavior in subtitles, though.  I think
it's only common in live captions.

In fact, that example of yours is really really confusing to me. I
 would much prefer if the text wasn't displayed on top of each other,
 but at different locations on the screen - one to the right one to the
 left, preferably underneath the people that speak. That is a better
 experience. I believe that example of yours only looks that way
 because somebody had to work around the problem that the subtitle
 authoring format didn't allow for such explicit placement.


The scene is jumping all over the place--at one point one pair is directly
*above* the other pair in the frame.  There's no left/right speaker
correspondence.  FWIW, I find it intuitive to read.

 Eventually, we will want to get rid of the legacy format and just
  deliver WebVTT, but they still need to display as though they came
  from the original broadcast caption format for contractual reasons.
 
  I don't know what degree of sameness they expect, but as users can always
  override their font (implying different wrapping results, etc.), you'll
  never be able to guarantee that it'll look identical to the output of a more
  fixed format.  If captions have editing like the above, it could even result
  in a visible drop in quality.

 I think the opposite is true. Right now, people work around some of
 the ways in which they really would like to render their captions
 because the formats don't allow for example explicit placement.
 Therefore we get poor quality captions right now. With the features
 available, we should see better captions, not worse.


Sure (most of the time), but if the contracts you refer to require that
their content rendered with WebVTT look the same as they did in their
original format, that won't always be possible.

-- 
Glenn Maynard


Re: [whatwg] WebVTT feedback (was Re: Video feedback)

2011-06-06 Thread Silvia Pfeiffer
On Mon, Jun 6, 2011 at 6:04 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer silviapfeiff...@gmail.com
 wrote:

 I don't think your example is a typical one. In my (unmeasured)
 experience, the scrolling behaviour is much more typical.

 It's definitely an uncommonly complex scene to caption.  I raised it
 wondering about the quality level the format can manage in the harder
 cases.  (To be clear, this isn't something any of the popular ad hoc formats
 can do, either--this was achieved with brittle rendering-specific hacks.)

 I've never seen the scrolling behavior in subtitles, though.  I think
 it's only common in live captions.

Agreed, mostly live, but also when playing back content that had live
captions. Also, we do want to support live captions here, so I think
they are relevant.


 In fact, that example of yours is really really confusing to me. I
 would much prefer if the text wasn't displayed on top of each other,
 but at different locations on the screen - one to the right one to the
 left, preferably underneath the people that speak. That is a better
 experience. I believe that example of yours only looks that way
 because somebody had to work around the problem that the subtitle
 authoring format didn't allow for such explicit placement.

 The scene is jumping all over the place--at one point one pair is directly
 *above* the other pair in the frame.  There's no left/right speaker
 correspondence.  FWIW, I find it intuitive to read.

Even when they are above each other, you could place one on the left
at the top next to the first speaker and one on the right at the
bottom next to the other speaker.

I guess intuitions can be different. :-)


  Eventually, we will want to get rid of the legacy format and just
  deliver WebVTT, but they still need to display as though they came
  from the original broadcast caption format for contractual reasons.
 
  I don't know what degree of sameness they expect, but as users can always
  override their font (implying different wrapping results, etc.), you'll
  never be able to guarantee that it'll look identical to the output of a more
  fixed format.  If captions have editing like the above, it could even result
  in a visible drop in quality.

 I think the opposite is true. Right now, people work around some of
 the ways in which they really would like to render their captions
 because the formats don't allow for example explicit placement.
 Therefore we get poor quality captions right now. With the features
 available, we should see better captions, not worse.

 Sure (most of the time), but if the contracts you refer to require that
 their content rendered with WebVTT look the same as they did in their
 original format, that won't always be possible.

Most things are actually possible - I've tried to give this a shot
with CEA-608 captions here:
http://www.w3.org/WAI/PF/HTML/wiki/Media_608_WebVTT_Conversion .
Feedback very welcome!

Cheers,
Silvia.