On Fri, 03 Jun 2011 01:28:45 +0200, Ian Hickson <i...@hixie.ch> wrote:

> On Fri, 22 Oct 2010, Simon Pieters wrote:

Actually it was me, but that's OK :)

> > There was also some discussion about metadata. Language is sometimes
> > necessary for the font engine to pick the right glyph.
>
> Could you elaborate on this? My assumption was that we'd just use CSS,
> which doesn't rely on language for this.

It's not in any spec that I'm aware of, but some browsers (including
Opera) pick different glyphs depending on the language of the text,
which really helps when rendering CJK when you have several CJK fonts on
the system. Browsers will already know the language from <track
srclang>, so this would be for external players.

How is this problem solved in SRT players today?

Not at all, it seems. Both VLC and Totem allow setting the character encoding and font used for subtitles in the (global) preferences menu, so presumably you would change that if the default doesn't work. Font switching mainly seems to be an issue when your system's default fonts don't match the script of the text you're reading, and it appears that is rare enough that very little software does anything about it, browsers perhaps being an exception.



On Mon, 3 Jan 2011, Philip Jägenstedt wrote:

> > * The "bad cue" handling is stricter than it should be. After
> > collecting an id, the next line must be a timestamp line. Otherwise,
> > we skip everything until a blank line, so in the following the
> > parser would jump to "bad cue" on line "2" and skip the whole cue.
> >
> > 1
> > 2
> > 00:00:00.000 --> 00:00:01.000
> > Bla
> >
> > This doesn't match what most existing SRT parsers do, as they simply
> > look for timing lines and ignore everything else. If we really need
> > to collect the id instead of ignoring it like everyone else, this
> > should be more robust, so that a valid timing line always begins a
> > new cue. Personally, I'd prefer if it is simply ignored and that we
> > use some form of in-cue markup for styling hooks.
>
> The IDs are useful for referencing cues from script, so I haven't
> removed them. I've also left the parsing as is for when neither the
> first nor second line is a timing line, since that gives us a lot of
> headroom for future extensions (we can do anything so long as the
> second line doesn't start with a timestamp and "-->" and another
> timestamp).

In the case of feeding future extensions to current parsers, it's way
better fallback behavior to simply ignore the unrecognized second line
than to discard the entire cue. The current behavior seems unnecessarily
strict and makes the parser more complicated than it needs to be. My
preference is just ignore anything preceding the timing line, but even
if we must have IDs it can still be made simpler and more robust than
what is currently spec'ed.

If we just ignore content until we hit a line that happens to look like a
timing line, then we are much more constrained in what we can do in the
future. For example, we couldn't introduce a "comment block" syntax, since
any comment containing a timing line wouldn't be ignored. On the other
hand if we keep the syntax as it is now, we can introduce a comment block
just by having its first line include a "-->" but not have it match the
timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

One of us must be confused, do you mean something like this?

1
--> COMMENT
00:00.000 --> 00:01.000
Cue text

Adding this syntax would break the *current* parser, as it would fail in step 39 (Collect WebVTT cue timings and settings) and then skip the rest of the cue. If we want any room for extensions along these lines, then multiple lines preceding the timing line must be handled gracefully.

Looking at the parser more closely, I don't really see how doing anything
more complex than skipping the block entirely would be simpler than what
we have now, anyway.

I suggest:

* Step 31: Try to "collect WebVTT cue timings and settings" instead of checking for the substring "-->". If it succeeds, jump to what is now step 40. If it fails, continue at what is now step 32. (This allows adding any syntax as long as it doesn't exactly match a timing line, including "--> COMMENT". As a bonus, one can fail faster when trying to parse an entire timing line rather than doing a substring search for "-->".)

* Step 32: Only set the id line if it's not already set. (Assuming we want the first line to be the id line in future extensions.)

* Step 39: Jump to the new step 31.

In case not every detail is correct, the idea is to first try to match a timing line and to take the first line that is not a timing line (if any) as the id, leaving everything in between open for future syntax changes, even if they use "-->".
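The suggested steps above could be sketched roughly like this (all names are illustrative; TIMING_RE is a simplified stand-in for "collect WebVTT cue timings and settings", which really parses the full line rather than matching a regexp):

```typescript
// Sketch of the proposed cue-header handling: try each line as a timing
// line; the first non-timing line becomes the id (only if not already
// set); any other lines before the timings are ignored, leaving room
// for future syntax such as a "--> COMMENT" block.
const TIMING_RE = /^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d+:)?\d{2}:\d{2}\.\d{3}/;

interface CueHeader {
  id: string | null;
  timings: string; // the raw timing line, kept as a string for brevity
}

function parseCueHeader(lines: string[]): CueHeader | null {
  let id: string | null = null;
  for (const line of lines) {
    if (TIMING_RE.test(line)) {
      // A valid timing line always starts the cue body.
      return { id, timings: line };
    }
    if (id === null) {
      id = line; // only the first non-timing line is taken as the id
    }
    // Later non-timing lines are skipped gracefully.
  }
  return null; // no timing line at all: drop the block
}
```

With this, `parseCueHeader(["1", "--> COMMENT", "00:00.000 --> 00:01.000"])` keeps the cue (id "1") instead of discarding it.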

I think it's fairly important that we handle this. Doubled id lines are an easy mistake to make when copying things around. Silently dropping those cues would be worse than what many existing (line-based, id-ignoring) SRT parsers do.



On Sat, 22 Jan 2011, Philip Jägenstedt wrote:

I'm inclined to say that we should normalize all whitespace during
parsing and not have explicit line breaks at all. If people really want
two lines, they should use two cues. In practice, I don't know how well
that would fare, though. What other solutions are there?

I think we definitely need line breaks, e.g. for cases like:

  -- Do you want to go to the zoo?
  -- Yes!
  -- Then put your shoes on!

...which is quite common style in some locales.

Right, normalizing all whitespace would be overkill.

However, I agree that we should encourage people to let browsers wrap the
lines. Not sure how to encourage that more.

On Mon, 14 Feb 2011, Philip Jägenstedt wrote:
>
> [line wrapping]

There's still plenty of room for improvements in line wrapping, though.
It seems to me that the main reason that people line wrap captions
manually is to avoid getting two lines of very different length, as that
looks quite unbalanced. There's no way to make that happen with CSS, and
AFAIK it's not done by the WebVTT rendering spec either.

WebVTT just defers to CSS for this. I agree that it would be nice for CSS
to allow UAs to do more clever things here and (more importantly) for UAs
to actually do more clever things here.

To expand a bit more on the problem and suggested solution, consider the example cue "This sentence is spoken by a single speaker and is presented as a single cue."

If simple line-wrapping (how browsers currently render text) is used, it might be:

"This sentence is spoken by a single speaker and is presented as a
single cue."

Subtitles tend to be line-wrapped to have more balanced line width, and at least I would certainly much prefer this line wrapping:

"This sentence is spoken by a single speaker
and is presented as a single cue."

Apart from being easier to read, this is also much more suitable for left/right-alignment in cases where that is used to associate the cue with a speaker on screen. With WebVTT, one would have to manually line-break the text to get this result. Apart from wasting the time of the captioner, it will also break if a slightly larger font is used -- you might get this rendering instead:

"This sentence is spoken by a single
speaker
and is presented as a single cue."

In other cases you might get 4 lines where 3 would have been enough. This is not a theoretical issue; I see it fairly often with SRT subtitles rendered at a different size than they were tested with.

My suggested solution is to first lay out the text using all of the available width. Then, decrease the width as much as possible without increasing the number of line breaks. The algorithm should also prefer to make the first line the longest, as this is IMO more aesthetically pleasing.
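A minimal sketch of that idea, measuring width in characters as a stand-in for real text metrics (a browser would measure rendered pixels, and this sketch omits the "first line longest" preference):

```typescript
// Greedy wrap: longest prefix of words that fits on each line.
function greedyWrap(words: string[], width: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= width || current === "") {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Balanced wrap: lay out at the full width, then shrink the width as
// far as possible without increasing the number of lines.
function balancedWrap(text: string, maxWidth: number): string[] {
  const words = text.split(/\s+/);
  const target = greedyWrap(words, maxWidth).length;
  let width = maxWidth;
  while (width > 1 && greedyWrap(words, width - 1).length === target) {
    width--;
  }
  return greedyWrap(words, width);
}
```

For the example cue above, this yields two lines of much more similar length than plain greedy wrapping, without ever producing more lines than the available width requires.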

I would like to see this specified and would gladly implement it in Opera, but in which spec does it belong? It seems fairly subtitling-specific to me, so if it could be in the WebVTT rendering rules to begin with (as opposed to CSS with vendor prefixes) that would be at least short-term awesome. It's only if this is the default line-wrapping for <track>+WebVTT that people are going to discover this and stop manually line-breaking their captions.



On Tue, 18 Jan 2011, Robert O'Callahan wrote:

One solution that could work here is to honour dynamic changes to
'preload', so switching preload to 'none' would stop buffering. Then a
script could do that, for example, after the user has paused the video
for ten seconds. The script could also look at 'buffered' to make its
decision.

If browsers want to do that I'm quite happy to add something explicitly to
that effect to the spec. Right now the spec doesn't disallow it.

For now, Opera has made it impossible to change the internal preload state from a higher state to a lower state specifically to prevent this. If script authors could start and stop the buffering at will, it would certainly be abused to perform throttling using lots of small requests. If the buffering behavior of browsers is broken, I'd prefer to fix it (in spec or implementation) rather than to allow scripts to work around it.



On Wed, 19 Jan 2011, Philip Jägenstedt wrote:

The 3 preload states imply 3 simple buffering strategies:

none: don't touch the network at all
preload: buffer as little as possible while still reaching readyState
HAVE_METADATA
auto: buffer as fast and much as possible

"auto" isn't "as fast and much as possible", it's "as fast and much as
will make the user happy". In some configurations, it might be the same as
"none" (e.g. if the user is paying by the byte and hates video).

The way I see it, that's just a matter of a user preference to limit the internal preload state to "none" regardless of what the content attribute says.

However, the state we're discussing is when the user has begun playing the
video. The spec doesn't talk about it, but I call it:

invoked: buffer as little as possible without readyState dropping below
HAVE_FUTURE_DATA (in other words: being able to play from currentTime to
duration at playbackRate without waiting for the network)

There's also a fifth state, let's call it "aggressive", where even while
playing the video the UA is trying to download the whole thing in case the
connection drops.

This is the same as "auto" for now, but sure, that could be improved.
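To make the state model concrete, here is a hypothetical sketch of an internal preload state that can only be raised, never lowered (matching what Opera does to keep scripts from toggling buffering), with a user preference capping the maximum (e.g. "none" for the user paying by the byte). All names are illustrative, not from any spec:

```typescript
// Ordered internal preload states, lowest to highest.
enum PreloadState {
  None = 0,     // don't touch the network at all
  Metadata = 1, // buffer just enough to reach HAVE_METADATA
  Invoked = 2,  // keep readyState at HAVE_FUTURE_DATA while playing
  Auto = 3,     // buffer ahead as much as is useful
}

class MediaBuffering {
  private state = PreloadState.None;

  constructor(private userCap: PreloadState = PreloadState.Auto) {}

  // Honour requests to raise the state (attribute changes, playback
  // starting), but ignore attempts to lower it, so scripts can't
  // start and stop buffering at will.
  request(wanted: PreloadState): PreloadState {
    const capped = Math.min(wanted, this.userCap);
    if (capped > this.state) this.state = capped;
    return this.state;
  }
}
```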

If the available bandwidth exceeds the bandwidth of the resource, some
kind of throttling must eventually be used. There are mainly 2 options
for doing this:

1. Throttle at the TCP level by not reading data from the socket (not at all
to suspend, or at a controlled rate to buffer ahead)
2. Use HTTP byte ranges, making many smaller requests with any kind of
throttling at the TCP level

There's also option 3, to handle the fifth state above: don't throttle.


When HTTP byte ranges are used to achieve bandwidth management, it's
hard to talk about a single downloadBufferTarget that is the number of
seconds buffered ahead. Rather, there might be an upper and lower limit
within which the browser tries to stay, so that each request can be of a
reasonable size. Neither an author-provided minumum or maximum value can
be followed particularly closely, but could possibly be taken as a hint
of some sort.

Would it be a more useful hint than "preload"? I'm skeptical about adding
many hints with no requirements. If there's some specific further
information we can add, though, it might make sense to add more features
to "preload".

I don't think that now is a good time to add more features to preload, given that what we have isn't interoperably implemented yet.

The above buffering strategies are still not enough, because users seem
to expect that in a low-bandwidth situation, the video will keep
buffering until they can watch it through to the end. These seem to be
the options for solving the problem:

* Make sites that want this behavior set .preload='auto' in the 'paused'
event handler

* Add an option in the context menu to "Preload Video" or some such

* Cause an invoked (see dfn above) but paused video to behave like
preload=auto

* As above, but only when the available bandwidth is limited

I don't think any of these solutions are particularly good, so any input
on other options is very welcome!

If users expect something, it seems logical that it should just happen. I
don't have a problem with saying that it should depend on preload="",
though. If you like I can make the spec explicitly describe what the
preload="" hints mean while video is playing, too.

That would be a good start. In Opera, playing the video causes the internal preload state to go to "invoked".



On Thu, 20 Jan 2011, Philip Jägenstedt wrote:

There have been two non-trivial changes to the seeking algorithm in the
last year:

Discussed at http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html
led to http://html5.org/r/4868

Discussed at http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html
led to http://html5.org/r/5219

Yeah. In particular, sometimes there's no way for the UA to know
asynchronously if the seek can be done, which is why the attribute is set
after the method returns. It's not ideal, but the alternative is not
always implementable.


With that said, it seems like there's nothing that guarantees that the
asynchronous section doesn't start running while the script is still
running.

Yeah. It's not ideal, but I don't really see what we can do about it.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267

By only updating the media state between tasks (or as tasks), the script that issued the seek would not see the state change as a result of it.



On Fri, 4 Feb 2011, Matthew Gregan wrote:

For anyone following along, the behaviour has now been changed in the
Firefox 4 nightly builds.

On Mon, 24 Jan 2011, Robert O'Callahan wrote:

I agree. I think we should change behavior to match author expectations
and the other implementations, and let the spec change to match.

How do you handle the cases where it's not possible?


If all the browsers can do it, I'm all for going back to having
currentTime change synchronously.

Changing currentTime synchronously doesn't mean that seeking to that position will actually succeed, so I don't see why that would be a problem. currentTime would just be updated again once it's been clamped in the asynchronous section of the seek algorithm.




On Sat, 14 May 2011, Ojan Vafai wrote:

If someone proposed a workable solution, browsers would likely implement
it. I can't think of a backwards-compatible solution to this, so I agree
that developers just need to learn that this is a bad pattern. I
could imagine browsers logging a warning to the console in these cases,
but I worry that it would fire too much in today's web.

Indeed.


It's unfortunate that you need to use an inline event handler instead of
one registered via addEventListener to avoid the race condition.
Exposing something to the platform like jquery's live event handlers (
http://api.jquery.com/live/) could mitigate this problem in practice,
e.g. it would be just as easy or easier to register the event handler
before the element is created.

You can also work around it by setting src="" from script after you've
used addEventListener, or by checking the state manually after you've
added the handler and calling the handler if it is too late (though you
have to be aware of the situation where the event is actually already
scheduled and you added the listener between the time it was scheduled and
the time it fired, so your function really has to be idempotent).

A better fix would be http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267 so there is no window where scripts see state X even though the related transition event has not fired yet.

--
Philip Jägenstedt
Core Developer
Opera Software
