On Fri, 03 Jun 2011 01:28:45 +0200, Ian Hickson <i...@hixie.ch> wrote:

> On Fri, 22 Oct 2010, Simon Pieters wrote:

Actually it was me, but that's OK :)

> > There was also some discussion about metadata. Language is sometimes
> > necessary for the font engine to pick the right glyph.
>
> Could you elaborate on this? My assumption was that we'd just use CSS,
> which doesn't rely on language for this.

It's not in any spec that I'm aware of, but some browsers (including
Opera) pick different glyphs depending on the language of the text,
which really helps when rendering CJK when you have several CJK fonts on
the system. Browsers will already know the language from <track
srclang>, so this would be for external players.

How is this problem solved in SRT players today?

Not at all, it seems. Both VLC and Totem allow setting the character encoding and font used for subtitles in the (global) preferences menu, so presumably you would change that if the default doesn't work. Font switching mainly seems to be an issue when your system's default fonts don't match the script of the text you're reading, and it appears that is rare enough that very little software does anything about it, browsers perhaps being an exception.



On Mon, 3 Jan 2011, Philip Jägenstedt wrote:

> > * The "bad cue" handling is stricter than it should be. After
> > collecting an id, the next line must be a timestamp line. Otherwise,
> > we skip everything until a blank line, so in the following the
> > parser would jump to "bad cue" on line "2" and skip the whole cue.
> >
> > 1
> > 2
> > 00:00:00.000 --> 00:00:01.000
> > Bla
> >
> > This doesn't match what most existing SRT parsers do, as they simply
> > look for timing lines and ignore everything else. If we really need
> > to collect the id instead of ignoring it like everyone else, this
> > should be more robust, so that a valid timing line always begins a
> > new cue. Personally, I'd prefer if it is simply ignored and that we
> > use some form of in-cue markup for styling hooks.
>
> The IDs are useful for referencing cues from script, so I haven't
> removed them. I've also left the parsing as is for when neither the
> first nor second line is a timing line, since that gives us a lot of
> headroom for future extensions (we can do anything so long as the
> second line doesn't start with a timestamp and "-->" and another
> timestamp).

In the case of feeding future extensions to current parsers, it's way
better fallback behavior to simply ignore the unrecognized second line
than to discard the entire cue. The current behavior seems unnecessarily
strict and makes the parser more complicated than it needs to be. My
preference is just ignore anything preceding the timing line, but even
if we must have IDs it can still be made simpler and more robust than
what is currently spec'ed.

If we just ignore content until we hit a line that happens to look like a
timing line, then we are much more constrained in what we can do in the
future. For example, we couldn't introduce a "comment block" syntax, since
any comment containing a timing line wouldn't be ignored. On the other
hand if we keep the syntax as it is now, we can introduce a comment block
just by having its first line include a "-->" but not have it match the
timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

One of us must be confused, do you mean something like this?

1
--> COMMENT
00:00.000 --> 00:01.000
Cue text

Adding this syntax would break the *current* parser, as it would fail in step 39 (Collect WebVTT cue timings and settings) and then skip the rest of the cue. If we want any room for extensions along these lines, then multiple lines preceding the timing line must be handled gracefully.

Looking at the parser more closely, I don't really see how doing anything
more complex than skipping the block entirely would be simpler than what
we have now, anyway.

I suggest:

* Step 31: Try to "collect WebVTT cue timings and settings" instead of checking for the substring "-->". If it succeeds, jump to what is now step 40. If it fails, continue at what is now step 32. (This allows adding any syntax as long as it doesn't exactly match a timing line, including "--> COMMENT". As a bonus, one can fail faster when trying to parse an entire timing line rather than doing a substring search for "-->".)

* Step 32: Only set the id line if it's not already set. (Assuming we want the first line to be the id line in future extensions.)

* Step 39: Jump to the new step 31.

In case not every detail is correct, the idea is to first try to match a timing line and to take the first line that is not a timing line (if any) as the id, leaving everything in between open for future syntax changes, even if they use "-->".
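The suggested steps above could be sketched roughly like this (all names are illustrative; TIMING_RE is a simplified stand-in for "collect WebVTT cue timings and settings", which really parses the full line rather than matching a regexp):

```typescript
// Sketch of the proposed cue-header handling: try each line as a timing
// line; the first non-timing line becomes the id (only if not already
// set); any other lines before the timings are ignored, leaving room
// for future syntax such as a "--> COMMENT" block.
const TIMING_RE = /^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d+:)?\d{2}:\d{2}\.\d{3}/;

interface CueHeader {
  id: string | null;
  timings: string; // the raw timing line, kept as a string for brevity
}

function parseCueHeader(lines: string[]): CueHeader | null {
  let id: string | null = null;
  for (const line of lines) {
    if (TIMING_RE.test(line)) {
      // A valid timing line always starts the cue body.
      return { id, timings: line };
    }
    if (id === null) {
      id = line; // only the first non-timing line is taken as the id
    }
    // Later non-timing lines are skipped gracefully.
  }
  return null; // no timing line at all: drop the block
}
```

With this, `parseCueHeader(["1", "--> COMMENT", "00:00.000 --> 00:01.000"])` keeps the cue (id "1") instead of discarding it.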

I think it's fairly important that we handle this. Doubled id lines are an easy mistake to make when copying things around. Silently dropping those cues would be worse than what many existing (line-based, id-ignoring) SRT parsers do.



On Sat, 22 Jan 2011, Philip Jägenstedt wrote:

I'm inclined to say that we should normalize all whitespace during
parsing and not have explicit line breaks at all. If people really want
two lines, they should use two cues. In practice, I don't know how well
that would fare, though. What other solutions are there?

I think we definitely need line breaks, e.g. for cases like:

  -- Do you want to go to the zoo?
  -- Yes!
  -- Then put your shoes on!

...which is quite common style in some locales.

Right, normalizing all whitespace would be overkill.

However, I agree that we should encourage people to let browsers wrap the
lines. Not sure how to encourage that more.

On Mon, 14 Feb 2011, Philip Jägenstedt wrote:
>
> [line wrapping]

There's still plenty of room for improvements in line wrapping, though.
It seems to me that the main reason that people line wrap captions
manually is to avoid getting two lines of very different length, as that
looks quite unbalanced. There's no way to make that happen with CSS, and
AFAIK it's not done by the WebVTT rendering spec either.

WebVTT just defers to CSS for this. I agree that it would be nice for CSS
to allow UAs to do more clever things here and (more importantly) for UAs
to actually do more clever things here.

To expand a bit more on the problem and suggested solution, consider the example cue "This sentence is spoken by a single speaker and is presented as a single cue."

If simple line-wrapping (how browsers currently render text) is used, it might be:

"This sentence is spoken by a single speaker and is presented as a
single cue."

Subtitles tend to be line-wrapped to have more balanced line width, and at least I would certainly much prefer this line wrapping:

"This sentence is spoken by a single speaker
and is presented as a single cue."

Apart from being easier to read, this is also much more suitable for left/right-alignment in cases where that is used to associate the cue with a speaker on screen. With WebVTT, one would have to manually line-break the text to get this result. Apart from wasting the time of the captioner, it will also break if a slightly larger font is used -- you might get this rendering instead:

"This sentence is spoken by a single
speaker
and is presented as a single cue."

In other cases you might get 4 lines where 3 would have been enough. This is not a theoretical issue; I see it fairly often with SRT subtitles rendered at a different size than they were tested with.

My suggested solution is to first lay out the text using all of the available width. Then, decrease the width as much as possible without increasing the number of line breaks. The algorithm should also prefer to make the first line the longest, as this is IMO more aesthetically pleasing.
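A minimal sketch of that idea, measuring width in characters as a stand-in for real text metrics (a browser would measure rendered pixels, and this sketch omits the "first line longest" preference):

```typescript
// Greedy wrap: longest prefix of words that fits on each line.
function greedyWrap(words: string[], width: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= width || current === "") {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Balanced wrap: lay out at the full width, then shrink the width as
// far as possible without increasing the number of lines.
function balancedWrap(text: string, maxWidth: number): string[] {
  const words = text.split(/\s+/);
  const target = greedyWrap(words, maxWidth).length;
  let width = maxWidth;
  while (width > 1 && greedyWrap(words, width - 1).length === target) {
    width--;
  }
  return greedyWrap(words, width);
}
```

For the example cue above, this yields two lines of much more similar length than plain greedy wrapping, without ever producing more lines than the available width requires.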

I would like to see this specified and would gladly implement it in Opera, but in which spec does it belong? It seems fairly subtitling-specific to me, so if it could be in the WebVTT rendering rules to begin with (as opposed to CSS with vendor prefixes) that would be at least short-term awesome. It's only if this is the default line-wrapping for <track>+WebVTT that people are going to discover this and stop manually line-breaking their captions.



On Tue, 18 Jan 2011, Robert O'Callahan wrote:

One solution that could work here is to honour dynamic changes to
'preload', so switching preload to 'none' would stop buffering. Then a
script could do that, for example, after the user has paused the video
for ten seconds. The script could also look at 'buffered' to make its
decision.

If browsers want to do that I'm quite happy to add something explicitly to
that effect to the spec. Right now the spec doesn't disallow it.

For now, Opera has made it impossible to change the internal preload state from a higher state to a lower state specifically to prevent this. If script authors could start and stop the buffering at will, it would certainly be abused to perform throttling using lots of small requests. If the buffering behavior of browsers is broken, I'd prefer to fix it (in spec or implementation) rather than to allow scripts to work around it.



On Wed, 19 Jan 2011, Philip Jägenstedt wrote:

The 3 preload states imply 3 simple buffering strategies:

none: don't touch the network at all
preload: buffer as little as possible while still reaching readyState
HAVE_METADATA
auto: buffer as fast and much as possible

"auto" isn't "as fast and much as possible", it's "as fast and much as
will make the user happy". In some configurations, it might be the same as
"none" (e.g. if the user is paying by the byte and hates video).

The way I see it, that's just a matter of a user preference to limit the internal preload state to "none" regardless of what the content attribute says.

However, the state we're discussing is when the user has begun playing the
video. The spec doesn't talk about it, but I call it:

invoked: buffer as little as possible without readyState dropping below
HAVE_FUTURE_DATA (in other words: being able to play from currentTime to
duration at playbackRate without waiting for the network)

There's also a fifth state, let's call it "aggressive", where even while
playing the video the UA is trying to download the whole thing in case the
connection drops.

This is the same as "auto" for now, but sure, that could be improved.
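To make the state model concrete, here is a hypothetical sketch of an internal preload state that can only be raised, never lowered (matching what Opera does to keep scripts from toggling buffering), with a user preference capping the maximum (e.g. "none" for the user paying by the byte). All names are illustrative, not from any spec:

```typescript
// Ordered internal preload states, lowest to highest.
enum PreloadState {
  None = 0,     // don't touch the network at all
  Metadata = 1, // buffer just enough to reach HAVE_METADATA
  Invoked = 2,  // keep readyState at HAVE_FUTURE_DATA while playing
  Auto = 3,     // buffer ahead as much as is useful
}

class MediaBuffering {
  private state = PreloadState.None;

  constructor(private userCap: PreloadState = PreloadState.Auto) {}

  // Honour requests to raise the state (attribute changes, playback
  // starting), but ignore attempts to lower it, so scripts can't
  // start and stop buffering at will.
  request(wanted: PreloadState): PreloadState {
    const capped = Math.min(wanted, this.userCap);
    if (capped > this.state) this.state = capped;
    return this.state;
  }
}
```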

If the available bandwidth exceeds the bandwidth of the resource, some
kind of throttling must eventually be used. There are mainly 2 options
for doing this:

1. Throttle at the TCP level by not reading data from the socket (not at all
to suspend, or at a controlled rate to buffer ahead)
2. Use HTTP byte ranges, making many smaller requests with any kind of
throttling at the TCP level

There's also option 3, to handle the fifth state above: don't throttle.


When HTTP byte ranges are used to achieve bandwidth management, it's
hard to talk about a single downloadBufferTarget that is the number of
seconds buffered ahead. Rather, there might be an upper and lower limit
within which the browser tries to stay, so that each request can be of a
reasonable size. Neither an author-provided minumum or maximum value can
be followed particularly closely, but could possibly be taken as a hint
of some sort.

Would it be a more useful hint than "preload"? I'm skeptical about adding
many hints with no requirements. If there's some specific further
information we can add, though, it might make sense to add more features
to "preload".

I don't think that now is a good time to add more features to preload, given that what we have isn't interoperably implemented yet.

The above buffering strategies are still not enough, because users seem
to expect that in a low-bandwidth situation, the video will keep
buffering until they can watch it through to the end. These seem to be
the options for solving the problem:

* Make sites that want this behavior set .preload='auto' in the 'paused'
event handler

* Add an option in the context menu to "Preload Video" or some such

* Cause an invoked (see dfn above) but paused video to behave like
preload=auto

* As above, but only when the available bandwidth is limited

I don't think any of these solutions are particularly good, so any input
on other options is very welcome!

If users expect something, it seems logical that it should just happen. I
don't have a problem with saying that it should depend on preload="",
though. If you like I can make the spec explicitly describe what the
preload="" hints mean while video is playing, too.

That would be a good start. In Opera, playing the video causes the internal preload state to go to "invoked".



On Thu, 20 Jan 2011, Philip Jägenstedt wrote:

There have been two non-trivial changes to the seeking algorithm in the
last year:

Discussed at http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html
led to http://html5.org/r/4868

Discussed at http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html
led to http://html5.org/r/5219

Yeah. In particular, sometimes there's no way for the UA to know
asynchronously if the seek can be done, which is why the attribute is set
after the method returns. It's not ideal, but the alternative is not
always implementable.


With that said, it seems like there's nothing that guarantees that the
asynchronous section doesn't start running while the script is still
running.

Yeah. It's not ideal, but I don't really see what we can do about it.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267

By only updating the media state between tasks (or as tasks), the script that issued the seek would not see the state change as a result of it.



On Fri, 4 Feb 2011, Matthew Gregan wrote:

For anyone following along, the behaviour has now been changed in the
Firefox 4 nightly builds.

On Mon, 24 Jan 2011, Robert O'Callahan wrote:

I agree. I think we should change behavior to match author expectations
and the other implementations, and let the spec change to match.

How do you handle the cases where it's not possible?


If all the browsers can do it, I'm all for going back to having
currentTime change synchronously.

Changing currentTime synchronously doesn't mean that seeking to that position will actually succeed, so I don't see why that would be a problem. currentTime would just be updated again once it's been clamped in the asynchronous section of the seek algorithm.




On Sat, 14 May 2011, Ojan Vafai wrote:

If someone proposed a workable solution, browsers would likely implement
it. I can't think of a backwards-compatible solution to this, so I agree
that developers just need to learn that this is a bad pattern. I
could imagine browsers logging a warning to the console in these cases,
but I worry that it would fire too much in today's web.

Indeed.


It's unfortunate that you need to use an inline event handler instead of
one registered via addEventListener to avoid the race condition.
Exposing something to the platform like jquery's live event handlers (
http://api.jquery.com/live/) could mitigate this problem in practice,
e.g. it would be just as easy or easier to register the event handler
before the element is created.

You can also work around it by setting src="" from script after you've
used addEventListener, or by checking the state manually after you've
added the handler and calling the handler if it is too late (though you
have to be aware of the situation where the event is actually already
scheduled and you added the listener between the time it was scheduled and
the time it fired, so your function really has to be idempotent).

A better fix would be http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267 so there is no window where scripts see state X even though the related transition event has not fired yet.

--
Philip Jägenstedt
Core Developer
Opera Software
