Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Silvia Pfeiffer
On Thu, Oct 6, 2011 at 10:51 AM, Ralph Giles  wrote:
> On 05/10/11 04:36 PM, Glenn Maynard wrote:
>
>> If the files don't work in VTT in any major implementation, then probably
>> not many.  It's the fault of overly-lenient parsers that these things happen
>> in the first place.
>
> A point Philip Jägenstedt has made is that it's sufficiently tedious to
> verify correct subtitle playback that authors are unlikely to do so with
> any vigilance. Therefore the better trade-off is to make the parser
> forgiving, rather than inflict the occasional missing cue on viewers.

That's a slippery slope to go down. If they cannot see the
consequence, they assume it's legal. It's not like we are totally
screwing up the display - there's only one mis-authored cue missing.
If we accept one type of mis-authoring, where do you stop with
accepting weirdness? How can you make compatible implementations if
everyone decides for themselves what weirdness that is not in the spec
they accept?

I'd rather we have strict parsing and recover from brokenness. It's
the job of validators to identify broken cues. We should teach authors
to use validators before they decide that their files are ok.

As for some of the more dominant mis-authorings: we can accept them as
correct authoring, but then they have to be made part of the
specification and legalized.

Silvia.


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Glenn Maynard
On Wed, Oct 5, 2011 at 7:51 PM, Ralph Giles  wrote:

> A point Philip Jägenstedt has made is that it's sufficiently tedious to
> verify correct subtitle playback that authors are unlikely to do so with
> any vigilance. Therefore the better trade-off is to make the parser
> forgiving, rather than inflict the occasional missing cue on viewers.
>

How can you even time subtitles without ever looking at them?

Simon: Another useful statistic would be the number of files which 1:
*always* use periods in SRT timestamps (consistently wrong) compared to 2:
the number of files which mix periods and commas in timestamps (occasionally
wrong).  I'm guessing #1 is much more common.

-- 
Glenn Maynard
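
The statistic requested above could be gathered roughly as follows. This is only a sketch: the helper name `separator_usage` and the loose `DECIMAL_SEP` pattern are assumptions made for illustration, not something used in the thread.

```python
import re

# Hypothetical helper pattern: captures the separator in front of the decimals.
DECIMAL_SEP = re.compile(r'\d+:\d+:\d+([.,])\d+')

def separator_usage(lines):
    """Classify one subtitle file by the decimal separators in its timing lines."""
    seps = set()
    for line in lines:
        if '-->' in line:  # only look at timing lines
            seps.update(DECIMAL_SEP.findall(line))
    if seps == {'.'}:
        return 'always periods'   # case 1: consistently wrong for SRT
    if seps == {','}:
        return 'always commas'    # valid SRT
    if seps == {'.', ','}:
        return 'mixed'            # case 2: occasionally wrong
    return 'no timestamps found'

print(separator_usage(["00:00:01.000 --> 00:00:03.000"]))
print(separator_usage(["00:00:01,000 --> 00:00:03.000"]))
```

Counting the return values over a corpus would give the two numbers being compared.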


Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior

2011-10-05 Thread Boris Zbarsky

On 10/5/11 9:01 PM, Julien Chaffraix wrote:
>> Ah.  Do they set disabled and expect it to take effect whenever the sheet
>> actually appears?
>
> Yes, we have seen some regressions because people were expecting exactly that.


So for what it's worth, Gecko implemented the current behavior of
creating the stylesheet immediately as soon as we know the <link> is
linking to a stylesheet in
https://bugzilla.mozilla.org/show_bug.cgi?id=107567


One of the considerations there was in fact allowing pages to change 
disabled state without having to wait for the sheet to load.  That 
includes things like selection of alternate stylesheet sets working 
correctly even if not all the alternate sheets have finished loading and 
so forth...


-Boris


Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior

2011-10-05 Thread Julien Chaffraix
>> Thanks for the explanation. I took a black-box approach in testing - I
>> don't pretend to know how Firefox works - and from that perspective,
>> it looked like it was synchronous as the |sheet| was present and
>> properly populated in JS.
>
> Try setting an interval to poll right before the <link> is parsed.  That
> will black-box show that it's not synchronous.  ;)

I stand corrected. ;)

>> It is. However the specification states that |disabled| would be
>> ignored if there is no |sheet|. It looks like web-authors don't factor
>> this into their code.
>
> Ah.  Do they set disabled and expect it to take effect whenever the sheet
> actually appears?

Yes, we have seen some regressions because people were expecting exactly that.

Thanks,
Julien


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Ralph Giles
On 05/10/11 04:36 PM, Glenn Maynard wrote:

> If the files don't work in VTT in any major implementation, then probably
> not many.  It's the fault of overly-lenient parsers that these things happen
> in the first place.

A point Philip Jägenstedt has made is that it's sufficiently tedious to
verify correct subtitle playback that authors are unlikely to do so with
any vigilance. Therefore the better trade-off is to make the parser
forgiving, rather than inflict the occasional missing cue on viewers.

 -r




Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Ralph Giles
On 05/10/11 10:22 AM, Simon Pieters wrote:

> I did some research on authoring errors in SRT timestamps to inform
> whether WebVTT parsing of timestamps should be changed.

This is completely awesome, thanks for doing it.

> hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
> 834

As Silvia mentioned, the WebVTT spec currently leaves the number of
digits in the hour field as implementation defined, so long as it's at
least two.

I asked previously[1] if we could agree on and specify a limit. Would
you mind checking what the histogram of digit numbers is in the hours
field? Especially if you can separate cases like

> 34500:24:01,000 --> 00:24:03,000

either because the index is missing, or because the interval is
negative (for which the WebVTT spec would reject the entire cue).

Cheers,
 -r

[1]
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-September/033271.html
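
A histogram like the one requested above might be computed along these lines. This is a sketch under assumptions: the `HOURS` pattern is modelled on the category patterns in the original research message, and `hour_digit_histogram` is a hypothetical name. It does not separate the missing-index case from the negative-interval case.

```python
import re
from collections import Counter

# Assumed pattern: the first numeric field of each timestamp is the hours field.
HOURS = re.compile(r'(?:^|\s|>)(\d+)[:.,]\d+[:.,]\d+')

def hour_digit_histogram(lines):
    """Map 'number of digits in the hours field' to 'number of timestamps'."""
    hist = Counter()
    for line in lines:
        if '-->' not in line:
            continue  # skip anything that is not a timing line
        for hours in HOURS.findall(line):
            hist[len(hours)] += 1
    return hist

sample = [
    "00:00:01,000 --> 00:00:03,000",
    "34500:24:01,000 --> 00:24:03,000",  # index fused onto the hours field
]
print(sorted(hour_digit_histogram(sample).items()))
```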


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread David Singer

On Oct 5, 2011, at 16:36 , Glenn Maynard wrote:

> On Wed, Oct 5, 2011 at 7:17 PM, David Singer  wrote:
> which rather raises the question of how many people will write comma instead 
> of dot in VTT, given a european view or SRT habits.
> 
> If the files don't work in VTT in any major implementation, then probably not 
> many.  It's the fault of overly-lenient parsers that these things happen in 
> the first place.


I rather expect that there may be people tempted to write an implementation 
that will ingest SRT and VTT, and unify their parsing to cope with either. "Be 
strict with what you produce, and liberal with what you accept" is a maxim for 
at least some people, also.  And being strict with HTML (I seem to recall that 
one of the features of XHTML was that nothing was supposed to show when 
documents had errors) didn't get a lot of traction, either.

David Singer
Multimedia and Software Standards, Apple Inc.
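
A unified parser of the kind anticipated above could be as small as the following sketch. The function name `parse_timestamp` is hypothetical, and requiring an hours field is a simplifying assumption. The only point illustrated is that a single `[.,]` character class makes one parser accept both the SRT and the VTT decimal separator.

```python
import re

# Assumption: hours are required and have two or more digits.
# [.,] accepts either the SRT comma or the VTT dot before the decimals.
TIMESTAMP = re.compile(r'(\d{2,}):([0-5]\d):([0-5]\d)[.,](\d{3})')

def parse_timestamp(text):
    """Return the timestamp in milliseconds, or None if it does not parse."""
    m = TIMESTAMP.fullmatch(text.strip())
    if m is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in m.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

print(parse_timestamp("00:24:01,000"))  # SRT style
print(parse_timestamp("00:24:01.000"))  # VTT style
```

Both calls yield the same value, which is exactly the leniency being debated in this thread.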



Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Glenn Maynard
On Wed, Oct 5, 2011 at 7:17 PM, David Singer  wrote:

> which rather raises the question of how many people will write comma
> instead of dot in VTT, given a european view or SRT habits.
>

If the files don't work in VTT in any major implementation, then probably
not many.  It's the fault of overly-lenient parsers that these things happen
in the first place.

-- 
Glenn Maynard


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread David Singer

On Oct 5, 2011, at 14:07 , Silvia Pfeiffer wrote:

> On Thu, Oct 6, 2011 at 4:22 AM, Simon Pieters  wrote:
>> The most common error is to use a dot instead of a comma.
> 
> They're WebVTT files already. ;-)
> 

which rather raises the question of how many people will write comma instead of 
dot in VTT, given a european view or SRT habits.


David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Silvia Pfeiffer
On Thu, Oct 6, 2011 at 4:22 AM, Simon Pieters  wrote:
> I did some research on authoring errors in SRT timestamps to inform whether
> WebVTT parsing of timestamps should be changed.
>
> Our starting point was 70,000 files provided to Opera (for research
> purposes) by opensubtitles.org (thanks!) supposedly being SRT files. We are
> not allowed to share the files.
>
> Filtering out files that don't contain "-->" left 65,000 files.
>
> Grepping for lines that contain "-->" resulted in 52,000,000 lines (which
> should represent roughly the total number of cues). Of those, there were
> 31,900 lines that are invalid, i.e. don't match the python regexp
> '\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'.
>
> Those are categorized as follows. Note that a line can belong to several
> categories (except for "none of the above"):
>
>
> hours too few '(^|\s|>)\d[:\.,]\d+[:\.,]\d+'
> 57
> hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
> 834

IIUC this means there are more than 2 characters used for the hours. I
think that's a bug in your regex, then. There was always going to be
more than 99 hours possible and WebVTT Timestamps are no different:
http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#webvtt-timestamp
. It says "two or more characters...".


> minutes too few '(^|\s|>)\d+[:\.,]\d[:\.,]\d+'
> 16
> minutes too many '(^|\s|>)\d+[:\.,]\d{3,}[:\.,]\d+'
> 11
> seconds too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d([:.,-]|\s|$)'
> 889
> seconds too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d{3,}'
> 154
> decimals too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{1,2}(\s|$|-)'
> 2085
> decimals too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{4,}'
> 62
> decimals missing '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+(\s|$|-)'
> 132
> minutes gt 59 '(^|\s|>)\d+[:\.,]0{0,}[6-9]\d+[:\.,]\d+'
> 6

That's small.

> seconds gt 59 '(^|\s|>)\d+[:\.,]\d+[:\.,]0{0,}[6-9]\d+'
> 184

That's fairly small, in particular considering that spaces in
timestamps or an elongated arrow create a lot more problems.

> leading garbage '^[^\s\d]+\d+[:\.,]\d+[:\.,]\d+'
> 599
> trailing garbage '-->\s*(\d+[:\.,]){2,3}\d+(\s+[^\s]|[^\s\d:\.,])'
> 532
> colon instead of comma '\d+[:\.,]\d+[:\.,]\d+[:\.,]\d+:\d+'
> 26
> dot instead of comma '\d+[:\.,]\d+[:\.,]\d+\.\d+'
> 25372
> comma instead of colon '\d+,\d+[:\.,]\d+'
> 82
> dot instead of colon '\d+\.\d+[:\.,]\d+'
> 41
> id before timestamp '^\s*\d+\s+\d+[:\.,]\d+'
> 115
> spaces in timestamp '(\d[\d\s]*[:\.,]\s*){2,3}\d[\d\s]*' and not
> '(\d+[:\.,]){2,3}\d+'
> 922
> too long arrow '\d\s*-{3,}>\s*\d'
> 326
> none of the above
> 969
>
>
> The most common error is to use a dot instead of a comma.

They're WebVTT files already. ;-)


> Some appear to be a different format, and some appear to be just garbage.
>
> Too few or too many hours might not technically be an error; however, it
> appeared that some of the "hours too many" lines were cases where the line
> break between the id and the timestamp was missing (with no whitespace
> between them), e.g.:
>
> 34500:24:01,000 --> 00:24:03,000
>
> The trailing garbage is mostly caused by a missing line break between the
> timestamp and the cue text, e.g.:
>
> 00:00:01,000 --> 00:00:03,000Hello.

So we have a lot more errors coming from missing new lines than from
mis-authoring the hour, minute or seconds number? That's encouraging.
The only common number mistake seems to be to make the decimals
shorter than 3 digits. Maybe we can resolve this by just having a
rule for what that should be interpreted as?

Cheers,
Silvia.
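
To make the question about short decimals concrete, here is a sketch of two candidate rules. Neither is endorsed anywhere in this thread; the function name `decimals_to_ms` and both rule names are made up for illustration.

```python
def decimals_to_ms(digits, rule="fraction"):
    """Interpret a 1-3 digit decimals field as milliseconds.

    rule="fraction":     ",5" means half a second, i.e. 500 ms (pad on the right)
    rule="milliseconds": ",5" means 5 ms (read as a plain integer count)
    """
    if not digits.isdigit() or len(digits) > 3:
        raise ValueError("expected one to three digits")
    if rule == "fraction":
        return int(digits.ljust(3, "0"))
    if rule == "milliseconds":
        return int(digits)
    raise ValueError("unknown rule")

print(decimals_to_ms("5"))                  # read as a fraction of a second
print(decimals_to_ms("5", "milliseconds"))  # read as a millisecond count
```

The two rules disagree for every field shorter than three digits, which is why a single specified interpretation would be needed.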


Re: [whatwg] (no subject)

2011-10-05 Thread Ralph Giles
On 05/10/11 11:37 AM, Ashley Sheridan wrote:

> I would assume the part that the Skype plugin is being used for, as the
> only other part of the chat that isn't HTML/Javascript code is the
> Jabber connectivity, which isn't strictly a plugin per se, more an
> additional interface to the raw data that is enabled through server
> modules.

The Audio/Video chat part, which supports similar uses to the Skype
plugin, is part of the WebRTC effort. Jabber connectivity is something
you can currently do by tunnelling the stanzas (messages) over XHR or
WebSockets.

Hope that helps orient you,
 -r


Re: [whatwg] (no subject)

2011-10-05 Thread Ashley Sheridan
On Wed, 2011-10-05 at 17:59 +, Ian Hickson wrote:

> On Wed, 5 Oct 2011, Hamza dridi wrote:
> >
> > Hi, I have something on my mind and I thought it would be better to tell
> > you, so excuse me if this is not the right place, and excuse my bad
> > English. I've seen Facebook using a plugin in order to do chat, so my
> > suggestion is: what if HTML5 supported such functionality, so that we
> > would no longer need a plugin for that? Sorry again if it's the wrong
> > place, and tell me if this is a stupid idea.
> 
> Do you mean text chat (IM) or audio/video chat (video conferencing)?
> 


I would assume the part that the Skype plugin is being used for, as the
only other part of the chat that isn't HTML/Javascript code is the
Jabber connectivity, which isn't strictly a plugin per se, more an
additional interface to the raw data that is enabled through server
modules.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [whatwg] (no subject)

2011-10-05 Thread Ian Hickson
On Wed, 5 Oct 2011, Hamza dridi wrote:
>
> Hi, I have something on my mind and I thought it would be better to tell
> you, so excuse me if this is not the right place, and excuse my bad
> English. I've seen Facebook using a plugin in order to do chat, so my
> suggestion is: what if HTML5 supported such functionality, so that we
> would no longer need a plugin for that? Sorry again if it's the wrong
> place, and tell me if this is a stupid idea.

Do you mean text chat (IM) or audio/video chat (video conferencing)?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


[whatwg] (no subject)

2011-10-05 Thread Hamza dridi
Hi, I have something on my mind and I thought it would be better to tell you,
so excuse me if this is not the right place, and excuse my bad English.
I've seen Facebook using a plugin in order to do chat, so my suggestion is:
what if HTML5 supported such functionality, so that we would no longer need
a plugin for that? Sorry again if it's the wrong place, and tell me if this
is a stupid idea.


Re: [whatwg] [html5] r6630 - [giow] (0) Define navigating to video and audio resources Fixing http://www.w3.o [...]

2011-10-05 Thread Ian Hickson
On Wed, 5 Oct 2011, Simon Pieters wrote:
> 
> video and audio should have controls="" and autoplay=""

The spec allows browsers to do that (in fact it explicitly calls out 
autoplay=""), but do we really want to require one or the other? I can see 
arguments for having only one or the other or both.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


[whatwg] SRT research: timestamps

2011-10-05 Thread Simon Pieters
I did some research on authoring errors in SRT timestamps to inform  
whether WebVTT parsing of timestamps should be changed.


Our starting point was 70,000 files provided to Opera (for research  
purposes) by opensubtitles.org (thanks!) supposedly being SRT files. We  
are not allowed to share the files.


Filtering out files that don't contain "-->" left 65,000 files.

Grepping for lines that contain "-->" resulted in 52,000,000 lines (which  
should represent roughly the total number of cues). Of those, there were  
31,900 lines that are invalid, i.e. don't match the python regexp  
'\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'.
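
For readers who want to repeat the count on their own files, the validity check described above can be sketched like this. The regexp is the one quoted in this message. The use of `re.match` (anchored at the start of the line) and the helper name `count_invalid` are assumptions, since the message does not say how the pattern was applied.

```python
import re

# The validity pattern quoted in the message, unchanged.
VALID_TIMING = re.compile(
    r'\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'
)

def count_invalid(lines):
    """Return (timing_lines, invalid_lines) for an iterable of text lines."""
    total = invalid = 0
    for line in lines:
        if '-->' not in line:
            continue  # not a timing line
        total += 1
        # Assumption: the pattern is anchored at the start of the line.
        if VALID_TIMING.match(line) is None:
            invalid += 1
    return total, invalid

sample = [
    "1",
    "00:00:01,000 --> 00:00:03,000",
    "Hello.",
    "00:00:04.000 --> 00:00:06.000",        # dot instead of comma
    "00:00:01,000 --> 00:00:03,000Hello.",  # missing line break before cue text
]
print(count_invalid(sample))
```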


Those are categorized as follows. Note that a line can belong to several  
categories (except for "none of the above"):



hours too few '(^|\s|>)\d[:\.,]\d+[:\.,]\d+'
57
hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
834
minutes too few '(^|\s|>)\d+[:\.,]\d[:\.,]\d+'
16
minutes too many '(^|\s|>)\d+[:\.,]\d{3,}[:\.,]\d+'
11
seconds too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d([:.,-]|\s|$)'
889
seconds too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d{3,}'
154
decimals too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{1,2}(\s|$|-)'
2085
decimals too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{4,}'
62
decimals missing '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+(\s|$|-)'
132
minutes gt 59 '(^|\s|>)\d+[:\.,]0{0,}[6-9]\d+[:\.,]\d+'
6
seconds gt 59 '(^|\s|>)\d+[:\.,]\d+[:\.,]0{0,}[6-9]\d+'
184
leading garbage '^[^\s\d]+\d+[:\.,]\d+[:\.,]\d+'
599
trailing garbage '-->\s*(\d+[:\.,]){2,3}\d+(\s+[^\s]|[^\s\d:\.,])'
532
colon instead of comma '\d+[:\.,]\d+[:\.,]\d+[:\.,]\d+:\d+'
26
dot instead of comma '\d+[:\.,]\d+[:\.,]\d+\.\d+'
25372
comma instead of colon '\d+,\d+[:\.,]\d+'
82
dot instead of colon '\d+\.\d+[:\.,]\d+'
41
id before timestamp '^\s*\d+\s+\d+[:\.,]\d+'
115
spaces in timestamp '(\d[\d\s]*[:\.,]\s*){2,3}\d[\d\s]*' and not
'(\d+[:\.,]){2,3}\d+'
922
too long arrow '\d\s*-{3,}>\s*\d'
326
none of the above
969
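
The categorization above can be sketched as a table of named patterns applied with `re.search`, so that one line may land in several categories. Only five of the patterns listed above are copied here to keep the example short. The dictionary layout and the name `categorize` are illustrative assumptions, not the script actually used for this research.

```python
import re

# A subset of the category patterns listed above, copied verbatim.
CATEGORIES = {
    'hours too many': r'(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+',
    'decimals too few': r'(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{1,2}(\s|$|-)',
    'trailing garbage': r'-->\s*(\d+[:\.,]){2,3}\d+(\s+[^\s]|[^\s\d:\.,])',
    'dot instead of comma': r'\d+[:\.,]\d+[:\.,]\d+\.\d+',
    'too long arrow': r'\d\s*-{3,}>\s*\d',
}

def categorize(line):
    """Return every category whose pattern is found somewhere in the line."""
    hits = [name for name, pattern in CATEGORIES.items()
            if re.search(pattern, line)]
    return hits or ['none of the above']

for line in ("00:00:04.000 --> 00:00:06.000",
             "34500:24:01,000 --> 00:24:03,000",
             "00:00:01,000 --> 00:00:03,000Hello.",
             "00:00:01,00 ---> 00:00:03,000"):
    print(line, categorize(line))
```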


The most common error is to use a dot instead of a comma.

Some appear to be a different format, and some appear to be just garbage.

Too few or too many hours might not technically be an error; however, it
appeared that some of the "hours too many" lines were cases where the line
break between the id and the timestamp was missing (with no whitespace
between them), e.g.:


34500:24:01,000 --> 00:24:03,000

The trailing garbage is mostly caused by a missing line break between the
timestamp and the cue text, e.g.:


00:00:01,000 --> 00:00:03,000Hello.

--
Simon Pieters
Opera Software


Re: [whatwg] [html5] r6630 - [giow] (0) Define navigating to video and audio resources Fixing http://www.w3.o [...]

2011-10-05 Thread Simon Pieters

On Wed, 05 Oct 2011 02:02:52 +0200,  wrote:


Author: ianh
Date: 2011-10-04 17:02:51 -0700 (Tue, 04 Oct 2011)
New Revision: 6630

Modified:
   complete.html
   index
   source
Log:
[giow] (0) Define navigating to video and audio resources
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=13759




+  The element host element to create for the
+  media is the element given in the table below in the second cell of
+  the row whose first cell describes the media. The appropriate
+  attribute to set is the one given by the third cell in that same
+  row.
+
+  Type of media | Element for the media | Appropriate attribute
+  Image         | img                   | src
+  Video         | video                 | src
+  Audio         | audio                 | src


video and audio should have controls="" and autoplay=""

--
Simon Pieters
Opera Software


Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior

2011-10-05 Thread Henri Sivonen
On Tue, Oct 4, 2011 at 9:54 PM, Boris Zbarsky  wrote:
> What Firefox does do is block execution of