Re: [whatwg] SRT research: timestamps
On Thu, Oct 6, 2011 at 10:51 AM, Ralph Giles wrote:
> On 05/10/11 04:36 PM, Glenn Maynard wrote:
>
>> If the files don't work in VTT in any major implementation, then probably
>> not many. It's the fault of overly-lenient parsers that these things happen
>> in the first place.
>
> A point Philip Jägenstedt has made is that it's sufficiently tedious to
> verify correct subtitle playback that authors are unlikely to do so with
> any vigilance. Therefore the better trade-off is to make the parser
> forgiving, rather than inflict the occasional missing cue on viewers.

That's a slippery slope to go down. If they cannot see the consequence,
they assume it's legal. It's not like we are totally screwing up the
display - there's only one mis-authored cue missing.

If we accept one type of mis-authoring, where do you stop with accepting
weirdness? How can you make compatible implementations if everyone
decides for themselves which weirdness that is not in the spec they
accept?

I'd rather we have strict parsing and recover from brokenness. It's the
job of validators to identify broken cues. We should teach authors to
use validators before they decide that their files are ok.

As for some of the more dominant mis-authorings: we can accept them as
correct authoring, but then they have to be made part of the
specification and legalized.

Silvia.
Re: [whatwg] SRT research: timestamps
On Wed, Oct 5, 2011 at 7:51 PM, Ralph Giles wrote:
> A point Philip Jägenstedt has made is that it's sufficiently tedious to
> verify correct subtitle playback that authors are unlikely to do so with
> any vigilance. Therefore the better trade-off is to make the parser
> forgiving, rather than inflict the occasional missing cue on viewers.

How can you even time subtitles without ever looking at them?

Simon: Another useful statistic would be the number of files which
1: *always* use periods in SRT timestamps (consistently wrong) compared to
2: the number of files which mix periods and commas in timestamps
(occasionally wrong). I'm guessing #1 is much more common.

--
Glenn Maynard
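The statistic Glenn asks for could be gathered along the following lines. This is only a sketch in Python: the function name `separator_usage` and the `FRACTION_SEP` pattern are made up for illustration and are not taken from Simon's scripts. It looks only at the separator in front of the fractional seconds, and only on lines containing "-->".

```python
import re

# Hypothetical pattern: captures the separator before the fractional seconds.
FRACTION_SEP = re.compile(r'\d+:\d+:\d+([.,])\d+')


def separator_usage(text):
    """Classify one subtitle file by the fraction separators on its timing lines."""
    seps = set()
    for line in text.splitlines():
        if '-->' in line:
            # findall returns the captured separator of every timestamp on the line.
            seps.update(FRACTION_SEP.findall(line))
    if not seps:
        return 'no timestamps'
    if seps == {','}:
        return 'commas only'
    if seps == {'.'}:
        return 'periods only'
    return 'mixed'
```

Running this over every file and tallying the results would give the two counts to compare: 'periods only' (consistently wrong for SRT) against 'mixed' (occasionally wrong).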
Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior
On 10/5/11 9:01 PM, Julien Chaffraix wrote:
>> Ah. Do they set disabled and expect it to take effect whenever the sheet
>> actually appears?
>
> Yes, we have seen some regressions because people were expecting exactly that.

So for what it's worth, Gecko implemented the current behavior of
creating the stylesheet immediately as soon as we know the <link> is
linking to a stylesheet in
https://bugzilla.mozilla.org/show_bug.cgi?id=107567

One of the considerations there was in fact allowing pages to change
disabled state without having to wait for the sheet to load. That
includes things like selection of alternate stylesheet sets working
correctly even if not all the alternate sheets have finished loading and
so forth...

-Boris
Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior
>> Thanks for the explanation. I took a black-box approach in testing - I
>> don't pretend to know how Firefox works - and from that perspective,
>> it looked like it was synchronous as the |sheet| was present and
>> properly populated in JS.
>
> Try setting an interval to poll right before the <link> is parsed. That
> will black-box show that it's not synchronous. ;)

I stand corrected. ;)

>> It is. However the specification states that |disabled| would be
>> ignored if there is no |sheet|. It looks like web-authors don't factor
>> this into their code.
>
> Ah. Do they set disabled and expect it to take effect whenever the sheet
> actually appears?

Yes, we have seen some regressions because people were expecting exactly that.

Thanks,
Julien
Re: [whatwg] SRT research: timestamps
On 05/10/11 04:36 PM, Glenn Maynard wrote:
> If the files don't work in VTT in any major implementation, then probably
> not many. It's the fault of overly-lenient parsers that these things happen
> in the first place.

A point Philip Jägenstedt has made is that it's sufficiently tedious to
verify correct subtitle playback that authors are unlikely to do so with
any vigilance. Therefore the better trade-off is to make the parser
forgiving, rather than inflict the occasional missing cue on viewers.

 -r
Re: [whatwg] SRT research: timestamps
On 05/10/11 10:22 AM, Simon Pieters wrote:
> I did some research on authoring errors in SRT timestamps to inform
> whether WebVTT parsing of timestamps should be changed.

This is completely awesome, thanks for doing it.

> hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
> 834

As Silvia mentioned, the WebVTT spec currently leaves the number of
digits in the hour field as implementation defined, so long as it's at
least two. I asked previously[1] if we could agree on and specify a
limit.

Would you mind checking what the histogram of digit counts is in the
hours field? Especially if you can separate cases like

> 34500:24:01,000 --> 00:24:03,000

either because the index is missing, or because the interval is
negative (for which the WebVTT spec would reject the entire cue).

Cheers,
 -r

[1] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-September/033271.html
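The histogram Ralph asks for could be computed roughly as below. This is a sketch under stated assumptions, not the method used for the research: `TIMING`, `to_ms` and `hours_histogram` are hypothetical names, the pattern is deliberately loose rather than the spec's grammar, and the last field of each timestamp is taken to be milliseconds. Lines are bucketed by the number of digits in the start hours field and by whether the interval is negative, so a merged index such as "34500:24:01,000" shows up in its own bucket.

```python
import re
from collections import Counter

# Hypothetical, deliberately loose pattern for a timing line.
TIMING = re.compile(
    r'^\s*(\d+):(\d+):(\d+)[.,](\d+)\s*-->\s*(\d+):(\d+):(\d+)[.,](\d+)'
)


def to_ms(hours, minutes, seconds, millis):
    # Assumption: the last field is already a millisecond count.
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def hours_histogram(lines):
    """Count timing lines by (digits in the start hours field, interval is negative)."""
    counts = Counter()
    for line in lines:
        m = TIMING.match(line)
        if m is None:
            continue  # not a timing line this loose pattern understands
        start = to_ms(*m.group(1, 2, 3, 4))
        end = to_ms(*m.group(5, 6, 7, 8))
        counts[(len(m.group(1)), start > end)] += 1
    return counts
```

A large count under keys like `(5, True)` would suggest that most over-long hours fields are really a cue id fused with the timestamp rather than genuinely long programmes.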
Re: [whatwg] SRT research: timestamps
On Oct 5, 2011, at 16:36 , Glenn Maynard wrote:

> On Wed, Oct 5, 2011 at 7:17 PM, David Singer wrote:
>> which rather raises the question of how many people will write comma instead
>> of dot in VTT, given a European view or SRT habits.
>
> If the files don't work in VTT in any major implementation, then probably not
> many. It's the fault of overly-lenient parsers that these things happen in
> the first place.

I rather expect that there may be people tempted to write an
implementation that will ingest SRT and VTT, and unify their parsing to
cope with either. "Be strict with what you produce, and liberal with
what you accept" is a maxim for at least some people, also.

And being strict with HTML (I seem to recall that one of the features of
XHTML was that nothing was supposed to show when documents had errors)
didn't get a lot of traction, either.

David Singer
Multimedia and Software Standards, Apple Inc.
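A unified SRT/VTT timestamp reader of the kind David describes might look roughly like this. It is a sketch, not the WebVTT parsing algorithm: `parse_timestamp` and its pattern are hypothetical, and the only leniency shown is accepting either a comma or a dot before the milliseconds. Everything else (two or more hour digits, minutes and seconds below 60, exactly three fraction digits) stays strict.

```python
import re

# Hypothetical pattern: strict fields, but either separator before the fraction.
TIMESTAMP = re.compile(r'(\d{2,}):([0-5]\d):([0-5]\d)[.,](\d{3})')


def parse_timestamp(text):
    """Return the timestamp in milliseconds, or None if it is not well formed."""
    m = TIMESTAMP.fullmatch(text.strip())
    if m is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in m.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

With this, `parse_timestamp("00:00:01,000")` and `parse_timestamp("00:00:01.000")` give the same value, which is exactly the behaviour Glenn argues teaches authors that both spellings are fine.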
Re: [whatwg] SRT research: timestamps
On Wed, Oct 5, 2011 at 7:17 PM, David Singer wrote:
> which rather raises the question of how many people will write comma
> instead of dot in VTT, given a European view or SRT habits.

If the files don't work in VTT in any major implementation, then probably
not many. It's the fault of overly-lenient parsers that these things
happen in the first place.

--
Glenn Maynard
Re: [whatwg] SRT research: timestamps
On Oct 5, 2011, at 14:07 , Silvia Pfeiffer wrote:

> On Thu, Oct 6, 2011 at 4:22 AM, Simon Pieters wrote:
>> The most common error is to use a dot instead of a comma.
>
> They're WebVTT files already. ;-)

which rather raises the question of how many people will write comma
instead of dot in VTT, given a European view or SRT habits.

David Singer
Multimedia and Software Standards, Apple Inc.
Re: [whatwg] SRT research: timestamps
On Thu, Oct 6, 2011 at 4:22 AM, Simon Pieters wrote:
> I did some research on authoring errors in SRT timestamps to inform whether
> WebVTT parsing of timestamps should be changed.
>
> Our starting point was 70,000 files provided to Opera (for research
> purposes) by opensubtitles.org (thanks!) supposedly being SRT files. We are
> not allowed to share the files.
>
> Filtering out files that don't contain "-->" left 65,000 files.
>
> Grepping for lines that contain "-->" resulted in 52,000,000 lines (which
> should represent roughly the total number of cues). Of those, there were
> 31,900 lines that are invalid, i.e. don't match the python regexp
> '\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'.
>
> Those are categorized as follows. Note that a line can belong to several
> categories (except for "none of the above"):
>
>
> hours too few '(^|\s|>)\d[:\.,]\d+[:\.,]\d+'
> 57
> hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
> 834

IIUC this means there are more than 2 characters used for the hours. I
think that's a bug in your regex then. There was always going to be more
than 99 hours possible and WebVTT Timestamps are no different:
http://www.whatwg.org/specs/web-apps/current-work/webvtt.html#webvtt-timestamp
. It says "two or more characters...".

> minutes too few '(^|\s|>)\d+[:\.,]\d[:\.,]\d+'
> 16
> minutes too many '(^|\s|>)\d+[:\.,]\d{3,}[:\.,]\d+'
> 11
> seconds too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d([:.,-]|\s|$)'
> 889
> seconds too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d{3,}'
> 154
> decimals too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{1,2}(\s|$|-)'
> 2085
> decimals too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{4,}'
> 62
> decimals missing '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+(\s|$|-)'
> 132
> minutes gt 59 '(^|\s|>)\d+[:\.,]0{0,}[6-9]\d+[:\.,]\d+'
> 6

That's small.

> seconds gt 59 '(^|\s|>)\d+[:\.,]\d+[:\.,]0{0,}[6-9]\d+'
> 184

That's fairly small, in particular considering that spaces in timestamps
or an elongated arrow create a lot more problems.

> leading garbage '^[^\s\d]+\d+[:\.,]\d+[:\.,]\d+'
> 599
> trailing garbage '-->\s*(\d+[:\.,]){2,3}\d+(\s+[^\s]|[^\s\d:\.,])'
> 532
> colon instead of comma '\d+[:\.,]\d+[:\.,]\d+[:\.,]\d+:\d+'
> 26
> dot instead of comma '\d+[:\.,]\d+[:\.,]\d+\.\d+'
> 25372
> comma instead of colon '\d+,\d+[:\.,]\d+'
> 82
> dot instead of colon '\d+\.\d+[:\.,]\d+'
> 41
> id before timestamp '^\s*\d+\s+\d+[:\.,]\d+'
> 115
> spaces in timestamp '(\d[\d\s]*[:\.,]\s*){2,3}\d[\d\s]*' and not
> '(\d+[:\.,]){2,3}\d+'
> 922
> too long arrow '\d\s*-{3,}>\s*\d'
> 326
> none of the above
> 969
>
>
> The most common error is to use a dot instead of a comma.

They're WebVTT files already. ;-)

> Some appear to be a different format, and some appear to be just garbage.
>
> Too few or too many hours might not technically be an error, however it
> appeared that some of too many hours were cases where the line between the
> id and the timestamp was missing (and no whitespace between), e.g.:
>
> 34500:24:01,000 --> 00:24:03,000
>
> The trailing garbage is mostly the line between the timestamp and the cue
> text being missing, e.g.:
>
> 00:00:01,000 --> 00:00:03,000Hello.

So we have a lot more errors coming from missing new lines than from
mis-authoring the hour, minute or seconds number? That's encouraging.

The only common number mistake seems to be to make the decimals shorter
than 3 digits. Maybe we can resolve this by just having a rule for what
that should be interpreted as?

Cheers,
Silvia.
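To make Silvia's question about short decimals concrete, here are the two obvious candidate rules side by side, as a Python sketch. Neither is what the WebVTT spec says; the function names are invented for illustration, and the thread does not settle which reading authors actually intend.

```python
def _check(digits):
    # Only one to three ASCII digits are considered here (the "too few" case
    # plus the normal three-digit form).
    if not (digits.isascii() and digits.isdigit() and 1 <= len(digits) <= 3):
        raise ValueError(f"expected 1 to 3 ASCII digits, got {digits!r}")


def fraction_as_decimal(digits):
    """Rule A: read the digits as a decimal fraction of a second ('5' is 0.5 s)."""
    _check(digits)
    return int(digits.ljust(3, "0"))  # pad on the right: '5' becomes '500'


def fraction_as_millis(digits):
    """Rule B: read the digits as a millisecond count ('5' is 5 ms)."""
    _check(digits)
    return int(digits)
```

The two rules agree on three-digit fractions and differ by up to a factor of 100 on shorter ones, so whichever is chosen would need to be written into the spec for implementations to stay compatible.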
Re: [whatwg] (no subject)
On 05/10/11 11:37 AM, Ashley Sheridan wrote:
> I would assume the part that the Skype plugin is being used for, as the
> only other part of the chat that isn't HTML/Javascript code is the
> Jabber connectivity, which isn't strictly a plugin per se, more an
> additional interface to the raw data that is enabled through server
> modules.

The Audio/Video chat part, which supports similar uses to the Skype
plugin, is part of the WebRTC effort.

Jabber connectivity is something you can currently do by tunnelling the
stanzas (messages) over XHR or WebSockets.

Hope that helps orient you,
 -r
Re: [whatwg] (no subject)
On Wed, 2011-10-05 at 17:59 +, Ian Hickson wrote:
> On Wed, 5 Oct 2011, Hamza dridi wrote:
> >
> > Hi, I have something on my mind and I thought it would be better to
> > tell you, so excuse me if this is not the right place, and excuse my
> > bad English. I've seen Facebook using a plugin in order to do chat,
> > so my suggestion is: what if HTML5 supported such functionality, so
> > that we would no longer need a plugin for that? Sorry again if it's
> > the wrong place, and tell me if this is a stupid idea.
>
> Do you mean text chat (IM) or audio/video chat (video conferencing)?

I would assume the part that the Skype plugin is being used for, as the
only other part of the chat that isn't HTML/Javascript code is the
Jabber connectivity, which isn't strictly a plugin per se, more an
additional interface to the raw data that is enabled through server
modules.

--
Thanks,
Ash
http://www.ashleysheridan.co.uk
Re: [whatwg] (no subject)
On Wed, 5 Oct 2011, Hamza dridi wrote:
>
> Hi, I have something on my mind and I thought it would be better to
> tell you, so excuse me if this is not the right place, and excuse my
> bad English. I've seen Facebook using a plugin in order to do chat,
> so my suggestion is: what if HTML5 supported such functionality, so
> that we would no longer need a plugin for that? Sorry again if it's
> the wrong place, and tell me if this is a stupid idea.

Do you mean text chat (IM) or audio/video chat (video conferencing)?

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
[whatwg] (no subject)
Hi, I have something on my mind and I thought it would be better to tell
you, so excuse me if this is not the right place, and excuse my bad
English. I've seen Facebook using a plugin in order to do chat, so my
suggestion is: what if HTML5 supported such functionality, so that we
would no longer need a plugin for that? Sorry again if it's the wrong
place, and tell me if this is a stupid idea.
Re: [whatwg] [html5] r6630 - [giow] (0) Define navigating to video and audio resources Fixing http://www.w3.o [...]
On Wed, 5 Oct 2011, Simon Pieters wrote:
>
> video and audio should have controls="" and autoplay=""

The spec allows browsers to do that (in fact it explicitly calls out
autoplay=""), but do we really want to require one or the other? I can
see arguments for having only one or the other or both.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
[whatwg] SRT research: timestamps
I did some research on authoring errors in SRT timestamps to inform
whether WebVTT parsing of timestamps should be changed.

Our starting point was 70,000 files provided to Opera (for research
purposes) by opensubtitles.org (thanks!) supposedly being SRT files. We
are not allowed to share the files.

Filtering out files that don't contain "-->" left 65,000 files.

Grepping for lines that contain "-->" resulted in 52,000,000 lines
(which should represent roughly the total number of cues). Of those,
there were 31,900 lines that are invalid, i.e. don't match the Python
regexp
'\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'.

Those are categorized as follows. Note that a line can belong to several
categories (except for "none of the above"):


hours too few '(^|\s|>)\d[:\.,]\d+[:\.,]\d+'
57
hours too many '(^|\s|>)\d{3,}[:\.,]\d+[:\.,]\d+'
834
minutes too few '(^|\s|>)\d+[:\.,]\d[:\.,]\d+'
16
minutes too many '(^|\s|>)\d+[:\.,]\d{3,}[:\.,]\d+'
11
seconds too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d([:.,-]|\s|$)'
889
seconds too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d{3,}'
154
decimals too few '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{1,2}(\s|$|-)'
2085
decimals too many '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+[:\.,]\d{4,}'
62
decimals missing '(^|\s|>)\d+[:\.,]\d+[:\.,]\d+(\s|$|-)'
132
minutes gt 59 '(^|\s|>)\d+[:\.,]0{0,}[6-9]\d+[:\.,]\d+'
6
seconds gt 59 '(^|\s|>)\d+[:\.,]\d+[:\.,]0{0,}[6-9]\d+'
184
leading garbage '^[^\s\d]+\d+[:\.,]\d+[:\.,]\d+'
599
trailing garbage '-->\s*(\d+[:\.,]){2,3}\d+(\s+[^\s]|[^\s\d:\.,])'
532
colon instead of comma '\d+[:\.,]\d+[:\.,]\d+[:\.,]\d+:\d+'
26
dot instead of comma '\d+[:\.,]\d+[:\.,]\d+\.\d+'
25372
comma instead of colon '\d+,\d+[:\.,]\d+'
82
dot instead of colon '\d+\.\d+[:\.,]\d+'
41
id before timestamp '^\s*\d+\s+\d+[:\.,]\d+'
115
spaces in timestamp '(\d[\d\s]*[:\.,]\s*){2,3}\d[\d\s]*' and not
'(\d+[:\.,]){2,3}\d+'
922
too long arrow '\d\s*-{3,}>\s*\d'
326
none of the above
969


The most common error is to use a dot instead of a comma. Some appear to
be a different format, and some appear to be just garbage.

Too few or too many hours might not technically be an error, however it
appeared that some of too many hours were cases where the line between
the id and the timestamp was missing (and no whitespace between), e.g.:

34500:24:01,000 --> 00:24:03,000

The trailing garbage is mostly the line between the timestamp and the
cue text being missing, e.g.:

00:00:01,000 --> 00:00:03,000Hello.

--
Simon Pieters
Opera Software
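For readers who want to try the classification on their own files, the validity check and the largest error category can be reproduced roughly as follows. The two patterns are copied from the message above; everything else is an assumption made for this sketch, in particular the name `classify`, the use of `re.match` for the validity pattern and `re.search` for the category pattern, and the collapsing of all other categories into 'other invalid'.

```python
import re

# Validity pattern quoted in the message (SRT timing line, comma before the decimals).
VALID = re.compile(
    r'\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d\s*-->\s*\d\d:[0-5]\d:[0-5]\d\,\d\d\d($|\s)'
)
# "dot instead of comma" category pattern from the list above.
DOT_INSTEAD_OF_COMMA = re.compile(r'\d+[:\.,]\d+[:\.,]\d+\.\d+')


def classify(line):
    """Classify one line that contains '-->' (assumed to be a timing line)."""
    if VALID.match(line):  # assumption: anchored at the start of the line
        return 'valid'
    if DOT_INSTEAD_OF_COMMA.search(line):
        return 'dot instead of comma'
    return 'other invalid'
```

The two example lines from the message, "34500:24:01,000 --> 00:24:03,000" and "00:00:01,000 --> 00:00:03,000Hello.", both come out as 'other invalid' here, since this sketch does not implement the "hours too many" or "trailing garbage" patterns.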
Re: [whatwg] [html5] r6630 - [giow] (0) Define navigating to video and audio resources Fixing http://www.w3.o [...]
On Wed, 05 Oct 2011 02:02:52 +0200, wrote:

> Author: ianh
> Date: 2011-10-04 17:02:51 -0700 (Tue, 04 Oct 2011)
> New Revision: 6630
>
> Modified:
>    complete.html
>    index
>    source
> Log:
> [giow] (0) Define navigating to video and audio resources
> Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=13759
>
> + The element host element to create for the
> + media is the element given in the table below in the second cell of
> + the row whose first cell describes the media. The appropriate
> + attribute to set is the one given by the third cell in that same
> + row.
> +
> + Type of media | Element for the media | Appropriate attribute
> + Image         | img                   | src
> + Video         | video                 | src
> + Audio         | audio                 | src

video and audio should have controls="" and autoplay=""

--
Simon Pieters
Opera Software
Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior
On Tue, Oct 4, 2011 at 9:54 PM, Boris Zbarsky wrote: > What Firefox does do is block execution of