Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 02:28:49 +0200, Chris Double chris.dou...@double.co.nz wrote:

On Thu, Aug 26, 2010 at 5:25 AM, Eric Carlson eric.carl...@apple.com wrote:

FWIW, I agree with Silvia that a new file extension and MIME type make sense.

I also think that a new file extension and MIME type is the way to go.

Would Firefox / Safari support text/srt files in some undocumented fashion then or just simply not support those? The former would not really be an acceptable solution to me.

-- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
Silvia Pfeiffer wrote: You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? Or is #1 *impossible* due to the craziness of the legacy? 
(I thought any given .srt consumer only has a single code path and implementation-wise there aren't already multiple .srt formats, even though doom9 spec-wise there are at least two.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
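To make the "single code path" idea above a bit more concrete, here is a minimal sketch in Python. It is not the WebSRT parsing algorithm and not what any existing player does; the timing pattern, the tag pattern and the names `parse_cues` and `to_seconds` are assumptions made up for illustration. Every input goes through the same parser, and markup the parser has no use for (font, u, or anything newer) is dropped while its text content is kept, roughly the way a browser ignores unknown HTML tags.

```python
import re

# Timing line as commonly seen in .srt files, e.g. 00:00:01,000 --> 00:00:04,000.
# This pattern is an assumption for the sketch, not taken from any spec.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)
# Anything that looks like a start or end tag; hypothetical and deliberately loose.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def to_seconds(h, m, s, ms):
    """Convert the captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(text):
    """Parse SRT-like text into (start, end, plain_text) tuples.

    There is one code path for every input: tags are removed and their
    contents are kept, whether or not the tag is a legacy one.
    """
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                body = "\n".join(lines[i + 1:])
                cues.append(
                    (to_seconds(*fields[:4]), to_seconds(*fields[4:]), TAG.sub("", body))
                )
                break  # Only the first timing line in a block starts the cue.
    return cues
```

A player that actually supports some of the markup would interpret the tags it knows instead of stripping them, but the point stands either way: legacy files are simply files that happen not to use the newer features.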
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 09:58:29 +0200, Henri Sivonen hsivo...@iki.fi wrote: Silvia Pfeiffer wrote: You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? 
I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. I do think it's a good idea to make WebSRT handle existing SRT content as well as possible. The encoding issue is easy to side-step by just saying that that's a preprocessing step.

Or is #1 *impossible* due to the craziness of the legacy? (I thought any given .srt consumer only has a single code path and implementation-wise there aren't already multiple .srt formats, even though doom9 spec-wise there are at least two.)

There are some issues with the current WebSRT parser that I've been meaning to send mail about, but my impression is that it's not impossible to define a parser which works well enough to replace existing ones.

-- Philip Jägenstedt Core Developer Opera Software
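For what it's worth, the "encoding is a preprocessing step" idea above could look something like the following Python sketch. The function name `decode_subtitles` and the choice of windows-1252 as the fallback are assumptions for illustration only; a real player would probably pick the fallback from the user's locale or from smarter sniffing. The parser proper then only ever sees decoded text.

```python
def decode_subtitles(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode subtitle bytes to text before any parsing happens."""
    # A UTF-8 byte order mark is a strong signal; "utf-8-sig" strips it.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw.decode("utf-8-sig")
    try:
        # Strict decoding: legacy 8-bit text containing non-ASCII bytes
        # is rarely valid UTF-8, so a failure here is informative.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy file: fall back to a caller-chosen encoding (an assumption).
        return raw.decode(fallback, errors="replace")
```

Usage would be along the lines of `text = decode_subtitles(open("subs.srt", "rb").read())`, with everything after that being encoding-agnostic.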
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested by whatever potentially spec-incompliant means), why would they not use the same ingest code path for both? If the app isn't capable of supporting any feature that's permitted in WebSRT but not part of legacy SRT, how does failing at the point of finding out that this file claims to be WebSRT rather than SRT make things much better than failing at I found stuff that I can't handle/skip over in this SRT file? In particular, it seems like a wrong optimization to make it possible for apps that don't support any WebSRT features over legacy features to fail early than to make apps that support at least one WebSRT-introduced feature unify their processing of WebSRT and SRT by processing both WebSRT and SRT as one format where legacy SRT files just don't happen to use new features. 
To me, having different code paths for WebSRT and SRT is like IE adding a new Trident snapshot with every release whereas supporting SRT by treating it as WebSRT with no new features (if the app is supporting even one WebSRT-introduced feature!) is like what the other browsers are doing with HTML/CSS/DOM. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 25 Aug 2010 17:40:08 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: At this point, what is your recommendation? The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. Yes, I think we should definitely have a new file extension. I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. If the format is SRT-like it's likely at least some files will use .srt in practice. All SRT files in practice use the .srt extension - it is typically how these formats are identified by applications. Just because *nix ignores file extensions mostly for identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the one biggest pain we can inflict on the community. As shown above, several popular (?) media players ignore or give little weight to the file extension. I don't think that's a fair sample - as I said, on Linux and on the command-line things are different. I have a GUI mplayer here and it reacts like VLC - doesn't let me open .wsrt files. The vast majority of applications on Windows and the Mac make their decision on whether they support files based on the file extension. That the file selection dialogs are filtered by file extensions doesn't mean that applications don't sniff the content. In fact, MPlayer, VLC and Totem will happily load and use an SRT file even if it is called foo.smi, even though SAMI is a completely incompatible format. In other words, they sniff the content as being SRT. The reason that they rely on sniffing is likely that many files use the wrong file extension (my OpenSubtitles batch have no extensions, so I have no statistics on this). Again, if we want to avoid exposing existing SRT parsers to WebSRT syntax, then the format needs to be more incompatible. 
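A rough illustration of the kind of content sniffing described above, in Python. This is not how MPlayer, VLC or Totem actually do it; the timing pattern, the 20-line probe window and the name `looks_like_srt` are made up for this sketch. The file name is never consulted, and anything before the first timing line is simply skipped over.

```python
import re

# A line that looks like an SRT timing line, e.g. 00:00:01,000 --> 00:00:04,000.
# The exact pattern is an assumption for illustration.
TIMING_LINE = re.compile(
    r"^\s*\d+:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}",
    re.MULTILINE,
)


def looks_like_srt(text: str, probe_lines: int = 20) -> bool:
    """Guess whether text is SRT by looking for a timing line near the top."""
    head = "\n".join(text.splitlines()[:probe_lines])
    return TIMING_LINE.search(head) is not None
```

A sniffer like this accepts an SRT file called foo.smi and rejects real SAMI content, and it also accepts a file with an extra header line before the first cue, since it never looks at anything but the timing line.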
File extensions will be changed, popular players rely on sniffing, some ignore leading garbage and also headers can simply be removed by naive conversion tools. Assuming we pick the same file extension and we now have a new application that only supports WebSRT parsing, we will make a large bunch of existing valid SRT files invalid - not only those that are not in UTF-8, but also those with font../font and u.../u. I do wonder if the text between the font start and end element and inside the u../u may even get removed because of lack of support for these. I've seen no application that removes everything between tags it doesn't recognize, the only things that I've seen happen is treating it as plain text or ignoring the tags much like a browser does with HTML. * Add a header to WebSRT to make it uniquely identifiable. The header would have to be mandatory and browsers would have to reject files that don't have it. Such files would be compatible with some existing software and break some, depending on how they sniff. We could also put metadata in such a header. Yes, I think we need to introduce a header. Maybe we can hide all the structure in what SRT recognizes as comments (i.e. start the lines as ;. But I believe we need some hints like the @profile to identify the type of the cues and the link to link to a style sheet, and we need metadata like the meta element of HTML headers. I had no idea that semicolon was used for comments in SRT, is this usage widespread? Does it work in most players? I thought it was, but maybe it was just introduced for WebSRT. It is not tested in Hixie's SRT research[2]. Can you take a quick look through your SRT file collection if there are any? I'm probably wrong about this seeing as it's not mentioned in the wiki page for SRT [3]. [2] http://wiki.whatwg.org/wiki/SRT_research [3] http://en.wikipedia.org/wiki/SubRip OK, I grepped the 1 files. 
Only 15 had any lines beginning with a semicolon, and by manual inspection it doesn't look like any of them are clearly intended as comments (it's hard to tell, all are in foreign languages). None of them were at the very beginning of the file. Ah, that actually makes for another incompatibility of WebSRT and SRT: such lines are regarded as comments in WebSRT when they probably aren't in SRT. I can't find anything about this when searching for comment and semicolon in the spec, are you sure you're not thinking of some other format than WebSRT? It seems increasingly that the only thing that WebSRT and SRT still have in common is the --> character sequence. As a friend of mine in a11y recently said: "I was hoping to never have to stare at --> ever again..." We could indeed go all the way and define a much more different format, though I don't think it will create implementations as quickly as an SRT-based but changed format. I would prefer if we follow one of two paths: 1. Let
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 11:52:26 +0200, Henri Sivonen hsivo...@iki.fi wrote: Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested by whatever potentially spec-incompliant means), why would they not use the same ingest code path for both? I don't think they should or would, I'm just saying that they'd probably be hesitant to use an HTML parser in that single code path, as there's very little benefit for them. If the app isn't capable of supporting any feature that's permitted in WebSRT but not part of legacy SRT, how does failing at the point of finding out that this file claims to be WebSRT rather than SRT make things much better than failing at I found stuff that I can't handle/skip over in this SRT file? 
In particular, it seems like a wrong optimization to make it possible for apps that don't support any WebSRT features over legacy features to fail early than to make apps that support at least one WebSRT-introduced feature unify their processing of WebSRT and SRT by processing both WebSRT and SRT as one format where legacy SRT files just don't happen to use new features. To me, having different code paths for WebSRT and SRT is like IE adding a new Trident snapshot with every release whereas supporting SRT by treating it as WebSRT with no new features (if the app is supporting even one WebSRT-introduced feature!) is like what the other browsers are doing with HTML/CSS/DOM. Is this in reply to something other than what you quoted? In any case, I agree. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt phil...@opera.comwrote: On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt phil...@opera.com wrote: Aside: WebSRT can't contain binary data, only UTF-8 encoded text. It sure can. Just base-64 encode it. I'm not saying it's a good thing, but if somebody really has an urge... Sure, this would be a metadata track. Sites have no reason to offer download links to it, and if anyone gets hold of such a file it would quickly be evident that it's useless. After a user has seen the crap on screen. I'm just saying: it's a legal WebSRT file and really not compatible with any existing infrastructure for SRT. A fair point. The alternatives I can see are (1) using an incompatible format so that the user sees nothing or (2) adding a header that indicates that the track is metadata. In order to tell the user to stop wasting their time with this file, I think (1) is clearly worse. (2) is absolutely an option, but it will only make a difference to software that understands this header and if the header is optional it will likely often be omitted. A dialog saying this is a metadata track, you can't watch it is slightly friendlier than a screen full of crap, but they are both pretty effective at getting the message across. Yeah, I'm totally for adding a hint as to what format is in the cue. Then, a WebSRT file can be identified as to what it contains. If we define WebSRT in a way that can handle 99% of existing content and degrade gracefully (enough) when using new features in old software, it seems reasonable to do. If lots of software developers cry foul, then perhaps we should reconsider. It seems to me, though, that actually researching and defining a good algorithm for parsing SRT would be of use to others than just browsers. How is that different from moving away from SRT. 
If everyone has to change their parsing of SRT to accommodate a new spec, then that is a new format. Not everyone has to change their parsers immediately, many will continue to work. However, if someone wants to support SRT in a compatible way, it's very helpful to have a spec, assuming that WebSRT is actually compatible enough with existing SRT content. This is quite similar to HTML4 vs HTML5. There are lots of mostly compatible HTML parsers, but HTML5 defines a single parsing algorithm, and slow convergence towards that is a good thing. No, no, no! It is not at all similar to HTML4 and HTML5. A Web browser cannot suddenly stop working for a Web page, just because it has some extra functionality in it. Thus, the HTML format has been developed such that it can be extended without breaking existing stuff. We can guarantee that no browser will break because that is the way in which the format has been specified. No such thing has happened for SRT and there is simply no way to guarantee that all new WebSRT files will work in all existing SRT software, because SRT has not been specified as a extensible format and because there is no agreement between all parties that have implemented SRT support as to how extensions should be made. We can introduce such a thing for WebSRT, but we cannot claim it for SRT. You are right, existing SRT parsers are probably far less interoperable than HTML parsers were before HTML5. Existing content demands that SRT parsers handle at least i, b, font and u in some manner, even if it is by ignoring it. Any parsers that treat SRT as plain text don't even work with todays content, so I don't think they should be considered at all. You've just defined what SRT is. I would actually define SRT as the plain text format and the i, b, font and u markup as extensions. The question, then, is if parsers that handle the mentioned markup also ignore 1, ruby and rt. I haven't tested it, but I assume that some will ignore it and some won't. 
How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If a single one breaks, it would be bad IMO because the expectations of the users of that software will be broken even if it may just be a small percentage of users and we have no influence on the upgrade path of that software - in particular if it is proprietary. If the SRT ecosystem is so fragile that it cannot tolerate any extension whatsoever, then we should stay far away from it. It just seems that's not the case. How do we know that everyone that uses SRT now really wants to use WebSRT instead and wants to take part in the new ecosystem that we are introducing? We make some pretty big assumptions about what everyone who is not a Web browser vendor wants to do with SRT. That doesn't make the existing SRT ecosystem fragile - but it makes it an existing environment that needs to be respected. At this point,
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 25 Aug 2010 09:16:56 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt phil...@opera.comwrote: On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt phil...@opera.com wrote: Aside: WebSRT can't contain binary data, only UTF-8 encoded text. It sure can. Just base-64 encode it. I'm not saying it's a good thing, but if somebody really has an urge... Sure, this would be a metadata track. Sites have no reason to offer download links to it, and if anyone gets hold of such a file it would quickly be evident that it's useless. After a user has seen the crap on screen. I'm just saying: it's a legal WebSRT file and really not compatible with any existing infrastructure for SRT. A fair point. The alternatives I can see are (1) using an incompatible format so that the user sees nothing or (2) adding a header that indicates that the track is metadata. In order to tell the user to stop wasting their time with this file, I think (1) is clearly worse. (2) is absolutely an option, but it will only make a difference to software that understands this header and if the header is optional it will likely often be omitted. A dialog saying this is a metadata track, you can't watch it is slightly friendlier than a screen full of crap, but they are both pretty effective at getting the message across. Yeah, I'm totally for adding a hint as to what format is in the cue. Then, a WebSRT file can be identified as to what it contains. OK, but note that a browser would ignore this and trust what track kind says. I wouldn't want the kind change after the external track is loaded, it would make the UI confusing if a captions track disappeared from the menu as soon as it was loaded because it internally claims to be metadata. 
If we define WebSRT in a way that can handle 99% of existing content and degrade gracefully (enough) when using new features in old software, it seems reasonable to do. If lots of software developers cry foul, then perhaps we should reconsider. It seems to me, though, that actually researching and defining a good algorithm for parsing SRT would be of use to others than just browsers. How is that different from moving away from SRT. If everyone has to change their parsing of SRT to accommodate a new spec, then that is a new format. Not everyone has to change their parsers immediately, many will continue to work. However, if someone wants to support SRT in a compatible way, it's very helpful to have a spec, assuming that WebSRT is actually compatible enough with existing SRT content. This is quite similar to HTML4 vs HTML5. There are lots of mostly compatible HTML parsers, but HTML5 defines a single parsing algorithm, and slow convergence towards that is a good thing. No, no, no! It is not at all similar to HTML4 and HTML5. A Web browser cannot suddenly stop working for a Web page, just because it has some extra functionality in it. Thus, the HTML format has been developed such that it can be extended without breaking existing stuff. We can guarantee that no browser will break because that is the way in which the format has been specified. No such thing has happened for SRT and there is simply no way to guarantee that all new WebSRT files will work in all existing SRT software, because SRT has not been specified as a extensible format and because there is no agreement between all parties that have implemented SRT support as to how extensions should be made. We can introduce such a thing for WebSRT, but we cannot claim it for SRT. You are right, existing SRT parsers are probably far less interoperable than HTML parsers were before HTML5. Existing content demands that SRT parsers handle at least i, b, font and u in some manner, even if it is by ignoring it. 
Any parsers that treat SRT as plain text don't even work with todays content, so I don't think they should be considered at all. You've just defined what SRT is. I would actually define SRT as the plain text format and the i, b, font and u markup as extensions. Perhaps SRT was originally plain text, but for a very long time now, files with the .srt extension contain markup, more than 50% do in the OpenSubtitles sample data. With nothing to differentiate the plain text and markup formats, there is effectively only one format, no matter what we choose to call it. The question, then, is if parsers that handle the mentioned markup also ignore 1, ruby and rt. I haven't tested it, but I assume that some will ignore it and some won't. How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If a single one breaks, it would be bad IMO because the expectations of the users of that software will be broken even if
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 25 Aug 2010 09:16:56 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt phil...@opera.com wrote: On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt phil...@opera.com wrote: Aside: WebSRT can't contain binary data, only UTF-8 encoded text. It sure can. Just base-64 encode it. I'm not saying it's a good thing, but if somebody really has an urge... Sure, this would be a metadata track. Sites have no reason to offer download links to it, and if anyone gets hold of such a file it would quickly be evident that it's useless. After a user has seen the crap on screen. I'm just saying: it's a legal WebSRT file and really not compatible with any existing infrastructure for SRT. A fair point. The alternatives I can see are (1) using an incompatible format so that the user sees nothing or (2) adding a header that indicates that the track is metadata. In order to tell the user to stop wasting their time with this file, I think (1) is clearly worse. (2) is absolutely an option, but it will only make a difference to software that understands this header and if the header is optional it will likely often be omitted. A dialog saying this is a metadata track, you can't watch it is slightly friendlier than a screen full of crap, but they are both pretty effective at getting the message across. Yeah, I'm totally for adding a hint as to what format is in the cue. Then, a WebSRT file can be identified as to what it contains. OK, but note that a browser would ignore this and trust what track kind says. I wouldn't want the kind change after the external track is loaded, it would make the UI confusing if a captions track disappeared from the menu as soon as it was loaded because it internally claims to be metadata. Yes, I have no problem with that. 
Though I believe we have overloaded @kind with too much meaning as I already mentioned earlier [1]. I think it would make more sense to pull the different dimensions into different attributes:
- @type or @format for the format of the cue
- @kind for the semantic meaning of it (subtitle, caption, karaoke etc) - one track could even satisfy several needs, so this would be a list of kinds
- and finally the visual rendering problem, which could possibly be solved by providing a link to a div or p where the data should be rendered alternatively to the default. Right now, audio and metadata tracks get no rendering at all and I see that as a problem.
[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027356.html
The question, then, is if parsers that handle the mentioned markup also ignore 1, ruby and rt. I haven't tested it, but I assume that some will ignore it and some won't. How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If a single one breaks, it would be bad IMO because the expectations of the users of that software will be broken even if it may just be a small percentage of users and we have no influence on the upgrade path of that software - in particular if it is proprietary. Neither a new file extension, MIME type or header is enough to stop some implementations from treating it as SRT and breaking. The only remaining option, AFAICT, is making the format fundamentally incompatible with SRT. Is it worth it? If it has a different file extension and a different mime type and even a different header, I don't think any existing software will open it as SRT. Why would it think that a random file is a SRT file? It would need to be an application that accepts absolutely anything that you give it as SRT and then that software has more fundamental problems. At this point, what is your recommendation? 
The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. Yes, I think we should definitely have a new file extension. I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. If the format is SRT-like it's likely at least some files will use .srt in practice. All SRT files in practice use the .srt extension - it is typically how these formats are identified by applications. Just because *nix ignores file extensions mostly for identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the one biggest pain we can inflict on the community. * Change the MIME type to something other than text/srt. I doubt it makes any difference, as most software that deal with SRT today have no concept of MIME types. No matter what I'd want
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 25 Aug 2010 14:39:00 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt phil...@opera.com wrote: The question, then, is if parsers that handle the mentioned markup also ignore 1, ruby and rt. I haven't tested it, but I assume that some will ignore it and some won't. How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If a single one breaks, it would be bad IMO because the expectations of the users of that software will be broken even if it may just be a small percentage of users and we have no influence on the upgrade path of that software - in particular if it is proprietary. Neither a new file extension, MIME type or header is enough to stop some implementations from treating it as SRT and break. The only remaining option, AFAICT, is making the format fundamentally incompatible with SRT. Is it worth it? If it has a different file extension and a different mime type and even a different header, I don't think any existing software will open it as SRT. Why would it think that a random file is a SRT file? It would need to be an application that accepts absolutely anything that you give it as SRT and then that software has more fundamental problems. I renamed a SRT file to .wsrt and added WEBSRT on a line before the cues and it still plays just fine in MPlayer, using `mplayer video.ogv -sub subs.wsrt`. VLC won't open a subtitle file with a .wsrt extension, but the same file (with a WEBSRT header) works with the extension srt or txt. Totem is the other way around: the file extension doesn't matter, but it rejects files with a header. The results are hardly consistent, but at least one player exists for which it's not enough to change the file extension and add a header. If we want to make sure that no content is treated as SRT by any application, the format must be more incompatible. At this point, what is your recommendation? 
The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. Yes, I think we should definitely have a new file extension. I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. If the format is SRT-like it's likely at least some files will use .srt in practice. All SRT files in practice use the .srt extension - it is typically how these formats are identified by applications. Just because *nix ignores file extensions mostly for identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the one biggest pain we can inflict on the community. As shown above, several popular (?) media players ignore or give little weight to the file extension. * Change the MIME type to something other than text/srt. I doubt it makes any difference, as most software that deal with SRT today have no concept of MIME types. No matter what I'd want exactly 1 MIME type or alternatively make browsers ignore the MIME type completely. You're right in that existing SRT software probably doesn't deal much with a SRT mime type. Right now text/x-srt or text/srt is sometimes used for SRT files, but often text/plain is also in use and more likely from a Web server. Since this is the space where Web browsers play, I am not overly fussed, though I think logically text/websrt makes more sense with a .wsrt extension. Then, also SRT files can be served as text/websrt to allow them to take part in the WebSRT infrastructure if indeed they will continue to be valid WebSRT files. Is there anything you expect would break if WebSRT files were served as text/srt? I'm asking because I don't know how anal Web browsers are about mime types. 
I would think a Web browser should accept WebSRT and SRT files in text/plain format as well as WebSRT files in text/websrt format and SRT files in text/srt format. Would something break if they even came as text/html? I would expect that it makes a difference when these are loaded directly as a resource for display (e.g. when you directly go to http://example.com/mycaptions.wsrt), but not when used through a track element, where WebSRT is the baseline format and thus is expected. It's actually easier for a browser to ignore the MIME type than it is to be strict about it, at least when the format is easily identified by sniffing (sniffing code is needed anyway for local files). WebSRT isn't very easy to sniff, so that would be an argument in favor of a mandatory magic header. The main reason to care about the MIME type is some kind of doing the right thing by not letting people get away with misconfigured servers. Sometimes I feel it's just a waste of everyone's time though, it would generally
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 26, 2010 at 12:39 AM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 25 Aug 2010 14:39:00 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt phil...@opera.com wrote: The question, then, is if parsers that handle the mentioned markup also ignore 1, ruby and rt. I haven't tested it, but I assume that some will ignore it and some won't. How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If a single one breaks, it would be bad IMO because the expectations of the users of that software will be broken even if it may just be a small percentage of users and we have no influence on the upgrade path of that software - in particular if it is proprietary. Neither a new file extension, MIME type or header is enough to stop some implementations from treating it as SRT and break. The only remaining option, AFAICT, is making the format fundamentally incompatible with SRT. Is it worth it? If it has a different file extension and a different mime type and even a different header, I don't think any existing software will open it as SRT. Why would it think that a random file is a SRT file? It would need to be an application that accepts absolutely anything that you give it as SRT and then that software has more fundamental problems. I renamed a SRT file to .wsrt and added WEBSRT on a line before the cues and it still plays just fine in MPlayer, using `mplayer video.ogv -sub subs.wsrt`. I wouldn't count command-line applications for this - you can always throw just about anything at a command-line application and that is good and an advantage, because it may just work, as it did here. But it is a controlled environment by somebody who knows what they are doing - it is unlikely to cause problems and confusion. VLC won't open a subtitle file with .wsrt extension, but the same file (with a WEBSRT header) works with the extension srt or txt. 
Again - that's a good thing and exactly what I would prefer. If you know what you are doing and you know your file is probably just going to work, you can consciously decide to fall back to SRT. Totem is the other way around, the file extension doesn't matter, but it rejects files with a header. That's just proof that it's a different file format. The results are hardly consistent, but at least one player exist for which it's not enough to change the file extension and add a header. If we want to make sure that no content is treated as SRT by any application, the format must be more incompatible. You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. At this point, what is your recommendation? The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. Yes, I think we should definitely have a new file extension. I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. 
If the format is SRT-like it's likely at least some files will use .srt in practice. All SRT files in practice use the .srt extension - it is typically how these formats are identified by applications. Just because *nix ignores file extensions mostly for identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the one biggest pain we can inflict on the community. As shown above, several popular (?) media players ignore or give little weight to the file extension. I don't think that's a fair sample - as I said, on Linux and on the command-line things are different. I have a GUI mplayer here and it reacts like VLC - doesn't let me open .wsrt files. The vast majority of applications on Windows and the Mac make their decision on whether they support files based on the file extension. Assuming we pick the same
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Aug 25, 2010, at 8:40 AM, Silvia Pfeiffer wrote: On Thu, Aug 26, 2010 at 12:39 AM, Philip Jägenstedt phil...@opera.com wrote: The results are hardly consistent, but at least one player exist for which it's not enough to change the file extension and add a header. If we want to make sure that no content is treated as SRT by any application, the format must be more incompatible. You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. I think we've made some interesting finds in this thread, but we're starting to go in circles by now. Perhaps we should give it a rest until we get input from a third party. A medal to anyone who has followed it this far :) FWIW, I agree with Silvia that a new file extension and MIME type make sense. Keeping them the same won't help applications that don't know about WebSRT, they will try to play the files and aren't likely to deal with the differences gracefully. Keeping them the same also won't help new applications that know about WebSRT, it won't make any difference if there is one MIME type or two. eric
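As a rough illustration of the point that a playback application could use a distinct extension, MIME type or magic identifier to pick a parser, here is a minimal sketch. The names `detect_format`, `.wsrt`, `text/websrt` and the `WEBSRT` header line are assumptions based on the proposals floated in this thread; none of them was settled at the time.

```python
from pathlib import Path

# Hypothetical values taken from the proposals in this thread; not final.
WEBSRT_EXTENSION = ".wsrt"
WEBSRT_MIME = "text/websrt"
WEBSRT_MAGIC = "WEBSRT"


def detect_format(filename, mime_type, first_line):
    """Return "websrt" if any explicit signal says so, otherwise "srt".

    mime_type may be None for local files that carry no MIME type.
    """
    # A magic header is the strongest signal because it travels with the file.
    if first_line.strip() == WEBSRT_MAGIC:
        return "websrt"
    # A dedicated MIME type only helps when the file comes from a server.
    if mime_type == WEBSRT_MIME:
        return "websrt"
    # The extension is what most desktop applications look at.
    if Path(filename).suffix.lower() == WEBSRT_EXTENSION:
        return "websrt"
    # No signal at all: fall back to the legacy format.
    return "srt"
```

An application that prefers a single code path could call this only to decide whether to warn about unsupported features, while one that keeps two parsers could dispatch on the result.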
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 26, 2010 at 2:39 AM, Philip Jägenstedt phil...@opera.com wrote: It's actually easier for a browser to ignore the MIME type than it is to be strict about it, at least when the format is easily identified by sniffing (sniffing code is needed anyway for local files). Firefox (in the case of video) uses file extensions to identify video files. We have an internal mapping of file extensions to MIME types. We don't sniff the content. I imagine we'd do the same with whatever file extension is used for WebSRT. Chris. -- http://www.bluishcoder.co.nz
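The extension-based approach described above can be sketched as a plain lookup table with no content sniffing. This is not Firefox's actual code; the table contents and the function name `mime_from_extension` are hypothetical, and the `.wsrt` / `text/websrt` pair is only the proposal from this thread.

```python
import os

# Hypothetical table for illustration; the WebSRT entry is a proposal only.
EXTENSION_TO_MIME = {
    ".ogv": "video/ogg",
    ".webm": "video/webm",
    ".srt": "text/srt",
    ".wsrt": "text/websrt",
}


def mime_from_extension(path):
    """Map a file name to a MIME type using only its extension.

    Returns None for unknown extensions; the file content is never read.
    """
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_MIME.get(ext.lower())
```

With a table like this, a renamed file is treated according to its new name, which matches the VLC behaviour reported earlier in the thread.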
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 26, 2010 at 2:39 AM, Philip Jägenstedt phil...@opera.com wrote: The main reason to care about the MIME type is some kind of doing the right thing by not letting people get away with misconfigured servers. Sometimes I feel it's just a waste of everyone's time though, it would generally be less work for both browsers and authors to not bother. I disagree that this is the main reason. I was a web developer before being a browser developer and I can say it was highly annoying dealing with browsers that sniff content types. There were times where we wanted to send a file as plain text or binary data but the browser would sniff it and attempt to handle it. Chris. -- http://www.bluishcoder.co.nz
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 26, 2010 at 5:25 AM, Eric Carlson eric.carl...@apple.com wrote: FWIW, I agree with Silvia that a new file extension and MIME type make sense. I also think that a new file extension and MIME type is the way to go. Chris. -- http://www.bluishcoder.co.nz
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt phil...@opera.comwrote: Aside: WebSRT can't contain binary data, only UTF-8 encoded text. It sure can. Just base-64 encode it. I'm not saying it's a good thing, but if somebody really has an urge... Sure, this would be a metadata track. Sites have no reason to offer download links to it, and if anyone gets hold of such a file it would quickly be evident that it's useless. After a user has seen the crap on screen. I'm just saying: it's a legal WebSRT file and really not compatible with any existing infrastructure for SRT. A fair point. The alternatives I can see are (1) using an incompatible format so that the user sees nothing or (2) adding a header that indicates that the track is metadata. In order to tell the user to stop wasting their time with this file, I think (1) is clearly worse. (2) is absolutely an option, but it will only make a difference to software that understands this header and if the header is optional it will likely often be omitted. A dialog saying this is a metadata track, you can't watch it is slightly friendlier than a screen full of crap, but they are both pretty effective at getting the message across. If we define WebSRT in a way that can handle 99% of existing content and degrade gracefully (enough) when using new features in old software, it seems reasonable to do. If lots of software developers cry foul, then perhaps we should reconsider. It seems to me, though, that actually researching and defining a good algorithm for parsing SRT would be of use to others than just browsers. How is that different from moving away from SRT. If everyone has to change their parsing of SRT to accommodate a new spec, then that is a new format. Not everyone has to change their parsers immediately, many will continue to work. 
However, if someone wants to support SRT in a compatible way, it's very helpful to have a spec, assuming that WebSRT is actually compatible enough with existing SRT content. This is quite similar to HTML4 vs HTML5. There are lots of mostly compatible HTML parsers, but HTML5 defines a single parsing algorithm, and slow convergence towards that is a good thing. No, no, no! It is not at all similar to HTML4 and HTML5. A Web browser cannot suddenly stop working for a Web page, just because it has some extra functionality in it. Thus, the HTML format has been developed such that it can be extended without breaking existing stuff. We can guarantee that no browser will break because that is the way in which the format has been specified. No such thing has happened for SRT and there is simply no way to guarantee that all new WebSRT files will work in all existing SRT software, because SRT has not been specified as an extensible format and because there is no agreement between all parties that have implemented SRT support as to how extensions should be made. We can introduce such a thing for WebSRT, but we cannot claim it for SRT. You are right, existing SRT parsers are probably far less interoperable than HTML parsers were before HTML5. Existing content demands that SRT parsers handle at least <i>, <b>, <font> and <u> in some manner, even if it is by ignoring it. Any parsers that treat SRT as plain text don't even work with today's content, so I don't think they should be considered at all. The question, then, is if parsers that handle the mentioned markup also ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that some will ignore it and some won't. How many percent of the media player market would have to handle this correctly for these extensions to be OK, in your opinion? If the SRT ecosystem is so fragile that it cannot tolerate any extension whatsoever, then we should stay far away from it. It just seems that's not the case.
How do we know that everyone that uses SRT now really wants to use WebSRT instead and wants to take part in the new ecosystem that we are introducing? We make some pretty big assumptions about what everyone who is not a Web browser vendor wants to do with SRT. That doesn't make the existing SRT ecosystem fragile - but it makes it an existing environment that needs to be respected. At this point, what is your recommendation? The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. * Change the MIME type to something other than text/srt. I doubt it makes any difference, as most software that deal with SRT today have no concept of MIME types. No matter what I'd want exactly 1 MIME type or alternatively make browsers ignore the MIME type completely. * Add a header to WebSRT to make it uniquely identifiable. The header would have to be mandatory and browsers would have to reject
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Sat, 21 Aug 2010 01:32:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Fri, Aug 20, 2010 at 10:53 PM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 18 Aug 2010 00:42:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 6:09 PM, Philip Jägenstedt phil...@opera.com wrote: Yeah, so the only conforming solution is probably to use CSS3 transition-delay property. That may not be the most elegant solution, but it works. So, it seems clear that in order to use an HTML parser we have to sacrifice some features or make them more verbose. That sounds like there are multiple problems, when in fact we are only talking about the single use case of timestamps. I was referring also to the voices markup which is made much more verbose. All other requirements are met by the existing innerHTML parser. Is it really necessary to throw out all the advantages of re-using innerHTML just to avoid some extra markup for this single use case? No, this isn't a critical use case in itself. I'm not fundamentally opposed to using an HTML parser, I just don't see any great benefits, but some complications. The whole of the WebSRT parser isn't very big or complicated, so I don't think implementation cost is a strong argument for reusing the HTML parser, especially since at least the timing syntax needs a separate parser. It's not just about implementation cost - it's also the problem of maintaining another spec that can grow to have eventually all the features that HTML5 has and more. Do you really eventually want to re-spec and re-implement a whole innerHTML parser plus the extra t element when we start putting svg and canvas and all sorts of other more complex HTML features into captions? Just because the t element is making trouble now? Is this really the time to re-invent HTML? I don't expect that SVG, canvas, images, etc will ever natively be made part of captions. Rather, I would hope that the metadata state together with scripts is used. 
If we think that e.g. images in captions are an important use case, then WebSRT is not a good solution. If we allow arbitrary HTML and expect browsers to handle it well, it adds some complexity. For example, any videos and images in the cue would have to be fully loaded and ready to be decoded by the time the cue is to be shown, which I really don't want to implement the logic for. Simply having an iframe-like container where the document is replaced for each cue wouldn't be enough, rather one would have to create one document per cue during parsing and wait for all of those to finish loading before beginning playback. I'm not sure, but I'm guessing that amounts to significant memory overhead. As an aside, I personally see it as a good things that font *doesn't* work in WebSRT, whereas it would using an HTML parser. It's a bit more than just annoying to users. If there are automated processes involved that print that stuff on tape for example, you can burn through a lot of material and money before realising that your input files are broken and if you cannot get software support for the new files implemented, you may need to implement costly manual checking of the files. SRT as it is today can and does contain broken timestamps, missing linebreaks and at least i, b, u and font ... markup, some of which is broken. If anyone is able to to rely on their input as being well-formed enough as to be put through automatic but costly processes, they'd have to have very good control of where their input comes from. I can't see how WebSRT would change that. I would indeed expect a fairly trusted relationship with the supplier. But assuming your supplier changes from SRT to WebSRT support in their captions. If they have two different file extensions, you will notice immediately and there is a trigger to actually start implementing WebSRT support. If they are the same file extension, that will cause the trouble I explained. 
If at least there was a version identifier in existing SRT, then we wouldn't have that trouble at all. But we've had this discussion. The core problem is that WebSRT is far too compatible with existing SRT usage. Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media players have been forced to handle the extra markup in WebSRT (e.g. by ignoring it, as many already do) the two formats will be the same, and using WebSRT markup in .srt files will just work, so that's what people will do. We may avoid being seen as arrogant format-hijackers, but the end result is two extensions and two different MIME types that mean exactly the same thing. It actually burns down to the question: do we want the simple SRT format to survive as its own format and be something that people can rely upon as not having weird stuff in it - or
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt phil...@opera.comwrote: On Sat, 21 Aug 2010 01:32:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Fri, Aug 20, 2010 at 10:53 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 18 Aug 2010 00:42:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 6:09 PM, Philip Jägenstedt phil...@opera.com wrote: Yeah, so the only conforming solution is probably to use CSS3 transition-delay property. That may not be the most elegant solution, but it works. So, it seems clear that in order to use an HTML parser we have to sacrifice some features or make them more verbose. That sounds like there are multiple problems, when in fact we are only talking about the single use case of timestamps. I was referring also to the voices markup which is made much more verbose. Yeah, but that one actually makes sense to be integrated with existing ways that class and CSS work. I don't expect that SVG, canvas, images, etc will ever natively be made part of captions. Rather, I would hope that the metadata state together with scripts is used. If we think that e.g. images in captions are an important use case, then WebSRT is not a good solution. I believe they will be. But since we are only looking at the ways in which captions and subtitles are used currently, we haven't accepted this as an important use case, which is fair enough. I am considering likely future use though, which is always hard to argue. If we allow arbitrary HTML and expect browsers to handle it well, it adds some complexity. For example, any videos and images in the cue would have to be fully loaded and ready to be decoded by the time the cue is to be shown, which I really don't want to implement the logic for. 
Simply having an iframe-like container where the document is replaced for each cue wouldn't be enough, rather one would have to create one document per cue during parsing and wait for all of those to finish loading before beginning playback. I'm not sure, but I'm guessing that amounts to significant memory overhead. I have to leave that discussion to others, since I don't know enough about how the plumbing used in browsers works together. My expectation was that most of the plumbing with innerHTML already exists and the loading/display of the cue will be in parallel to the video playback, so it won't hold back the main page, even if e.g. an img element cannot be loaded in time. Honestly, using the existing small mess around SRT as an excuse to turn it into a huge mess doesn't seem a good argument to me. I'm just saying that SRT isn't a plain text format today and anyone who's able to assume it is can only do so because they control the input. Deployed SRT uses <i>, <b>, <font> and <u>. WebSRT adds <ruby>, <rt> and <1>...<infinity>, extensions which are very much in line with the existing format and already work in many players (in the sense that they are ignored, not rendered). I wouldn't call that a huge mess. And removes <font> and <u>. And adds a whole swag of other functionality, which is not in line with the existing format. Just picking a part of the WebSRT specification that may work if the software was written sanely isn't really a fair argument for compatibility. But anyway, I think at this stage we can only agree to disagree about whether SRT and WebSRT are compatible. :-) Aside: WebSRT can't contain binary data, only UTF-8 encoded text. It sure can. Just base-64 encode it. I'm not saying it's a good thing, but if somebody really has an urge... Sure, this would be a metadata track. Sites have no reason to offer download links to it, and if anyone gets hold of such a file it would quickly be evident that it's useless. After a user has seen the crap on screen.
I'm just saying: it's a legal WebSRT file and really not compatible with any existing infrastructure for SRT. If we define WebSRT in a way that can handle 99% of existing content and degrade gracefully (enough) when using new features in old software, it seems reasonable to do. If lots of software developers cry foul, then perhaps we should reconsider. It seems to me, though, that actually researching and defining a good algorithm for parsing SRT would be of use to others than just browsers. How is that different from moving away from SRT. If everyone has to change their parsing of SRT to accommodate a new spec, then that is a new format. Not everyone has to change their parsers immediately, many will continue to work. However, if someone wants to support SRT in a compatible way, it's very helpful to have a spec, assuming that WebSRT is actually compatible enough with existing SRT content. This is quite similar to HTML4 vs HTML5. There are lots of mostly compatible HTML parsers, but HTML5 defines a single parsing algorithm, and slow convergence towards that is a good thing. No, no, no! It is not at all
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On 24.08.2010 04:32, Silvia Pfeiffer wrote: ... P.S. I do wonder if anyone other than us is still following this thread. ;-) ... I do. It seems that embrace extend is somewhat unfriendly unless the original SRT community is ok with it. If it's not, then make sure that the formats can be distinguished, and that there are distinct media types. Best regards, Julian
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 18 Aug 2010 00:42:04 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 6:09 PM, Philip Jägenstedt phil...@opera.com wrote: On Thu, 12 Aug 2010 02:11:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 1:26 AM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe <span data-t=00:00:02.100>...</span>? data- attributes are reserved for use by scripts on the same page, but we *could* of course introduce new elements or attributes for this purpose. However, adding features to HTML only for use in WebSRT seems a bit odd. I'd rather avoid adding features to HTML only for WebSRT. Ian turned the timestamps into ProcessingInstructions http://www.whatwg.org/specs/web-apps/current-work/websrt.html#websrt-cue-text-dom-construction-rules . Could we introduce something like <?t at=00:00:02.100?> without breaking the innerHTML parser? It appears that the innerHTML parser in at least Opera and Firefox handles PIs in some manner, see test at http://software.hixie.ch/utilities/js/live-dom-viewer/saved/587 Chrome and Safari don't though. However, it isn't valid HTML, validator.nu says Saw <?.
Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.) Yeah, so the only conforming solution is probably to use CSS3 transition-delay property. That may not be the most elegant solution, but it works. So, it seems clear that in order to use an HTML parser we have to sacrifice some features or make them more verbose. The whole of the WebSRT parser isn't very big or complicated, so I don't think implementation cost is a strong argument for reusing the HTML parser, especially since at least the timing syntax needs a separate parser. OTOH, if you say that it will take a short time for popular software to start ignoring the extra WebSRT stuff, well, in this case they have implemented WebSRT support in its most basic form and then there is no problem any more anyway. They will then accept the new files and their extensions and mime types and there is explicit support rather than the dodgy question of whether these SRT files will provide crap or not. During a transition period, we will make all software that currently supports SRT become unstable and unreliable. I don't think that's the right way to deal with an existing ecosystem. Coming in as the big brother, claiming their underspecified format, throwing in incompatible features, and saying: just deal with it. It's just not the cavalier thing to do. I agree that it seems (and is) quite selfish, but am not sure the alternatives are any better, see below. About unstable and unreliable, I think there are really only two kind of errors we will see: 1. Some cues being ignored due to trailing settings after the timestamp. Some files may decide at this point that the files are not conformant and fail. 2. Markup being interpreted as plain text. Both already can and do happen with existing use of SRT, which is annoying but better than no subtitles at all. It's a bit more than just annoying to users. 
If there are automated processes involved that print that stuff on tape for example, you can burn through a lot of material and money before realising that your input files are broken and if you cannot get software support for the new files implemented, you may need to implement costly manual checking of the files. SRT as it is today can and does contain broken timestamps, missing linebreaks and at least <i>, <b>, <u> and <font ...> markup, some of which is broken. If anyone is able to rely on their input as being well-formed enough to be put through automatic but costly processes, they'd have to have very good control of where their input comes from. I can't see how WebSRT would change that. The core problem is that WebSRT is far too compatible with existing SRT usage. Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On 18.08.2010 00:43, Silvia Pfeiffer wrote: On Wed, Aug 18, 2010 at 5:12 AM, Julian Reschke julian.resc...@gmx.de wrote: On 12.08.2010 10:09, Philip Jägenstedt wrote: ... The core problem is that WebSRT is far too compatible with existing SRT usage. Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media players have been forced to handle the extra markup in WebSRT (e.g. by ignoring it, as many already do) the two formats will be the same, and using WebSRT markup in .srt files will just work, so that's what people will do. We may avoid being seen as arrogant format-hijackers, but the end result is two extensions and two different MIME types that mean exactly the same thing. ... (just observing...) So when something that used to be plain text now carries markup, what's the compatibility story for plain text that happens to contain markup characters, such as "<", ">" or "&"? Best regards, Julian I assume you mean: what happens to text that contains such characters? In most SRT systems, such stuff will just be displayed verbatim. Yes, in SRT. But in WebSRT? Isn't there a compatibility problem when the format just switches from plain text to possibly escaped text? (I recall the problems with title handling in RSS, and I want to make sure that people have considered this issue) Best regards, Julian
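The escaping concern raised here can be made concrete with a small sketch. It assumes, purely for illustration, that an escaping-aware format would use HTML-style character references; the thread does not settle that, and the two `render_*` functions are hypothetical stand-ins for a legacy player and a newer one.

```python
import html

# A cue as a legacy SRT author might have typed it, with no escaping.
legacy_cue = "Tom & Jerry: 1 < 2"

# The same cue as an escaping-aware authoring tool might write it
# (assumption: HTML-style character references).
escaped_cue = html.escape(legacy_cue, quote=False)


def render_legacy(text):
    """Hypothetical legacy player: shows cue text verbatim."""
    return text


def render_escaping_aware(text):
    """Hypothetical newer player: decodes character references first."""
    return html.unescape(text)
```

Under these assumptions a legacy player shows the literal `&amp;` and `&lt;` from the escaped cue, while the newer player recovers the intended text. That mismatch is the kind of issue seen with title handling in RSS.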
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On 12.08.2010 10:09, Philip Jägenstedt wrote: ... The core problem is that WebSRT is far too compatible with existing SRT usage. Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media players have been forced to handle the extra markup in WebSRT (e.g. by ignoring it, as many already do) the two formats will be the same, and using WebSRT markup in .srt files will just work, so that's what people will do. We may avoid being seen as arrogant format-hijackers, but the end result is two extensions and two different MIME types that mean exactly the same thing. ... (just observing...) So when something that used to be plain text now carries markup, what's the compatibility story for plain text that happens to contain markup characters, such as , or ? Best regards, Julian
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 12, 2010 at 6:09 PM, Philip Jägenstedt phil...@opera.comwrote: On Thu, 12 Aug 2010 02:11:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 1:26 AM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe span data-t=00:00:02.100.../span? data- attributes are reserved for use by scripts on the same page, but we *could* of course introduce new elements or attributes for this purpose. However, adding features to HTML only for use in WebSRT seems a bit odd. I'd rather avoid adding features to HTML only for WebSRT. Ian turned the timestamps into ProcessingInstructions http://www.whatwg.org/specs/web-apps/current-work/websrt.html#websrt-cue-text-dom-construction-rules . Could we introduce something like ?t at=00:00:02.100? without breaking the innerHTML parser? It appears that the innerHTML parser in at least Opera and Firefox handles PIs in some manner, see test at http://software.hixie.ch/utilities/js/live-dom-viewer/saved/587 Chrome and Safari don't though. However, it isn't valid HTML, validator.nu says Saw ?. Probable cause: Attempt to use an XML processing instruction in HTML. 
(XML processing instructions are not supported in HTML.) Yeah, so the only conforming solution is probably to use CSS3 transition-delay property. That may not be the most elegant solution, but it works. OTOH, if you say that it will take a short time for popular software to start ignoring the extra WebSRT stuff, well, in this case they have implemented WebSRT support in its most basic form and then there is no problem any more anyway. They will then accept the new files and their extensions and mime types and there is explicit support rather than the dodgy question of whether these SRT files will provide crap or not. During a transition period, we will make all software that currently supports SRT become unstable and unreliable. I don't think that's the right way to deal with an existing ecosystem. Coming in as the big brother, claiming their underspecified format, throwing in incompatible features, and saying: just deal with it. It's just not the cavalier thing to do. I agree that it seems (and is) quite selfish, but am not sure the alternatives are any better, see below. About unstable and unreliable, I think there are really only two kind of errors we will see: 1. Some cues being ignored due to trailing settings after the timestamp. Some files may decide at this point that the files are not conformant and fail. 2. Markup being interpreted as plain text. Both already can and do happen with existing use of SRT, which is annoying but better than no subtitles at all. It's a bit more than just annoying to users. If there are automated processes involved that print that stuff on tape for example, you can burn through a lot of material and money before realising that your input files are broken and if you cannot get software support for the new files implemented, you may need to implement costly manual checking of the files. The core problem is that WebSRT is far too compatible with existing SRT usage. 
Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media players have been forced to handle the extra markup in WebSRT (e.g. by ignoring it, as many already do) the two formats will be the same, and using WebSRT markup in .srt files will just work, so that's what people will do. We may avoid being seen as arrogant format-hijackers, but the end result is two extensions and two different MIME types that mean exactly the same thing. It actually boils down to the question: do we want the simple SRT format to survive as its own format and be something that people can rely upon as not having weird stuff in it, or do we not? I believe that it's important that it survives. WebSRT can have absolutely anything in it, including code and binary data, even if that stuff would not be interpreted in a browser, but handed on to the JavaScript API for a JavaScript routine to do something with it.
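To make the first kind of error discussed above concrete (cues dropped because of settings trailing the timestamp), here is a minimal Python sketch. It is not taken from any real player: the two regular expressions, the function names and the sample settings `D:vertical A:start` are illustrative assumptions only.

```python
import re

# One SRT-style timestamp, e.g. 00:00:01,000 (a dot is also accepted here).
TIMESTAMP = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"

# A strict parser accepts nothing after the end timestamp.
STRICT = re.compile(rf"^{TIMESTAMP}\s*-->\s*{TIMESTAMP}\s*$")

# A tolerant parser accepts, and ignores, anything after the end timestamp.
TOLERANT = re.compile(rf"^{TIMESTAMP}\s*-->\s*{TIMESTAMP}(?:\s+(?P<rest>.*))?$")


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_timing(line, tolerant):
    """Return (start_ms, end_ms), or None if the timing line is rejected."""
    pattern = TOLERANT if tolerant else STRICT
    match = pattern.match(line.strip())
    if match is None:
        return None  # a strict player would drop this cue or fail here
    fields = match.groups()
    return to_ms(*fields[0:4]), to_ms(*fields[4:8])
```

With a timing line such as `00:00:01,000 --> 00:00:04,000 D:vertical A:start`, the strict variant returns `None` while the tolerant one returns the two times, which is the difference between "cue ignored" and "just works" argued about in this thread.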
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 18, 2010 at 5:12 AM, Julian Reschke julian.resc...@gmx.de wrote: On 12.08.2010 10:09, Philip Jägenstedt wrote: ... The core problem is that WebSRT is far too compatible with existing SRT usage. Regardless of the file extension and MIME type used, it's quite improbable that anyone will have different parsers for the same format. Once media players have been forced to handle the extra markup in WebSRT (e.g. by ignoring it, as many already do) the two formats will be the same, and using WebSRT markup in .srt files will just work, so that's what people will do. We may avoid being seen as arrogant format-hijackers, but the end result is two extensions and two different MIME types that mean exactly the same thing. ... (just observing...) So when something that used to be plain text now carries markup, what's the compatibility story for plain text that happens to contain markup characters, such as , or ? Best regards, Julian I assume you mean: what happens to text that contains such characters? In most SRT systems, such stuff will just be displayed verbatim. Cheers, Silvia.
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 12 Aug 2010 02:11:55 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Thu, Aug 12, 2010 at 1:26 AM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe span data-t=00:00:02.100.../span? data- attributes are reserved for use by scripts on the same page, but we *could* of course introduce new elements or attributes for this purpose. However, adding features to HTML only for use in WebSRT seems a bit odd. I'd rather avoid adding features to HTML only for WebSRT. Ian turned the timestamps into ProcessingInstructions http://www.whatwg.org/specs/web-apps/current-work/websrt.html#websrt-cue-text-dom-construction-rules. Could we introduce something like ?t at=00:00:02.100? without breaking the innerHTML parser? It appears that the innerHTML parser in at least Opera and Firefox handles PIs in some manner, see test at http://software.hixie.ch/utilities/js/live-dom-viewer/saved/587 However, it isn't valid HTML, validator.nu says Saw ?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.) That would make text/srt and text/websrt synonymous, which is kind of pointless. 
No, it's only pointless if you are a browser vendor. For everyone else it is a huge advantage to be able to choose between a guaranteed simple format and a complex format with all the bells and whistles. The advantages of taking text/srt is that all existing software to create SRT can be used to create WebSRT That's not strictly true. If they load a WebSRT file that was created by some other software for further editing and that WebSRT file uses advanced WebSRT functionality, the authoring software will break. Right, especially settings appended after the timestamps are quite likely to be stripped when saving the file. Or may even break the software if it's badly implemented, or may end up inside the cue text - just like the other control instructions which will end up as plain text inside the cue. You won't believe how many people have pointed out to me that my SRT test parser exposed an i tag markup in the cue text rather than interpreting it, when I was experimenting with applying SRT cues in a HTML div without touching the cue text content. Extraneous markup is really annoying. Indeed, but given the option of seeing no subtitles at all and seeing some markup from time to time, which do you prefer? For a long time I was using a media player that didn't handle HTML in SRT and wasn't very amused at seeing i and similar, but it was sure better than no subtitles at all. I doubt it will take long for popular software to start ignoring things trailing the timestamp and things in square brackets, which is all you need for basic compatibility. Some of the tested software already does so. Hmm... not sure if I'd prefer to see the crap or rather be forced to run it through a stripping tool first. I think what would happen is that I'd start watching the movie, then notice the crap, get annoyed, stop it, run a stripping tool, restart the movie. I'd probably prefer noticing that before I start the movie, which would happen if the file was a different format. 
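As an illustration of the kind of "stripping tool" mentioned above, here is a rough Python sketch. It assumes that basic compatibility means dropping whatever trails the end timestamp and removing bracketed markers from the cue text; the function name and the two patterns are hypothetical, and it treats both square-bracket and angle-bracket markers as removable, which is a simplification.

```python
import re

TIMESTAMP = r"\d{2}:\d{2}:\d{2}[,.]\d{3}"

# Timing line: keep the two timestamps, discard anything after them.
TIMING = re.compile(rf"^\s*({TIMESTAMP})\s*-->\s*({TIMESTAMP})")

# Anything in [...] or <...> on a single line is treated as a marker.
MARKER = re.compile(r"\[[^\[\]\n]*\]|<[^<>\n]*>")


def strip_to_plain_srt(text):
    """Return the text with trailing settings and bracketed markers removed."""
    cleaned = []
    for line in text.splitlines():
        timing = TIMING.match(line)
        if timing:
            # Rebuild the timing line without any trailing settings.
            cleaned.append(f"{timing.group(1)} --> {timing.group(2)}")
        else:
            cleaned.append(MARKER.sub("", line))
    return "\n".join(cleaned)
```

A real tool would need more care: this sketch would also delete legitimate bracketed captions such as a sound description, so it only shows how little is needed for the basic case.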
But it does take a bit of expert knowledge to know that websrt can be easily converted to srt and to have such a stripping tool installed, I give you that. Indeed, it never struck me to take the time to strip away the extra markup, even though I would have known how. Instead I waited until my media player could do the job for me. OTOH, if you say that it will take a short time for popular software to start ignoring the extra WebSRT stuff, well, in this case they have implemented WebSRT support in its most basic form and then there is no problem any more anyway. They will then accept the new files and their extensions and mime types and there is explicit support rather than the dodgy question of whether these SRT files will provide crap or not. During a transition period, we will make all software that currently supports SRT become unstable and unreliable. I don't think that's
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: That's a good approach and will reduce the need for breaking backwards-compatibility. In an xml-based format that need is 0, while with a text format where the structure is ad-hoc, that need can never be reduced to 0. That's what I am concerned about and that's why I think we need a version identifier. If we end up never using/changing the version identifier, the better so. But I'd much rather we have it now and can identify what specification a file adheres to than not being able to do so later. XML is also text-based. ;-) But more seriously, if we ever need to make changes that would completely break backwards compatibility we should just use a new format rather than fit it into an existing one. That is the approach we have for most formats (and APIs) on the web (CSS, HTML, XMLHttpRequest) and so far a version identifier need (or need for a replacement) has not yet arisen. Might be worth reading through some of: http://www.w3.org/2002/09/wbs/40318/issues-4-84-objection-poll/results On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.comwrote: That would make text/srt and text/websrt synonymous, which is kind of pointless. No, it's only pointless if you are a browser vendor. For everyone else it is a huge advantage to be able to choose between a guaranteed simple format and a complex format with all the bells and whistles. But it is not complex at all and everyone else supports most of the extensions the WebSRT format has. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 11, 2010 at 5:04 PM, Anne van Kesteren ann...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: That's a good approach and will reduce the need for breaking backwards-compatibility. In an xml-based format that need is 0, while with a text format where the structure is ad-hoc, that need can never be reduced to 0. That's what I am concerned about and that's why I think we need a version identifier. If we end up never using/changing the version identifier, the better so. But I'd much rather we have it now and can identify what specification a file adheres to than not being able to do so later. XML is also text-based. ;-) I mean unstructured text. ;-) But more seriously, if we ever need to make changes that would completely break backwards compatibility we should just use a new format rather than fit it into an existing one. That's exactly the argument I am using for why WebSRT should be a new format and not take over the SRT space. They are different enough to not just be versions of each other. That's actually what I care about a lot more than a version field. That is the approach we have for most formats (and APIs) on the web (CSS, HTML, XMLHttpRequest) and so far a version identifier need (or need for a replacement) has not yet arisen. There are Web formats with a version attribute, such as Atom, RSS and even HTTP has a version number. Also, I can see that structured formats with a clear path for how extensions would be included may not need such a version attribute. WebSRT is not such a structured format, which is what makes all the difference. For example, you simply cannot put a new element outside the root element in XML, but you can easily put a new element anywhere in WebSRT - which might actually make a lot of sense if you think e.g. about adding SVG and CSS inline in future. 
Might be worth reading through some of: http://www.w3.org/2002/09/wbs/40318/issues-4-84-objection-poll/results I guess you mostly wanted me to read http://berjon.com/blog/2009/12/xmlbp-naive-versioning.html . :-) It's a nice discussion with some good experiences. Interesting that we need quirks mode to deal with versioning issues. It doesn't take into account good practice in software development, though, where there is a minor version number and a major version number. A change of the minor version number is ignored by apps that need to display something - it just gives a hint that new features were introduced that shouldn't break anything. It's basically metadata to give a hint to applications where it really matters, e.g. if an application relies on new features to be available. A change of major version number, however, essentially means it's a new format and thus breaks existing stuff to allow the world to move forwards within the same namespace and experience framework. But let's get this resolved. I don't care enough about this to make a fuss. So ... if we do everything possible to make WebSRT flexible for future changes (which is what Philip proposed) and agree that if we cannot extend WebSRT in a backwards compatible manner, we will create a new format, I can live without a version attribute. I am only a little weary of this, because already we are trying to make SRT and WebSRT the same format when there is no compatibility (see below). On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.com wrote: That would make text/srt and text/websrt synonymous, which is kind of pointless. No, it's only pointless if you are a browser vendor. For everyone else it is a huge advantage to be able to choose between a guaranteed simple format and a complex format with all the bells and whistles. But it is not complex at all and everyone else supports most of the extensions the WebSRT format has. 
All of the WebSRT extensions that do not exist in {basic SRT, b, i} are not supported by anyone yet. Existing SRT authoring tools, media players, transcoding tools, etc. do not support the cue settings (see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-cue-settings), or parsing of random text in the cues, or the voice markers. So, I disagree with "everyone else supports most of the extensions of the WebSRT format". Also, what I mean with the word complex is actually a good thing: a format that supports lots of requirements that go beyond the basic ones. Thus, it's actually a good thing to have a simple format (i.e. SRT) and a complex (maybe rather: rich? capable?) format (i.e. WebSRT). Cheers, Silvia.
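The major/minor convention described in this message can be sketched as follows. This is purely illustrative: neither SRT nor the WebSRT draft defines a version identifier, so the `major.minor` string and the function name are hypothetical.

```python
SUPPORTED_MAJOR = 1  # hypothetical: the only major version this reader knows


def can_parse(version, supported_major=SUPPORTED_MAJOR):
    """Accept any minor version of the supported major version.

    A minor bump is only a hint that new, non-breaking features exist;
    a major bump is treated as a different format.
    """
    major_text, _, _minor_text = version.partition(".")
    return int(major_text) == supported_major
```

Under this convention a reader built for 1.0 would still accept a file labelled 1.7, and would refuse 2.0 outright instead of rendering it incorrectly.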
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 10:30:23 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 5:04 PM, Anne van Kesteren ann...@opera.com wrote: That is the approach we have for most formats (and APIs) on the web (CSS, HTML, XMLHttpRequest) and so far a version identifier need (or need for a replacement) has not yet arisen. There are Web formats with a version attribute, such as Atom, RSS and even HTTP has a version number. None of these have really executed a successful version strategy though. Syndication in particular is quite bad, we should learn from that. See e.g. http://diveintomark.org/archives/2004/02/04/incompatible-rss Also, I can see that structured formats with a clear path for how extensions would be included may not need such a version attribute. WebSRT is not such a structured format, which is what makes all the difference. For example, you simply cannot put a new element outside the root element in XML, but you can easily put a new element anywhere in WebSRT - which might actually make a lot of sense if you think e.g. about adding SVG and CSS inline in future. There is all kinds of ways we could address this. For instance, we could add a feature that makes a line ignored and use that in the future for new features. While players are transitioning to WebSRT they will ensure that they do not break with future versions of the format. There might be enough extensibility in the current WebSRT parsing rules for this, I have not checked. It doesn't take into account good practice in software development, though, where there is a minor version number and a major version number. A change of the minor version number is ignored by apps that need to display something - it just gives a hint that new features were introduced that shouldn't break anything. It's basically metadata to give a hint to applications where it really matters, e.g. if an application relies on new features to be available. 
A change of major version number, however, essentially means it's a new format and thus breaks existing stuff to allow the world to move forwards within the same namespace and experience framework. What works for software products does not work for formats with universal deployment on which we want to get interoperability between various vendors. They are very distinct. But it is not complex at all and everyone else supports most of the extensions the WebSRT format has. All of the WebSRT extensions that do not exist in {basic SRT, b, i} are not supported by anyone yet. Existing SRT authoring tools, media players, transcoding tools, etc. do not support the cue settings (see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-cue-settings), or parsing of random text in the cues, or the voice markers. So, I disagree with "everyone else supports most of the extensions of the WebSRT format". Do they throw an error or do they just ignore the settings? If the latter, it does not seem like a problem. If the former, authors will probably not use these features for a while until they are better supported. Also, what I mean with the word complex is actually a good thing: a format that supports lots of requirements that go beyond the basic ones. Thus, it's actually a good thing to have a simple format (i.e. SRT) and a complex (maybe rather: rich? capable?) format (i.e. WebSRT). I don't think so. It just makes things more complex for authors (learn two formats, have to convert formats (i.e. change mime) in order to use new features (which could be as simple as a ruby fragment for some Japanese track)), more complex for implementors (need two separate implementations as to not encourage authors to use features of the more complex one in the less complex one), more complex for conformance checkers (need more code), etc. Seems highly suboptimal to me. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 13:35:30 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 7:31 PM, Anne van Kesteren ann...@opera.com wrote: While players are transitioning to WebSRT they will ensure that they do not break with future versions of the format. That's impossible, since we do not know what future versions will look like and what features we may need. If that is impossible it would be impossible for HTML and CSS too. And clearly it is not. I'm pretty sure that several will break. We cannot just test a handful of available applications and if they don't break assume none will. In fact, all existing applications that get loaded with a WebSRT file with extended features will display text with stuff that is not expected - in particular if the metadata case is used. And wrong rendering is bad, e.g. if it's part of a production process, burnt onto the video, and shipped to hearing-impaired customers. Or stored in an archive. Sure, that's why the tools should be updated to support the standard format instead rather than each having their own variant of SRT. (And if they really just take in text like that they should at least run some kind of validation so not all kinds of garbage can get in.) I don't think so. It just makes things more complex for authors (learn two formats, I see that as an advantage: I can learn the simple format and be off to a running start immediately. Then, when I find out that I need more features, I can build on top of already existing knowledge for the richer format and can convert my old files through a simple renaming of the resources. Or could you learn the simple format from a tutorial that only teaches that and when you see someone else using more complex features you can just copy and paste them and use them directly. This is pretty much how the web works. have to convert formats (i.e. 
change mime) in order to use new features (which could be as simple as a ruby fragment for some Japanese track) If I know from the start that I need these features, I will immediately learn WebSRT. But you don't. , more complex for implementors (need two separate implementations as to not encourage authors to use features of the more complex one in the less complex one), more complex for conformance checkers (need more code), etc. Seems highly suboptimal to me. That's already part of Ian's proposal: it already supports multiple different approaches of parsing cues. No extra complexity here. Actually that is not true. There is only one approach to parsing in Ian's proposal. My theory is: we only implement support for WebSRT in the browser - that it happens to also support SRT is a positive side effect. It works for the Web - and it works for the existing SRT communities and platforms. They know they have to move to WebSRT in the long run, but right now they can get away with simple SRT support and still deliver for the Web. And they have a growth path into a new file format that provides richer features. This is the proposal. That they are the same format should not matter. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.comwrote: On Tue, 10 Aug 2010 01:34:02 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 12:04 AM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using span class=narrator/span around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles... I actually think that the span @class mechanism is much more flexible than what we have in WebSRT right now. If we want multiple speakers to be able to speak in the same subtitle, then that's not possible in WebSRT. It's a little more verbose in HTML, but not massively. We might be able to add a special markup similar to the [timestamp] markup that Hixie introduced for Karaoke. This is beyond the innerHTML parser and I am not sure if it breaks it. But if it doesn't, then maybe we can also introduce a [voice] marker to be used similarly? An HTML parser parsing 1 or 00:01:30 will produce text nodes 1 and 00:01:30. Without having read the HTML parsing algorithm I guess that elements need to begin with a letter or similar. So, it's not possible to (ab)use the HTML parser to handle inner timestamps of numerical voices, we'd have to replace those with something else, probably more verbose. I have checked the parse spec and http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state indeed implies that a tag starting with a number is a parse error. Both, the timestamps and the voice markers thus seem problems when going with an innerHTML parser. Is there a way to resolve this? 
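The point about tags that start with a digit can be tried out with a small experiment. The sketch below uses Python's `html.parser` as a stand-in; it is not the HTML fragment (innerHTML) parser a browser would use, and it assumes the voice and timestamp markers are written in angle brackets, which the archived text above appears to have lost. The class and function names are made up for the example.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record which start tags and which text the parser reports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(fragment):
    """Return (list of start tag names, concatenated text) for a cue fragment."""
    collector = Collector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return collector.tags, "".join(collector.text)
```

Feeding it a cue such as `<1>Hello <i>there</i>` reports only `i` as a tag, while the numeric marker comes back as ordinary text. That matches the concern raised here: numeric voices and inner timestamps do not survive as elements.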
I mean: I'd quite happily drop the voice markers for a span @class but I am not sure what to do about the timestamps. We could do what I did in WMML and introduce a t element with the timestamp as a @at attribute, but that is again more verbose. We could also introduce an @at attribute in span which would then at least end up in the DOM and can be dealt with specially. What should numerical voices be replaced with? Personally I'd much rather write philip and silvia to mark up a conversation between us two, as I think it'd be quite hard to keep track of the numbers if editing subtitles with many different speakers. However, going with that and using an HTML parser is quite a hack. Names like mark and li may already have special parsing rules or default CSS. Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. Think for example about the case where we had a requirement that a double newline starts a new cue, but now we want to introduce a means where the double newline is escaped and can be made part of a cue. Other formats keep track of their version, such as MS Word files. It is to be hoped that most new features can be introduced without breaking backwards compatibility and we can write the parsing requirements such that certain things will be ignored, but in and of itself, WebSRT doesn't provide for this extensibility. Right now, there is for example extensibility with the WebSRT settings parsing (that's the stuff behind the timestamps) where further setting:value settings can be introduced. 
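The extensibility of the settings part can be sketched like this: a parser that keeps the `setting:value` pairs it knows and silently skips the rest leaves room for settings added later. The setting names `A` and `D` and the function name are assumptions for illustration, not a statement of what the draft defines.

```python
KNOWN_SETTINGS = frozenset({"A", "D"})  # hypothetical setting names


def parse_settings(trailing, known=KNOWN_SETTINGS):
    """Split the text after the end timestamp into recognised and ignored parts."""
    recognised = {}
    ignored = []
    for token in trailing.split():
        name, separator, value = token.partition(":")
        if separator and value and name in known:
            recognised[name] = value
        else:
            # Unknown or malformed tokens are skipped, never treated as errors.
            ignored.append(token)
    return recognised, ignored
```

Because unknown tokens are skipped and not rejected, a file using a setting introduced later still parses with an older implementation, which is the forward-compatibility property being discussed.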
But for example the introduction of new cue identifiers (that's the marker at the start of a cue) would be difficult without a version string, since anything that doesn't match the given list will just be parsed as cue-internal tag and thus end up as part of the cue text where plain text parsing is used. The bug I filed suggested allowing arbitrary voices, to simplify the parser and to make future extensions possible. For a web format I think this is a better approach format than versioning. I haven't done a full review of the parser, but there are probably more places where it could be more forgiving so as to allow future tweaking. That's a good approach and will reduce the need for breaking backwards-compatibility. In an xml-based format that need is 0, while with a text format where the structure is ad-hoc, that need can never be reduced to 0. That's what I am concerned about and that's why I think we need a version identifier. If we end up never using/changing the version identifier, the better so. But I'd much rather we have it now and can identify what specification a file adheres to than not being able to do so later. Perhaps I'm too
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 11, 2010 at 9:49 PM, Anne van Kesteren ann...@opera.com wrote: On Wed, 11 Aug 2010 13:35:30 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 7:31 PM, Anne van Kesteren ann...@opera.com wrote: While players are transitioning to WebSRT they will ensure that they do not break with future versions of the format. That's impossible, since we do not know what future versions will look like and what features we may need. If that is impossible it would be impossible for HTML and CSS too. And clearly it is not. HTML and CSS have predefined structures within which their languages grow and are able to grow. WebSRT has newlines to structure the format, which is clearly not very useful for extensibility. No matter how we turn this, the xml background or HTML and the name-value background of CSS provide them with in-built extensibility, which WebSRT does not have. I'm pretty sure that several will break. We cannot just test a handful of available applications and if they don't break assume none will. In fact, all existing applications that get loaded with a WebSRT file with extended features will display text with stuff that is not expected - in particular if the metadata case is used. And wrong rendering is bad, e.g. if it's part of a production process, burnt onto the video, and shipped to hearing-impaired customers. Or stored in an archive. Sure, that's why the tools should be updated to support the standard format instead rather than each having their own variant of SRT. They don't have their own variant of SRT - they only have their own parsers. Some will tolerate crap at the end of the -- line. Others won't. That's no break of conformance to the basic spec as given in http://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format . They all interoperate on the basic SRT format. But they don't interoperate on the WebSRT format. That's why WebSRT has to be a new format. 
(And if they really just take in text like that they should at least run some kind of validation so not all kinds of garbage can get in.) That's not a requirement of the spec. Its requirement is to render whatever characters are given in cues. That's why it is so simple. I don't think so. It just makes things more complex for authors (learn two formats, I see that as an advantage: I can learn the simple format and be off to a running start immediately. Then, when I find out that I need more features, I can build on top of already existing knowledge for the richer format and can convert my old files through a simple renaming of the resources. Or you could learn the simple format from a tutorial that only teaches that and when you see someone else using more complex features you can just copy and paste them and use them directly. This is pretty much how the web works. Sure. All I need to do is rename the file. Not much trouble at all. Better than believing I can just copy stuff from others since it's apparently the same format and then it breaks the SRT environment that I already have and that works. have to convert formats (i.e. change mime) in order to use new features (which could be as simple as a ruby fragment for some Japanese track) If I know from the start that I need these features, I will immediately learn WebSRT. But you don't. Why? If I write Japanese subtitles and my tutorial tells me they are not supported in SRT, but only in WebSRT, then I go for WebSRT. Done. , more complex for implementors (need two separate implementations so as not to encourage authors to use features of the more complex one in the less complex one), more complex for conformance checkers (need more code), etc. Seems highly suboptimal to me. That's already part of Ian's proposal: it already supports multiple different approaches of parsing cues. No extra complexity here. Actually that is not true. There is only one approach to parsing in Ian's proposal. 
At the moment, cues can have one of two different types of content (see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#syntax-0 ): 6. The cue payload: either WebSRT cue text ( http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-cue-text ) or WebSRT metadata text ( http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-metadata-text ). So that means in essence two different parsers. My theory is: we only implement support for WebSRT in the browser - that it happens to also support SRT is a positive side effect. It works for the Web - and it works for the existing SRT communities and platforms. They know they have to move to WebSRT in the long run, but right now they can get away with simple SRT support and still deliver for the Web. And they have a growth path into a new file format that provides richer features. This is the proposal. That they are the same format should not matter. It matters to other
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.com wrote: I have checked the parse spec and http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state indeed implies that a tag starting with a number is a parse error. Both the timestamps and the voice markers thus seem to be problems when going with an innerHTML parser. Is there a way to resolve this? I mean: I'd quite happily drop the voice markers for a span @class but I am not sure what to do about the timestamps. We could do what I did in WMML and introduce a t element with the timestamp as a @at attribute, but that is again more verbose. We could also introduce an @at attribute in span which would then at least end up in the DOM and can be dealt with specially. What should numerical voices be replaced with? Personally I'd much rather write <philip> and <silvia> to mark up a conversation between us two, as I think it'd be quite hard to keep track of the numbers if editing subtitles with many different speakers. However, going with that and using an HTML parser is quite a hack. Names like mark and li may already have special parsing rules or default CSS. In HTML it is <span class=philip>..</span> and <span class=silvia>...</span>. I don't see anything wrong with that. And it's only marginally longer than <philip> ... </philip> and <silvia>...</silvia>. Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. 
I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe span data-t=00:00:02.100.../span? Think for example about the case where we had a requirement that a double newline starts a new cue, but now we want to introduce a means where the double newline is escaped and can be made part of a cue. Other formats keep track of their version, such as MS Word files. It is to be hoped that most new features can be introduced without breaking backwards compatibility and we can write the parsing requirements such that certain things will be ignored, but in and of itself, WebSRT doesn't provide for this extensibility. Right now, there is for example extensibility with the WebSRT settings parsing (that's the stuff behind the timestamps) where further setting:value settings can be introduced. But for example the introduction of new cue identifiers (that's the marker at the start of a cue) would be difficult without a version string, since anything that doesn't match the given list will just be parsed as cue-internal tag and thus end up as part of the cue text where plain text parsing is used. The bug I filed suggested allowing arbitrary voices, to simplify the parser and to make future extensions possible. For a web format I think this is a better approach format than versioning. I haven't done a full review of the parser, but there are probably more places where it could be more forgiving so as to allow future tweaking. That's a good approach and will reduce the need for breaking backwards-compatibility. In an xml-based format that need is 0, while with a text format where the structure is ad-hoc, that need can never be reduced to 0. That's what I am concerned about and that's why I think we need a version identifier. If we end up never using/changing the version identifier, the better so. 
But I'd much rather we have it now and can identify what specification a file adheres to than not being able to do so later. Perhaps I'm too influenced by HTML and its failed attempts at versioning, but I think that if you want to know which version of a spec a document is written against, you can run it through a parser for each version. This doesn't tell you the author intent, but I'm not sure that's very interesting to know. If the author thinks it's important, perhaps it can be put in a comment in the header. I was most concerned about non-backwards-compatible changes here, but let's not repeat the discussion I had with Anne. Let's rather focus on making sure we have some means of extending WebSRT in future, should the need arise. On the other hand, keeping the same extension and (unregistered) MIME type as SRT has plenty of benefits, such as immediately being able to use existing SRT files in browsers without changing their file extension or MIME type. There is no harm for browsers to accept both MIME types if they are sure they can parse
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 15:09:34 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: HTML and CSS have predefined structures within which their languages grow and are able to grow. WebSRT has newlines to structure the format, which is clearly not very useful for extensibility. No matter how we turn this, the XML background of HTML and the name-value background of CSS provide them with in-built extensibility, which WebSRT does not have. The parser has the "bad cue loop" concept for ignoring supposedly bogus lines. Seems extensible to me. Sure, that's why the tools should be updated to support the standard format rather than each having their own variant of SRT. They don't have their own variant of SRT - they only have their own parsers. That comes down to the same thing in my opinion. This is like saying browsers did not all have their own variant of HTML4. Some will tolerate crap at the end of the --> line. Others won't. That's no break of conformance to the basic spec as given in http://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format . They all interoperate on the basic SRT format. But they don't interoperate on the WebSRT format. That's why WebSRT has to be a new format. By that reasoning HTML5 would have had to be a new format too. And CSS 2.1 as opposed to CSS 2, etc. (And if they really just take in text like that they should at least run some kind of validation so not all kinds of garbage can get in.) That's not a requirement of the spec. Its requirement is to render whatever characters are given in cues. That's why it is so simple. But it is not so simple because various extensions are out there in the wild and are used so the concerns you have with respect to WebSRT already apply. Sure. All I need to do is rename the file. Not much trouble at all. Better than believing I can just copy stuff from others since it's apparently the same format and then it breaks the SRT environment that I already have and that works. 
At least with the copy approach you would still see something in your SRT environment. The ruby bits would just be ignored or some such. That's already part of Ian's proposal: it already supports multiple different approaches of parsing cues. No extra complexity here. Actually that is not true. There is only one approach to parsing in Ian's proposal. At the moment, cues can have one of two different types of content (see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#syntax-0 ): [...] So that means in essence two different parsers. Per the parser section there is only one. See the end of http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#parsing-0 -- Anne van Kesteren http://annevankesteren.nl/
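To make the "ignore supposedly bogus lines" idea concrete, here is a minimal sketch in Python. It is not the parsing algorithm from the spec: the function name parse_cues, the timing pattern, and the decision to accept both "," and "." before the milliseconds are assumptions made for illustration. It only shows the general shape of a parser that skips anything it does not recognise until it finds a timing line, and keeps whatever trails the timestamps as an uninterpreted string.

```python
import re

# Assumed timing-line shape: "HH:MM:SS,mmm --> HH:MM:SS,mmm" followed by
# optional trailing text. Both "," and "." are accepted before the
# milliseconds; that leniency is a choice of this sketch.
TIMING = re.compile(
    r"^\s*(\d{2,}:\d{2}:\d{2}[.,]\d{3})\s+-->\s+"
    r"(\d{2,}:\d{2}:\d{2}[.,]\d{3})(.*)$"
)


def parse_cues(text):
    """Return a list of (start, end, trailing, payload) tuples.

    Lines that are not timing lines and do not follow one (numeric
    identifiers, unknown headers, stray text) are skipped instead of
    being treated as a fatal error.
    """
    cues = []
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    i = 0
    while i < len(lines):
        match = TIMING.match(lines[i])
        if match is None:
            i += 1  # unrecognised line: ignore it and keep looking
            continue
        start, end = match.group(1), match.group(2)
        trailing = match.group(3).strip()
        i += 1
        payload = []
        # The cue payload runs until the next blank line or end of input.
        while i < len(lines) and lines[i].strip() != "":
            payload.append(lines[i])
            i += 1
        cues.append((start, end, trailing, "\n".join(payload)))
    return cues
```

With this shape, a block that a future revision might add before or between cues is simply dropped by an older parser, which is the kind of forward compatibility being argued about; it does nothing for changes that alter how the payload itself is delimited.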
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, Aug 11, 2010 at 11:45 PM, Anne van Kesteren ann...@opera.com wrote: On Wed, 11 Aug 2010 15:09:34 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: HTML and CSS have predefined structures within which their languages grow and are able to grow. WebSRT has newlines to structure the format, which is clearly not very useful for extensibility. No matter how we turn this, the XML background of HTML and the name-value background of CSS provide them with in-built extensibility, which WebSRT does not have. The parser has the "bad cue loop" concept for ignoring supposedly bogus lines. Seems extensible to me. Hmm, that's for ignoring lines that don't match the --> pattern. It could work: ignore anything that's inside a WebSRT file and not a cue. I tend to think of caption files as composed of the following broad components:
* header-data that is information that applies to the complete file, which tends to be setup data (such as language, charset, stylesheet link etc) and metadata (name-value pairs)
* a list of cues, which have their own structure:
** start and end time
** per-cue header-type data such as more setup data, positioning, text size etc
** the cue text itself (in various structured formats, potentially with time markers for roll-on presentation)
* comments that can be made at any location
As long as we can make sure we're extensible within these broader areas, I *think* we should be ok. Sure, that's why the tools should be updated to support the standard format rather than each having their own variant of SRT. They don't have their own variant of SRT - they only have their own parsers. That comes down to the same thing in my opinion. This is like saying browsers did not all have their own variant of HTML4. From an author's point of view, they were not writing multiple different Web pages, but only trying to accommodate the quirks of each browser in one page. So, no, I wouldn't regard them as having different versions of HTML4. 
Some will tolerate crap at the end of the --> line. Others won't. That's no break of conformance to the basic spec as given in http://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format . They all interoperate on the basic SRT format. But they don't interoperate on the WebSRT format. That's why WebSRT has to be a new format. By that reasoning HTML5 would have had to be a new format too. And CSS 2.1 as opposed to CSS 2, etc. They interoperate by their sheer structure. It has been made sure that old browsers will ignore the new additions because there is a structured means to grow theirs. So, no, I believe they are different cases. (And if they really just take in text like that they should at least run some kind of validation so not all kinds of garbage can get in.) That's not a requirement of the spec. Its requirement is to render whatever characters are given in cues. That's why it is so simple. But it is not so simple because various extensions are out there in the wild and are used so the concerns you have with respect to WebSRT already apply. There are two versions out there: the plain ones without markup and the ones with <i>, <b>, <u> and <font>. Nothing else exists. Those could be called quirks of the same format. I would prefer if SRT meant only the stuff without any markup at all, which is supported by everyone who supports SRT. The thing is, WebSRT isn't even backwards compatible with the quirky SRT extension: it doesn't support <u> and <font>. So, it's neither backwards nor forwards compatible. Sure. All I need to do is rename the file. Not much trouble at all. Better than believing I can just copy stuff from others since it's apparently the same format and then it breaks the SRT environment that I already have and that works. At least with the copy approach you would still see something in your SRT environment. The ruby bits would just be ignored or some such. 
Preferably, I would be using a captioning application which will make me aware that I am just now adding features that the format that I used for saving doesn't support. So it gives me the choice of either losing those features or upgrading to the better format. It's what all text processors do, too, so people are used to it. And they know to stick to the more capable formats. That's already part of Ian's proposal: it already supports multiple different approaches of parsing cues. No extra complexity here. Actually that is not true. There is only one approach to parsing in Ian's proposal. At the moment, cues can have one of two different types of content (see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#syntax-0 ): [...] So that means in essence two different parsers. Per the parser section there is only one. See the end of http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#parsing-0 Yeah, I think there's something missing in the spec. Cheers, Silvia.
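The "warn me before I save into the simpler format" behaviour described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the function name features_beyond_plain_srt, the set of tags treated as legacy SRT markup (i, b, u, font), and the rule that any text after the end timestamp counts as an extended feature. A real authoring tool would need a proper parser; this is only a line-by-line heuristic.

```python
import re

# Assumed timing-line shape; group 1 captures anything after the end time.
TIMING = re.compile(
    r"^\s*\d{2,}:\d{2}:\d{2}[.,]\d{3}\s+-->\s+"
    r"\d{2,}:\d{2}:\d{2}[.,]\d{3}(.*)$"
)
# Opening tags only: "<name ...>". Closing tags are not matched, so each
# element is reported once.
OPEN_TAG = re.compile(r"<([^\s<>/]+)[^<>]*>")
# Tags assumed to be tolerated by existing SRT players (an assumption of
# this sketch, taken from the markup mentioned in the thread).
LEGACY_TAGS = {"i", "b", "u", "font"}


def features_beyond_plain_srt(text):
    """Return (line_number, description) pairs for constructs that a
    legacy SRT tool would probably not understand."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TIMING.match(line)
        if match:
            if match.group(1).strip():
                findings.append((number, "text after timestamps"))
            continue
        for tag in OPEN_TAG.finditer(line):
            name = tag.group(1).lower()
            if name not in LEGACY_TAGS:
                findings.append((number, "tag <%s>" % name))
    return findings
```

An authoring application could call this before saving with the .srt extension and, if the list is non-empty, offer the choice between stripping those constructs and saving in the richer format.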
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.com wrote: I have checked the parse spec and http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state indeed implies that a tag starting with a number is a parse error. Both, the timestamps and the voice markers thus seem problems when going with an innerHTML parser. Is there a way to resolve this? I mean: I'd quite happily drop the voice markers for a span @class but I am not sure what to do about the timestamps. We could do what I did in WMML and introduce a t element with the timestamp as a @at attribute, but that is again more verbose. We could also introduce an @at attribute in span which would then at least end up in the DOM and can be dealt with specially. What should numerical voices be replaced with? Personally I'd much rather write philip and silvia to mark up a conversation between us two, as I think it'd be quite hard to keep track of the numbers if editing subtitles with many different speakers. However, going with that and using an HTML parser is quite a hack. Names like mark and li may already have special parsing rules or default CSS. In HTML it is span class=philip../span and span class=silvia.../span. I don't see anything wrong with that. And it's only marginally longer than philip ... /philip and silvia.../silvia. Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. 
I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe span data-t=00:00:02.100.../span? data- attributes are reserved for use by scripts on the same page, but we *could* of course introduce new elements or attributes for this purpose. However, adding features to HTML only for use in WebSRT seems a bit odd. That would make text/srt and text/websrt synonymous, which is kind of pointless. No, it's only pointless if you are a browser vendor. For everyone else it is a huge advantage to be able to choose between a guaranteed simple format and a complex format with all the bells and whistles. The advantages of taking text/srt is that all existing software to create SRT can be used to create WebSRT That's not strictly true. If they load a WebSRT file that was created by some other software for further editing and that WebSRT file uses advanced WebSRT functionality, the authoring software will break. Right, especially settings appended after the timestamps are quite likely to be stripped when saving the file. Or may even break the software if it's badly implemented, or may end up inside the cue text - just like the other control instructions which will end up as plain text inside the cue. You won't believe how many people have pointed out to me that my SRT test parser exposed an i tag markup in the cue text rather than interpreting it, when I was experimenting with applying SRT cues in a HTML div without touching the cue text content. Extraneous markup is really annoying. Indeed, but given the option of seeing no subtitles at all and seeing some markup from time to time, which do you prefer? For a long time I was using a media player that didn't handle HTML in SRT and wasn't very amused at seeing i and similar, but it was sure better than no subtitles at all. 
I doubt it will take long for popular software to start ignoring things trailing the timestamp and things in square brackets, which is all you need for basic compatibility. Some of the tested software already does so. and servers that already send text/srt don't need to be updated. In either case I think we should support only one mime type. What's the harm in supporting two mime types but using the same parser to parse them? Most content will most likely be plain old SRT without voices, ruby or similar. People will create them using existing software with the .srt extension and serve them using the text/srt MIME type. When they later decide to add some ruby or similar, it will just work without changing the extension or MIME type. The net result is that text/srt and text/websrt mean exactly the same thing, making it a wasted effort. From a Web browser perspective, yes. But not from a caption authoring perspective. At first, I would author a SRT file.
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, Aug 12, 2010 at 1:26 AM, Philip Jägenstedt phil...@opera.comwrote: On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Going with HTML in the cues, we either have to drop voices and inner timestamps or invent new markup, as HTML can't express either. I don't think either of those are really good solutions, so right now I'm not convinced that reusing the innerHTML parser is a good way forward. I don't see a need for the voices - they already have markup in HTML, see above. But I do wonder about the timestamps. I'd much rather keep the innerHTML parser if we can, but I don't know enough about how the timestamps could be introduced in a non-breakable manner. Maybe with a data- attribute? Maybe span data-t=00:00:02.100.../span? data- attributes are reserved for use by scripts on the same page, but we *could* of course introduce new elements or attributes for this purpose. However, adding features to HTML only for use in WebSRT seems a bit odd. I'd rather avoid adding features to HTML only for WebSRT. Ian turned the timestamps into ProcessingInstructions http://www.whatwg.org/specs/web-apps/current-work/websrt.html#websrt-cue-text-dom-construction-rules. Could we introduce something like ?t at=00:00:02.100? without breaking the innerHTML parser? That would make text/srt and text/websrt synonymous, which is kind of pointless. No, it's only pointless if you are a browser vendor. For everyone else it is a huge advantage to be able to choose between a guaranteed simple format and a complex format with all the bells and whistles. The advantages of taking text/srt is that all existing software to create SRT can be used to create WebSRT That's not strictly true. 
If they load a WebSRT file that was created by some other software for further editing and that WebSRT file uses advanced WebSRT functionality, the authoring software will break. Right, especially settings appended after the timestamps are quite likely to be stripped when saving the file. Or may even break the software if it's badly implemented, or may end up inside the cue text - just like the other control instructions which will end up as plain text inside the cue. You won't believe how many people have pointed out to me that my SRT test parser exposed an i tag markup in the cue text rather than interpreting it, when I was experimenting with applying SRT cues in a HTML div without touching the cue text content. Extraneous markup is really annoying. Indeed, but given the option of seeing no subtitles at all and seeing some markup from time to time, which do you prefer? For a long time I was using a media player that didn't handle HTML in SRT and wasn't very amused at seeing i and similar, but it was sure better than no subtitles at all. I doubt it will take long for popular software to start ignoring things trailing the timestamp and things in square brackets, which is all you need for basic compatibility. Some of the tested software already does so. Hmm... not sure if I'd prefer to see the crap or rather be forced to run it through a stripping tool first. I think what would happen is that I'd start watching the movie, then notice the crap, get annoyed, stop it, run a stripping tool, restart the movie. I'd probably prefer noticing that before I start the movie, which would happen if the file was a different format. But it does take a bit of expert knowledge to know that websrt can be easily converted to srt and to have such a stripping tool installed, I give you that. 
OTOH, if you say that it will take a short time for popular software to start ignoring the extra WebSRT stuff, well, in this case they have implemented WebSRT support in its most basic form and then there is no problem any more anyway. They will then accept the new files and their extensions and mime types and there is explicit support rather than the dodgy question of whether these SRT files will provide crap or not. During a transition period, we will make all software that currently supports SRT become unstable and unreliable. I don't think that's the right way to deal with an existing ecosystem. Coming in as the big brother, claiming their underspecified format, throwing in incompatible features, and saying: just deal with it. It's just not the cavalier thing to do. and servers that already send text/srt don't need to be updated. In either case I think we should support only one mime type. What's the harm in supporting two mime types but using the same parser to parse them? Most content will most likely be plain old SRT without voices, ruby or similar. People will create them using existing software with the .srt extension and serve them using the text/srt MIME type. When
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Tue, 10 Aug 2010 01:34:02 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 12:04 AM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Hi Philip, On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt phil...@opera.com wrote: I'm not sure of the best solution. I'd quite like the ability to use arbitrary voices, e.g. to use the names/initials of the speaker rather than a number, or to use e.g. <shouting> in combination with CSS :before { content: 'Shouting: ' } or similar to adapt the display for different audiences (accessibility, basically). I agree. I think we can go back to using <span> and @class and @id and that would solve it all. I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using <span class=narrator></span> around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles... I actually think that the span @class mechanism is much more flexible than what we have in WebSRT right now. If we want multiple speakers to be able to speak in the same subtitle, then that's not possible in WebSRT. It's a little more verbose in HTML, but not massively. We might be able to add a special markup similar to the [timestamp] markup that Hixie introduced for Karaoke. This is beyond the innerHTML parser and I am not sure if it breaks it. But if it doesn't, then maybe we can also introduce a [voice] marker to be used similarly? An HTML parser parsing <1> or <00:01:30> will produce text nodes "<1>" and "<00:01:30>". Without having read the HTML parsing algorithm I guess that elements need to begin with a letter or similar. So, it's not possible to (ab)use the HTML parser to handle inner timestamps or numerical voices, we'd have to replace those with something else, probably more verbose. 
* there is no version number on the format, thus it will be difficult to introduce future changes. I think we shouldn't have a version number, for the same reason that CSS and HTML don't really have versions. If we evolve the WebSRT spec, it should be in a backwards-compatible way. CSS and HTML are structured formats where you ignore things that you cannot interpret. But the parsing is fixed and extensions play within this parsing framework. I have my doubts that is possible with WebSRT. Already one extension that we are discussing here will break parsing: the introduction of structured headers. Because there is no structured way of extending WebSRT, I believe the best way to communicate whether it is backwards compatible is through a version number. We can change the minor version if compatibility is not broken - it communicates though what features are being used - and we can change the major version if compatibility is broken. Similarly, I think that the WebSRT parser should be designed to ignore things that it doesn't recognize, in particular unknown voices (if we keep those). Requiring parsers to fail when the version number is increased oh, you misunderstood me: I am not saying that parsers have to fail - it's good if they don't. But I am saying that if we make a change to the specification that is not backwards compatible with the previous one and will thus invariably break parsers, we have to notify parsers somehow such that if they get parse errors they can e.g. notify the user that this is a new version of the WebSRT format which their software doesn't support yet. A browser won't bother their users by saying "hey, there was something in this page I didn't understand", as users won't know what to do to fix it. Think for example about the case where we had a requirement that a double newline starts a new cue, but now we want to introduce a means where the double newline is escaped and can be made part of a cue. 
Other formats keep track of their version, such as MS Word files. It is to be hoped that most new features can be introduced without breaking backwards compatibility and we can write the parsing requirements such that certain things will be ignored, but in and of itself, WebSRT doesn't provide for this extensibility. Right now, there is for example extensibility with the WebSRT settings parsing (that's the stuff behind the timestamps) where further setting:value settings can be introduced. But for example the introduction of new cue identifiers (that's the marker at the start of a cue) would be difficult without a version string, since anything that doesn't match the given list will just be parsed as cue-internal tag and thus end up as part of the cue text where plain text parsing is used. The bug I filed suggested allowing arbitrary voices, to simplify the parser and to make future extensions possible.
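As an illustration of the setting:value extensibility mentioned above, here is a small Python sketch of a settings parser that keeps what it knows and ignores the rest. The setting names in KNOWN_SETTINGS and the function name parse_settings are hypothetical and are not taken from the spec; the point is only that unknown tokens can be skipped without aborting, which is what lets new settings be added later without a version string.

```python
# Hypothetical set of setting names this parser understands. A newer
# parser would simply have a larger set.
KNOWN_SETTINGS = {"D", "L", "T", "S", "A"}


def parse_settings(trailing):
    """Split the text after the timestamps into (known, ignored).

    `known` maps recognised setting names to their raw string values;
    `ignored` lists every token that was skipped, in input order.
    """
    known = {}
    ignored = []
    for token in trailing.split():
        name, separator, value = token.partition(":")
        if separator and value and name in KNOWN_SETTINGS:
            known[name] = value
        else:
            ignored.append(token)  # unknown or malformed: skip, do not fail
    return known, ignored
```

The same trick does not obviously carry over to the start of a cue or to the cue text, where an unrecognised token ends up being displayed rather than skipped; that is the gap the version-identifier argument is about.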
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt phil...@opera.comwrote: On Tue, 10 Aug 2010 01:34:02 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Aug 10, 2010 at 12:04 AM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using span class=narrator/span around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles... I actually think that the span @class mechanism is much more flexible than what we have in WebSRT right now. If we want multiple speakers to be able to speak in the same subtitle, then that's not possible in WebSRT. It's a little more verbose in HTML, but not massively. We might be able to add a special markup similar to the [timestamp] markup that Hixie introduced for Karaoke. This is beyond the innerHTML parser and I am not sure if it breaks it. But if it doesn't, then maybe we can also introduce a [voice] marker to be used similarly? An HTML parser parsing 1 or 00:01:30 will produce text nodes 1 and 00:01:30. Without having read the HTML parsing algorithm I guess that elements need to begin with a letter or similar. So, it's not possible to (ab)use the HTML parser to handle inner timestamps of numerical voices, we'd have to replace those with something else, probably more verbose. I have checked the parse spec and http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state indeed implies that a tag starting with a number is a parse error. Both, the timestamps and the voice markers thus seem problems when going with an innerHTML parser. Is there a way to resolve this? I mean: I'd quite happily drop the voice markers for a span @class but I am not sure what to do about the timestamps. 
We could do what I did in WMML and introduce a t element with the timestamp as an @at attribute, but that is again more verbose. We could also introduce an @at attribute in span which would then at least end up in the DOM and can be dealt with specially. Just for those who think it's a fancy karaoke feature and isn't really required: it's actually also a useful feature for captions, in particular when recording live captions that are usually paint-on. Requirement CC-14 on http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements also refers to this need and 608/708 captions provide this functionality, too. Similarly, I think that the WebSRT parser should be designed to ignore things that it doesn't recognize, in particular unknown voices (if we keep those). Requiring parsers to fail when the version number is increased... Oh, you misunderstood me: I am not saying that parsers have to fail - it's good if they don't. But I am saying that if we make a change to the specification that is not backwards compatible with the previous one and will thus invariably break parsers, we have to notify parsers somehow such that if they get parse errors they can e.g. notify the user that this is a new version of the WebSRT format which their software doesn't support yet. A browser won't bother their users by saying "hey, there was something in this page I didn't understand", as users won't know what to do to fix it. I'm not overly worried about browsers. They will just display the wrong text. They are not normally an authoring or transcoding application. I am more worried about non-browser applications here, in particular those where interpreting the text the wrong way will lead to disaster, such as the wrong data in an archive etc. Think for example about the case where we had a requirement that a double newline starts a new cue, but now we want to introduce a means where the double newline is escaped and can be made part of a cue. 
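The double-newline case can be sketched as follows. This is only an illustration of the compatibility concern, not the WebSRT parsing algorithm: the function `split_cues` is hypothetical, and the `-->` arrow in the sample timing lines is an assumption about the syntax.

```python
import re


def split_cues(text):
    """Split a subtitle file body into cue blocks at runs of blank lines.

    This mirrors the simple rule "a double newline starts a new cue" that an
    existing SRT-style parser might hard-code.
    """
    normalized = text.replace("\r\n", "\n").strip("\n")
    return [block for block in re.split(r"\n{2,}", normalized) if block.strip()]


# Two ordinary cues separated by one blank line.
simple = (
    "00:00:01.000 --> 00:00:02.000\nHello\n"
    "\n"
    "00:00:03.000 --> 00:00:04.000\nGoodbye\n"
)

# One cue whose text is meant to contain a blank line. A parser that only
# knows the old rule has no way to tell this apart from two cues.
with_blank = "00:00:01.000 --> 00:00:02.000\nfirst line\n\nsecond line\n"

simple_cues = split_cues(simple)
broken_cues = split_cues(with_blank)
```

A parser built on the old rule silently returns two blocks for `with_blank`, the second of which has no timing line at all. Without some version or format signal it cannot tell a newer file from a malformed one, which is the situation an archiving or transcoding tool would want to detect.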
Other formats keep track of their version, such as MS Word files. It is to be hoped that most new features can be introduced without breaking backwards compatibility and we can write the parsing requirements such that certain things will be ignored, but in and of itself, WebSRT doesn't provide for this extensibility. Right now, there is for example extensibility with the WebSRT settings parsing (that's the stuff behind the timestamps) where further setting:value settings can be introduced. But for example the introduction of new cue identifiers (that's the marker at the start of a cue) would be difficult without a version string, since anything that doesn't match the given list will just be parsed as a cue-internal tag and thus end up as part of the cue text where plain text parsing is used. The bug I filed suggested allowing arbitrary voices, to simplify the parser and to make future extensions possible. For a web format I think this is a
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Hi Philip, On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt phil...@opera.com wrote: * there is a possibility to provide script that just affects the time-synchronized text resource I agree that some metadata would be useful, more on that below. I'm not sure why we would want to run scripts inside the text document, though, when that can be accomplished by using the TimedTrack API from the containing page. Scripts inside a timed text document would only be useful for applications that use the track not in conjunction with a Web page. Do you mean that media players could include a JavaScript engine just for supporting scripts in WebSRT? Not to say that it can't happen, but it seems a bit unlikely. 2. There is a natural mapping of WebSRT into in-band text tracks. Each cue naturally maps into an encoding page (just like a WMML cue does, too). But in WebSRT, because the setup information is not brought in a hierarchical element surrounding all cues, it is easier to just chuck anything that comes before the first cue into an encoding header page. For WMML, this problem can be solved, but it is less natural. I really like the idea of letting everything before the first timestamp in WebSRT be interpreted as the header. I'd want to use it like this: # author: Fan Subber # voices: 1 Boy # 2 Girl 01:23:45.678 -- 01:23:46.789 1 Hello 01:23:48.910 -- 01:23:49.101 2 Hello It's not critical that the format of the header be machine-readable, but we could of course make up a key-value syntax, use JSON, or something else. I disagree. I think it's absolutely necessary that the format of the header be machine-readable. Just like EXIF in images is machine-readable or ID3 in MP3 is machine-readable. It would be counter-productive not to have it machine-readable, in particular useless to archiving and media management solutions. OK, so maybe key-values? 
Author: Fan Subber Voice: 1 Boy Voice: 2 Girl 01:23:45.678 -- 01:23:46.789 1 Hello This looks a bit like HTTP headers. (I'm not sure I'd actually want to allow multiple occurrences of the same key, in practice that seems to result in inconsistencies in how people mark up multiple authors.) I'm not sure of the best solution. I'd quite like the ability to use arbitrary voices, e.g. to use the names/initials of the speaker rather than a number, or to use e.g. shouting in combination with CSS :before { content: 'Shouting: ' } or similar to adapt the display for different audiences (accessibility, basically). I agree. I think we can go back to using span and @class and @id and that would solve it all. I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using <span class=narrator></span> around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles... * there is no language specification for a WebSRT resource; while this will not be a problem when used in conjunction with a track element, it still is a problem when the resource is used just by itself, in particular as a hint for font selection and speech synthesis. The language inside the WebSRT file wouldn't end up being used for anything by a browser, as it needs to know the language before downloading it to know whether or not to download it at all. Still, I'd like a header section in WebSRT. I think the parser is already defined so that it would ignore garbage before the first cue, so this is more a matter of making it legal syntax. Not quite. Some metadata in the header can make sense to also expose to the Web page. I agree that we need a structured header section in WebSRT. Fair enough, we should revisit this when deciding on how to expose metadata in media resources in general. 
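As a rough sketch of the key-value header idea discussed above: everything before the first timing line is read as `Key: value` pairs, and repeated keys (such as `Voice`) are collected into lists. None of this is specified anywhere; the function `parse_header`, the lowercasing of keys, and the timing-line pattern are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a timing line starts with HH:MM:SS.mmm followed by "--"
# (this also matches an arrow written as "-->").
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}\s+--")


def parse_header(text):
    """Return (header, body_lines).

    header maps lowercased keys to the list of values seen for that key, so
    repeated keys are kept rather than overwritten. body_lines starts at the
    first timing line.
    """
    header = defaultdict(list)
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if TIMING_LINE.match(line):
            return dict(header), lines[index:]
        key, separator, value = line.partition(":")
        if separator and key.strip():
            header[key.strip().lower()].append(value.strip())
        # Lines without a colon (including blank lines) are ignored,
        # not treated as errors.
    return dict(header), []


sample = """Author: Fan Subber
Voice: 1 Boy
Voice: 2 Girl

01:23:45.678 -- 01:23:46.789
1 Hello
"""

header, body = parse_header(sample)
```

Keeping every value in a list leaves the "multiple occurrences of the same key" question open: a consumer that wants a single author can take the first entry, while one that wants all voices can read the whole list.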
* there is no means to identify which parser is required in the cues (is it plain text, minimal markup, or anything?) and therefore it is not possible for an application to know how it should parse the cues. All the types that are actually for visual rendering are parsed in the same way, aren't they? Of course there's no way for non-browsers to know that metadata tracks aren't interesting to look at as subtitles, but I think showing the user the garbage is a quicker way to communicate that the file isn't for direct viewing than hiding the text or similar. The spec says that files of kind descriptions and metadata are not displayed. It seems though that the parsing section will try two interfaces: HTML and plain. I think there is a disconnect there. If we already know that it's not parsable in HTML, why even try? I was confused. The parsing algorithm does the same thing regardless of what kind of text track it is
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Tue, Aug 10, 2010 at 12:04 AM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Hi Philip, On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt phil...@opera.com wrote: * there is a possibility to provide script that just affects the time-synchronized text resource I agree that some metadata would be useful, more on that below. I'm not sure why we would want to run scripts inside the text document, though, when that can be accomplished by using the TimedTrack API from the containing page. Scripts inside a timed text document would only be useful for applications that use the track not in conjunction with a Web page. Do you mean that media players could include a JavaScript engine just for supporting scripts in WebSRT? Not to say that it can't happen, but it seems a bit unlikely. Yes, it's indeed an out-there feature and I am not worried about having it now. I just mentioned it as a simple possibility for extension. 2. There is a natural mapping of WebSRT into in-band text tracks. Each cue naturally maps into an encoding page (just like a WMML cue does, too). But in WebSRT, because the setup information is not brought in a hierarchical element surrounding all cues, it is easier to just chuck anything that comes before the first cue into an encoding header page. For WMML, this problem can be solved, but it is less natural. I really like the idea of letting everything before the first timestamp in WebSRT be interpreted as the header. I'd want to use it like this: # author: Fan Subber # voices: 1 Boy # 2 Girl 01:23:45.678 -- 01:23:46.789 1 Hello 01:23:48.910 -- 01:23:49.101 2 Hello It's not critical that the format of the header be machine-readable, but we could of course make up a key-value syntax, use JSON, or something else. I disagree. I think it's absolutely necessary that the format of the header be machine-readable. 
Just like EXIF in images is machine-readable or ID3 in MP3 is machine-readable. It would be counter-productive not to have it machine-readable, in particular useless to archiving and media management solutions. OK, so maybe key-values? Author: Fan Subber Voice: 1 Boy Voice: 2 Girl 01:23:45.678 -- 01:23:46.789 1 Hello This looks a bit like HTTP headers. (I'm not sure I'd actually want to allow multiple occurrences of the same key, in practice that seems to result in inconsistencies in how people mark up multiple authors.) Yes, anything that can replicate the name-value possibilities of the meta element should be fine. Multiple occurrences make sense for some fields and not for others. I wonder if we would need to make a defined list of what should go in here or just define a general mechanism. HTML has a general mechanism (with meta) while most subtitle formats have a defined set of fields, e.g. http://en.wikipedia.org/wiki/LRC_%28file_format%29 (ID3 tags) or http://www.matroska.org/technical/specs/subtitles/ssa.html (SSA headers). I'm not sure of the best solution. I'd quite like the ability to use arbitrary voices, e.g. to use the names/initials of the speaker rather than a number, or to use e.g. shouting in combination with CSS :before { content: 'Shouting: ' } or similar to adapt the display for different audiences (accessibility, basically). I agree. I think we can go back to using span and @class and @id and that would solve it all. I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using <span class=narrator></span> around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles... I actually think that the span @class mechanism is much more flexible than what we have in WebSRT right now. 
If we want multiple speakers to be able to speak in the same subtitle, then that's not possible in WebSRT. It's a little more verbose in HTML, but not massively. We might be able to add a special markup similar to the [timestamp] markup that Hixie introduced for Karaoke. This is beyond the innerHTML parser and I am not sure if it breaks it. But if it doesn't, then maybe we can also introduce a [voice] marker to be used similarly? * there is no means to identify which parser is required in the cues (is it plain text, minimal markup, or anything?) and therefore it is not possible for an application to know how it should parse the cues. All the types that are actually for visual rendering are parsed in the same way, aren't they? Of course there's no way for non-browsers to know that metadata tracks aren't interesting to look at as subtitles, but I think showing the user the garbage is a quicker way to communicate that the file isn't for direct viewing than hiding the text or similar.
[whatwg] Fwd: Discussing WebSRT and alternatives/improvements
Hi Philip, On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt phil...@opera.com wrote: If @profile should have any influence on the parser it sounds like this isn't actually XML at all. In particular, the HTML would have to be well-formed XML, but would still end up in the null namespace. Yeah, you are right - I suppose I was trying to imitate the flexibility of WebSRT there with an anything option. I guess simply cloning the child nodes of cue and changing their namespace to before inserting them into an iframe-like document might work, but would be quite odd, I think you'll agree. Yes, it's no different to WebSRT in that respect. * there is a possibility to provide script that just affects the time-synchronized text resource I agree that some metadata would be useful, more on that below. I'm not sure why we would want to run scripts inside the text document, though, when that can be accomplished by using the TimedTrack API from the containing page. Scripts inside a timed text document would only be useful for applications that use the track not in conjunction with a Web page. The cue elements have a start and end time attribute and contain innerHTML, thus there is already parsing code available in Web browsers to deal with this content. Any Web content can be introduced into a cue and the Web browsers will already be able to render it. Yes, but if the HTML parser can't be used for all of WMML, it makes the parser quite odd, being neither XML nor HTML. I think that realistically the best way to make an XML-like format is to simply use XML. OK. Then everything that's not supposed to be parsed inside a cue would be escaped. I guess that works, too. 2. There is a natural mapping of WebSRT into in-band text tracks. Each cue naturally maps into an encoding page (just like a WMML cue does, too). 
But in WebSRT, because the setup information is not brought in a hierarchical element surrounding all cues, it is easier to just chuck anything that comes before the first cue into an encoding header page. For WMML, this problem can be solved, but it is less natural. I really like the idea of letting everything before the first timestamp in WebSRT be interpreted as the header. I'd want to use it like this: # author: Fan Subber # voices: 1 Boy # 2 Girl 01:23:45.678 -- 01:23:46.789 1 Hello 01:23:48.910 -- 01:23:49.101 2 Hello It's not critical that the format of the header be machine-readable, but we could of course make up a key-value syntax, use JSON, or something else. I disagree. I think it's absolutely necessary that the format of the header be machine-readable. Just like EXIF in images is machine-readable or ID3 in MP3 is machine-readable. It would be counter-productive not to have it machine-readable, in particular useless to archiving and media management solutions. I'm not sure of the best solution. I'd quite like the ability to use arbitrary voices, e.g. to use the names/initials of the speaker rather than a number, or to use e.g. shouting in combination with CSS :before { content: 'Shouting: ' } or similar to adapt the display for different audiences (accessibility, basically). I agree. I think we can go back to using span and @class and @id and that would solve it all. 4. It's a light-weight format in that it is not very verbose. It is nice for hand-authoring if you don't have to write so much. This is particularly true for the simple case. E.g. if new-lines that you author are automatically kept as newlines when interpreted. The drawbacks here are that as soon as you include more complicated markup into the cues (e.g. HTML markup or an SVG image), you're not allowed to put empty lines into it because they have a special meaning. 
So, while it is true that the number of characters for WebSRT will always be less than for any markup-based format, this may be really annoying in any of the cases that need more than plain text. It would be easy to just let the parser consume all lines until the next timestamp, but do you really want to separate two lines with a blank line? If the two lines aren't really related, one could instead have two cues with different vertical positioning. In marked-up content for readability I would at least not want every newline to impose a new display line. But I suppose since it's of kind metadata anyway, that wouldn't happen. So, I see - it's not such a big issue. Point 2 is possible in WMML through encoding all outer markup in a header and the cues in the data packets. To be clear, this would be a new codec type for the container, since I'm not aware of any that allow stating that the cue text is HTML. The same is true of WebSRT, muxing it into e.g. WebM would require the ability to express the kind from track kind=captions (although in practice such metadata in binary files ends up almost always being incorrect). All text tracks that are encoded