On Fri, 22 Oct 2010 11:45:24 +0200, Simon Pieters <sim...@opera.com> wrote:
On Fri, 22 Oct 2010 11:21:44 +0200, Silvia Pfeiffer
<silviapfeiff...@gmail.com> wrote:
Since the attributes in <track> are a hint, probably what is available
in the file should overrule what is in the <track> attributes. It is
the same for the @charset attribute, which is overruled to utf-8 for
WebSRT IIRC.
No, charset="" overrules the encoding for WebSRT per spec.
We should just remove charset="" from the spec.
* add a means to add comments
e.g.
// Lines starting with // are comments
So far the web has two comment syntaxes, <!-- SGML style --> and
/* CSS style */, so if we need comments I think we should pick one of these.
Actually, there are three more in JavaScript:
// line comment
<!-- line comment
--> line comment
http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments
I'm not fussed. I thought your analysis pointed to //, which is also
nicer because it takes the full line into account without a need for
end tags. It is also familiar from C++ and other programming languages.
But I don't really mind - we just need a decision and reasons for why.
Using <!-- --> is a bad idea since the WebSRT syntax already uses -->. I
don't see the need for multiline comments.
Right. If we must have comments I think I'd prefer /* ... */ since both
CSS and JavaScript have it, and I can't see that single-line comments will
be easier from a parser perspective.
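For comparison, here is a rough Python sketch of stripping either proposed comment style before cue parsing. Neither syntax is in the spec; the function names and the rules (a `//` comment must start the line, `/* ... */` does not nest, and an unterminated one runs to end of file) are assumptions made for illustration. Both come out about equally short.

```python
def strip_line_comments(text: str) -> str:
    """Drop every line that begins with '//' (hypothetical syntax)."""
    return "\n".join(line for line in text.split("\n")
                     if not line.startswith("//"))


def strip_block_comments(text: str) -> str:
    """Remove /* ... */ spans (hypothetical syntax, no nesting)."""
    out = []
    pos = 0
    while True:
        start = text.find("/*", pos)
        if start == -1:
            out.append(text[pos:])  # no more comments: keep the tail
            break
        out.append(text[pos:start])  # keep text before the comment
        end = text.find("*/", start + 2)
        if end == -1:
            break  # unterminated comment swallows the rest of the file
        pos = end + 2
    return "".join(out)
```

Either way, a cue line that happens to contain "//" or "/*" would be misread as a comment, so the choice is more about familiarity than parser effort.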
Anyway, I agree that at least a magic header like "WebSRT" is needed
because of the horrors of legacy SRT parsing.
I don't see why we can't just consume the legacy and support it in
WebSRT. Part of the point with WebSRT is to support the legacy. If we
don't want to support the legacy, then the format can be made a lot
cleaner.
Did you read
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028799.html>
and look at <http://ale5000.altervista.org/subtitles.htm>?
Do you think it's a good idea to make WebSRT an extension of ale5000-SRT?
My opinion is that it's not a very good idea. Without that constraint we
can of course simplify some aspects of the format. For example, we don't need to allow
both , and . as the millisecond separator, and the time parsing in general
can be made more sane.
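As an illustration of "more sane" time parsing, a strict parser could accept exactly one timestamp shape. The grammar below (optional hours, two-digit minutes and seconds, exactly three millisecond digits, "." as the only separator) is a hypothetical choice for this sketch, not what the spec currently says.

```python
import re

# Hypothetical strict grammar: [hh:]mm:ss.ttt, '.' is the only separator.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(s: str) -> float:
    """Return the timestamp in seconds, or raise ValueError."""
    m = TIMESTAMP.fullmatch(s)
    if m is None:
        raise ValueError(f"invalid timestamp: {s!r}")
    hours, minutes, seconds, millis = m.groups()
    total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total + int(millis) / 1000
```

With a single accepted form, "00:01:02,500" is simply an error instead of something every parser has to guess about.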
Breaking SRT compat means that we can go back to requiring UTF-8 as the
encoding. However, UTF-8 does complicate the magic header a bit due to
the possibility of a BOM [1]. While it would be nice to forbid the use
of a BOM, I expect we'd then see lots of frustration from authors whose
editors automatically insert it...
[1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
formats. I didn't know about the BOM problem - but having read it, I
would think it makes sense to forbid it. What tools do and how they
deal with erroneous files is a different matter.
Forbidding it would be the frustration. Consider editing a WebSRT file
in Notepad, and then suddenly it doesn't work anymore. Instead we should
allow the BOM. (WebSRT already allows the BOM.)
This means that it's trickier to use "WebSRT" as the magic bytes, but I
agree it's probably the better trade-off.
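Roughly, the extra work looks like this: skip one optional UTF-8 BOM (EF BB BF), then look for "WebSRT" followed by whitespace or end of file. This is a sketch only; the function name and the terminator rule are assumptions, not spec text.

```python
import codecs

# Hypothetical magic string, as discussed in this thread.
MAGIC = b"WebSRT"


def has_magic_header(data: bytes) -> bool:
    """Return True if data is an optional UTF-8 BOM followed by MAGIC."""
    # Editors such as Notepad may prepend a BOM when saving as UTF-8,
    # so skip exactly one if it is there.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if not data.startswith(MAGIC):
        return False
    # Assumed rule: MAGIC must be followed by whitespace or EOF,
    # so that e.g. "WebSRTX" is rejected.
    rest = data[len(MAGIC):]
    return rest == b"" or rest[:1] in (b" ", b"\t", b"\r", b"\n")
```

After the check, Python's `data.decode("utf-8-sig")` decodes the body and drops a leading BOM in one step.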
--
Philip Jägenstedt
Core Developer
Opera Software