On Fri, 22 Oct 2010 11:45:24 +0200, Simon Pieters <sim...@opera.com> wrote:

On Fri, 22 Oct 2010 11:21:44 +0200, Silvia Pfeiffer <silviapfeiff...@gmail.com> wrote:

Since the attributes in <track> are a hint, probably what is available
in the file should overrule what is in the <track> attributes. It is
the same for the @charset attribute, which is overruled to utf-8 for
WebSRT IIRC.

No, charset="" overrules the encoding for WebSRT per spec.

We should just remove charset="" from the spec.

* add a means to add comments

e.g.
// Lines starting with // are comments

So far the web has two comment syntaxes: <!-- SGML style --> and /* CSS style */,
so if we need comments I think we should pick one of these.

Actually there are three more in JavaScript:

// line comment
<!-- line comment
--> line comment

http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments


I'm not fussed. I thought your analysis pointed to //, which is also
nicer because it takes the full line into account without a need for
end tags. Also, it is common from C++ and other programming languages.
But I don't really mind - we just need a decision and reasons for why.

Using <!-- --> is a bad idea since the WebSRT syntax already uses -->. I don't see the need for multiline comments.

Right. If we must have comments I think I'd prefer /* ... */ since both CSS and JavaScript have it, and I can't see that single-line comments will be easier from a parser perspective.

Anyway, I agree that at least a magic header like "WebSRT" is needed because
of the horrors of legacy SRT parsing.

I don't see why we can't just consume the legacy and support it in WebSRT. Part of the point with WebSRT is to support the legacy. If we don't want to support the legacy, then the format can be made a lot cleaner.

Did you read <http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028799.html> and look at <http://ale5000.altervista.org/subtitles.htm>?

Do you think it's a good idea to make WebSRT an extension of ale5000-SRT? My opinion is that it's not a very good idea. If we break with it, we can of course simplify some aspects of the format. For example, we don't need to allow both , and . as the millisecond separator, and the time parsing in general can be made more sane.
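
To make the separator point concrete, here is a rough Python sketch of lenient versus strict timestamp parsing. The names LENIENT, STRICT and parse_timestamp are made up for illustration, and picking "," as the single allowed separator is only an assumption; nothing in this thread settles which one the spec would keep.

```python
import re

# Lenient, legacy-style pattern: accepts either "," or "." before the milliseconds.
LENIENT = re.compile(r"(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})")
# Stricter pattern: exactly one separator. Choosing "," here is an assumption.
STRICT = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def parse_timestamp(text, pattern=STRICT):
    """Return the timestamp in milliseconds, or None if it does not match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in m.groups())
    # Reject out-of-range fields instead of silently carrying them over.
    if minutes > 59 or seconds > 59:
        return None
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

With the strict pattern, "00:01:02.500" is rejected, while passing LENIENT as the second argument accepts it.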

Breaking SRT compat means that we can
go back to requiring UTF-8 as the encoding. However, UTF-8 does complicate the magic header a bit due to the possibility of a BOM [1]. While it would
be nice to forbid the use of a BOM, I expect we'd then see lots of
frustration from authors whose editors automatically insert it...

[1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
formats. I didn't know about the BOM problem - but having read it, I
would think it makes sense to forbid it. What tools do and how they
deal with erroneous files is a different matter.

Forbidding it would be the frustration. Consider editing a WebSRT file in Notepad, and then suddenly it doesn't work anymore. Instead we should allow the BOM. (WebSRT already allows the BOM.)

This means that it's trickier to use "WebSRT" as the magic bytes, but I agree it's probably the better trade-off.
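
For illustration, sniffing with an optional BOM might look something like the Python sketch below. The function name has_magic_header and the exact magic string b"WebSRT" are assumptions taken from this discussion, not from any spec text.

```python
import codecs

# Hypothetical magic string; the thread only floats "WebSRT" as a candidate.
MAGIC = b"WebSRT"


def has_magic_header(data: bytes) -> bool:
    """Check for the magic string at the start, tolerating a UTF-8 BOM."""
    # codecs.BOM_UTF8 is the three bytes EF BB BF that some editors prepend.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(MAGIC)
```

The extra cost is a single prefix check before the comparison, so a file saved from an editor that inserts a BOM still sniffs correctly.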

--
Philip Jägenstedt
Core Developer
Opera Software
