Re: [Wikitech-l] Proposal: switch to HTML 5

Aryeh Gregor Tue, 07 Jul 2009 06:39:24 -0700

On Tue, Jul 7, 2009 at 2:37 AM, Remember the
dot<rememberthe...@gmail.com> wrote:
> That page clearly says that there will be an XHTML 5. XHTML is not going
> away.

By "XHTML" I meant "the family of standards including XHTML 1.0, 1.1,
2.0, etc.".  XHTML 5 is identical to HTML 5 except with a different
serialization.  Practically speaking, however, it looks like no one
will use XHTML 5 either, because it's impossible to deploy on the
current web.  (See below.)  As far as I can tell, it was thrown in as
a sop to XML fans, on the basis that it cost very little to add it to
the spec (given the definition in terms of DOM plus serializations),
without any expectation that anyone will use it in practice.

> What's to prevent a malicious user from manually posting an invalid
> submission? If there are no server-side checks, will the servers crash?

Obviously there will be server-side checks as well!  This will just
serve to inform the user immediately that they're missing a required
field, without having to wait for the server or use JavaScript.
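
To make that concrete, here is a minimal sketch in Python of the kind of
server-side check that stays in place no matter what the browser enforces.
The field names and the missing_required helper are hypothetical and are
not MediaWiki code:

```python
def missing_required(form, required_fields):
    """Return the required field names that are absent or blank in form.

    form is assumed to map field names to submitted string values.
    """
    missing = []
    for name in required_fields:
        value = form.get(name, "")
        if not value.strip():
            missing.append(name)
    return missing


# A hand-crafted POST can skip the browser's "required" check entirely,
# so the server repeats the check and rejects the request itself.
submitted = {"username": "Example", "password": "   "}
print(missing_required(submitted, ["username", "password"]))
```

The client-side attribute only saves the honest user a round trip; the
server-side check is what actually protects the server.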

> Why be cruel to our bot operators? XHTML is simpler and more consistent than
> tag soup HTML, and it's a lot easier to find a good XML parser than a good
> HTML parser.

Because it will make the markup smaller, and easier for humans to read
and write.  Omitting superfluous closing tags does not make for "tag
soup".  One of the great features of HTML 5 is that
it very carefully defines the text/html parsing model in painstaking
backward-compatible detail.  For example, the description of unquoted
attributes is as follows:

"The attribute name, followed by zero or more space characters,
followed by a single U+003D EQUALS SIGN character, followed by zero or
more space characters, followed by the attribute value, which, in
addition to the requirements given above for attribute values, must
not contain any literal space characters, any U+0022 QUOTATION MARK
(") characters, U+0027 APOSTROPHE (') characters, U+003D EQUALS SIGN
(=) characters, U+003C LESS-THAN SIGN (<) characters, or U+003E
GREATER-THAN SIGN (>) characters, and must not be the empty string.

"If an attribute using the unquoted attribute syntax is to be followed
by another attribute or by one of the optional U+002F SOLIDUS (/)
characters allowed in step 6 of the start tag syntax above, then there
must be a space character separating the two."
http://dev.w3.org/html5/spec/Overview.html#attributes

Given that browsers need to implement all these complicated algorithms
anyway, there's no reason to prohibit the use of convenient shortcuts
for authors.  They're absolutely well-defined, and even if they're
more complicated for machines to parse, they're easier for humans to
use than the theoretically simpler XML rules.
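
As a rough illustration of how mechanical the quoted rule is, here is a
small Python sketch that decides whether a value may use the unquoted
syntax.  It assumes "space characters" means space, tab, line feed, form
feed and carriage return, and it does not check the general requirements
on attribute values that the quoted text refers to.  The function names
are hypothetical:

```python
import html

# Characters the quoted text forbids in an unquoted attribute value.
# Assumption: "space characters" are space, tab, LF, FF and CR.
SPACE_CHARACTERS = " \t\n\f\r"
FORBIDDEN = set(SPACE_CHARACTERS + "\"'=<>")


def can_be_unquoted(value):
    """True if value is non-empty and has none of the forbidden characters."""
    return value != "" and not any(ch in FORBIDDEN for ch in value)


def render_attribute(name, value):
    """Emit name=value unquoted when the rule allows it, quoted otherwise."""
    # Values containing "&" are always quoted and escaped, to stay clear of
    # the general attribute-value requirements this sketch does not check.
    if can_be_unquoted(value) and "&" not in value:
        return f"{name}={value}"
    return f'{name}="{html.escape(value, quote=True)}"'


print(render_attribute("src", "foo.ogg"))
print(render_attribute("alt", "a poster frame"))
```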

Anyway.  Bots should not be scraping the site.  They should be using
the bot API, which is *vastly* easier to parse for useful data than
any variant of HTML or XHTML.  We could use this as an opportunity to
push bot operators toward using the API -- screen-scraping has always
been fragile and should be phased out regardless.  Bots that
screen-scrape will break on other significant changes in any case:
how many screen-scrapers will keep working when Vector becomes the
default skin?
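
For comparison, a sketch in Python of what the API route looks like for a
bot.  The endpoint URL is an assumption, and the response is a
hand-written sample in the general shape of an action=query reply, not
captured output.  The helper names are hypothetical:

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; each wiki exposes its own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build an action=query URL asking for basic info on the given titles."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = (
    '{"query": {"pages": {"123": '
    '{"pageid": 123, "ns": 0, "title": "Main Page"}}}}'
)


def page_titles(response_text):
    """Extract page titles from a JSON reply without touching any HTML."""
    data = json.loads(response_text)
    return sorted(page["title"] for page in data["query"]["pages"].values())


print(build_query_url(["Main Page"]))
print(page_titles(SAMPLE_RESPONSE))
```

Nothing here depends on the skin or on how the page markup is serialized,
which is the whole point.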

So I view the added difficulty of screen-scraping as a long-term side
benefit of switching to HTML 5, like validation failures for
presentational elements.  It makes behavior that was already
undesirable more *obviously* undesirable.

Clearly we can't break all the bots, though.  So try breaking XML
well-formedness.  If there are only a few isolated complaints, go
ahead with it.  If it causes large-scale breakage, revert and tell all
the bot operators to switch to the API, then try again in a few months
or a year.  Or when we enable Vector, which will probably break all
the bots anyway.

> So, while I see some benefit to switching to HTML 5, I'd prefer to use XHTML
> 5 instead.

XHTML 5, by definition, must be served under an XML MIME type.
Anything served as text/html is not XHTML 5, and is required to be an
HTML (not XHTML) serialization.  We cannot serve content under
non-text/html MIME types, because that would break IE, so we can't use
XHTML 5.  Even if we could, it would still be a bad idea.  In XHTML 5,
as in all XML, well-formedness errors are fatal.  And we can't ensure
that well-formedness errors are impossible without rewriting a lot of
the parser *and* UI code.

We can, however, serve HTML 5 that happens to also be well-formed XML.
This will allow XML parsers to be used, and is what I propose we do
to start with.
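
A quick way to see the difference from the bot side, sketched in Python
with the standard library's XML parser.  Both documents are made-up
samples: the first is HTML 5 markup that is also well-formed XML, the
second uses the optional-tag shortcuts discussed above:

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML 5 that happens to be well-formed XML: every element is closed.
WELL_FORMED = (
    "<!DOCTYPE html>\n"
    "<html><head><title>Test</title></head>"
    "<body><p>Hello<br/>world</p></body></html>"
)

# HTML 5 using shortcuts: unclosed <br>, omitted closing tags.
WITH_SHORTCUTS = (
    "<!DOCTYPE html>\n"
    "<html><head><title>Test</title>"
    "<body><p>Hello<br>world"
)

print(is_well_formed_xml(WELL_FORMED))
print(is_well_formed_xml(WITH_SHORTCUTS))
```

An XML-based bot keeps working on the first form and fails outright on
the second, so the first form is the safer starting point.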

On Tue, Jul 7, 2009 at 2:48 AM, Gregory Maxwell<gmaxw...@gmail.com> wrote:
> What do you think we're doing now? A jpeg 'poster' is displayed. When
> the user clicks the poster is replaced by the appropriate playback
> mechanism.

I'm confused.  What we're currently doing (correct me if I'm wrong) is
displaying a JPEG <img> as a poster, and replacing it via JavaScript
with the appropriate content when it's clicked.  What we should do,
ideally, is use something like <video src=foo.ogg poster=bar.jpg>,
which will cause the poster to be displayed in place of the video on
conformant browsers (including Firefox 3.6, but not 3.5).  Of course,
the <img> can be put in the fallback content for the <video>.
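
Roughly, the markup I have in mind could be generated like this.  This is
a Python sketch, not our actual output code; the render_video helper and
the alt text are hypothetical, and the attributes are quoted and the
<img> self-closed only so the result also stays well-formed XML:

```python
from html import escape


def render_video(src, poster, alt):
    """Return <video> markup with the poster <img> as fallback content."""
    s, p, a = (escape(v, quote=True) for v in (src, poster, alt))
    return (
        f'<video src="{s}" poster="{p}">\n'
        f'  <img src="{p}" alt="{a}" />\n'
        f"</video>"
    )


print(render_video("foo.ogg", "bar.jpg", "Poster frame"))
```

Browsers that understand <video> ignore the fallback content; browsers
that do not will render the <img> inside it instead.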

> I said it needed to be weighed, not that the weighing would come out
> any particular way.  I'm a fan of using Video natively. The fact that
> it makes save-page work the way it should is really great.

Okay, great.

> I'm not sure how you think it currently works but there is currently
> zero need to load cortado for HTML5 supporting browsers.

I was probably confused about what "Cortado" is -- apparently it's
only the Java-based player, not the whole JavaScript framework?  I
never looked into our implementation of this very much.  Anyway, the
point is we won't have to load the JavaScript logic even if the user
does have JavaScript enabled, which is a plus.

