Re: XML and Unicode interoperability comes before HTML or even SGML

Philippe Verdy Sun, 15 Aug 2004 14:08:50 -0700

From: "Doug Ewell" <[EMAIL PROTECTED]>
> W3C still maintains a distinction between HTML and XHTML, and still
> offers both specifications on its site.


And Unicode still publishes its previous versions too.
Even the RFC Editor publishes deprecated RFCs on its web site
(www.rfc-editor.org is the official publication site, even if many RFCs
are still hosted on the ietf.org web site, as the IETF was at the origin of
most RFCs).

> HTML is not deprecated.

I did not say that. I just said that XHTML is the current recommendation of
the W3C, and that HTML 4.01 will remain documented even if it is later
officially deprecated.

This said, HTML 4.01 is still the most widely used specification in
implementations, even though current browsers behave correctly with XHTML,
which offers very good backward compatibility: this means that in
practice, there's no reason why authors should continue to use HTML 4.01 for
their documents.

The only problem is for users of WYSIWYG HTML editors, which often do not
comply with XHTML requirements. For example, only FrontPage in its 2003
version allows generating XHTML-conformant documents, but it does not do so
by default: the designer must still use an explicit command to reformat the
document with an XML-conformant syntax, and there's still no check of the
document to see if it will validate against a specific XHTML DTD or schema.

-- For various reasons, authors still need to be allowed to generate legacy
HTML elements like <center> or <applet> even if they are not part of the
loosest XHTML schema, as legacy browsers still won't recognize blocks
centered with <div align="center"> elements or applets referenced by <object>
(there are still disagreements between implementations about how external
object types should be designated).

So in practice, XHTML 1.1 (with its strict but modular and extensible
schema) is a design goal for the future (when standard modules have been
developed and agreed between browser vendors), but XHTML 1.0 with its
"loose" schema offers excellent interoperability with the benefit of full
XML conformance. And if authors don't care about conformance with a
specific XHTML schema version, they can still use the legacy elements they
want within an XML-conformant document, and label it with a "text/html"
MIME type (they just need to not reference the XHTML 1.0 or 1.1 standard DTD
in their DOCTYPE declaration, or they can reference their own DTD that
allows validating their documents).

The most important thing is not which precise schema they will use in their
document, but the fact that they have prepared their documents so that they
can be accepted by standard and simpler XML parsers (HTML parsers are really
huge, full of hacks when trying to mimic the interpretation bugs of legacy
browsers, difficult to maintain, and contain too many bugs or
interoperability problems). I also don't know of any HTML 4.01 parser that
fully respects the HTML 4.01 specification, and I think that any
implementation that tried to do so would fail to render many web sites
designed either for Internet Explorer or Netscape 4 (and many websites still
don't work correctly with Mozilla-based browsers, unless the websites use
many browser detection scripts and server-side dynamic code generation). --
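
To make the point about XML parsers concrete, here is a minimal sketch in
Python using only the standard library's xml.etree.ElementTree. The two
sample documents and the helper name is_well_formed are invented for this
illustration. The first document keeps the legacy <center> element and
references no XHTML DTD, yet any plain XML parser accepts it; the second
carries the same content with the end tags omitted in the HTML 4.01 manner,
and an XML parser rejects it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed XML that still uses the legacy <center>
# element and does not reference any XHTML DTD.
WELL_FORMED = """<html>
  <head><title>Example</title></head>
  <body>
    <center><p>Centered text<br/>second line</p></center>
  </body>
</html>"""

# The same content in HTML 4.01 style: <br> is not self-closed and the
# </p> end tag is omitted, so the markup is not well-formed XML.
HTML_STYLE = """<html>
  <head><title>Example</title></head>
  <body>
    <center><p>Centered text<br>second line</center>
  </body>
</html>"""


def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("XML-style document accepted: ", is_well_formed(WELL_FORMED))
    print("HTML-style document accepted:", is_well_formed(HTML_STYLE))
```

This only checks well-formedness; it says nothing about validity against a
particular XHTML DTD, which is exactly the distinction made above.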

The main object of my message was to warn Unicode that the Technical Report
about interoperability of XML and Unicode has not been reviewed since the
recent changes in Unicode 4.0.1 with the inclusion of ZW(N)J within
combining sequences. Maybe there's some work in progress at the W3C or in a
technical committee to make the necessary changes in this UTR, but for now
the changes in clauses D14 and D17 create new unexpected interoperability
problems with XML. Solving these problems for XML will help solve at the
same time the problem in XHTML and HTML...
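
As a rough illustration of the characters involved (not a reproduction of
clauses D14 or D17), the Python sketch below uses the standard unicodedata
module to show how ZWNJ and ZWJ are classified next to an ordinary
combining mark, and then checks that an XML parser hands a sequence
containing ZWJ to the application unchanged. The sample sequence "a" + ZWJ
+ combining acute is an arbitrary choice made for this example, and the
property values reported are those of the Unicode version bundled with the
Python interpreter, not necessarily 4.0.1.

```python
import unicodedata
import xml.etree.ElementTree as ET

ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"    # ZERO WIDTH JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def describe(ch: str) -> str:
    """Summarize the general category and combining class of one character."""
    return "U+%04X %s: category=%s, combining class=%d" % (
        ord(ch),
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# Arbitrary sample: a base letter, a joiner, then a combining mark.
SAMPLE = "a" + ZWJ + ACUTE


def roundtrip(text: str) -> str:
    """Parse the text as XML element content and return what the parser yields."""
    return ET.fromstring("<p>" + text + "</p>").text


if __name__ == "__main__":
    for ch in (ZWNJ, ZWJ, ACUTE):
        print(describe(ch))
    print("preserved by the XML parser:", roundtrip(SAMPLE) == SAMPLE)
```

The joiners are format characters (general category Cf) with combining
class 0, whereas the acute accent is a nonspacing mark; the parser itself
does not add, remove, or reorder any of them.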
