Re: [SLUG] Re: Why XML bites and why it is NOT a markup language

telford Thu, 09 Jun 2005 21:35:49 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jun 10, 2005 at 11:04:07AM +1000, Matthew Palmer wrote:


> The problem with XML isn't that it's a crap language, it's that people are
> very poor at following instructions.  When a spec says "thou MUST do it this
> way", instead of doing it this way, people think "that's not important" and
> don't do it.

People are the constant part of the equation here. What you say is correct
and also immutable.

People might mostly follow a spec if that spec is dead simple and
following it is also an easy thing to do. The XML specification is overly
complex and difficult to understand, and there are lots of subtle ways to
get things wrong. The end result is that it is reasonable to expect that
most of the things you get with a "*.xml" filename will not exactly conform
to the specification. If they do conform, then most likely at some random
future date someone will press the single tick key and what you thought was
working will fall in a heap.
 
> I'm not sure whether the problem is basic human nature,

No, the problem is a refusal to work within the confines of basic
human nature.

> or because we've
> been conditioned by so many really bong specs to ignore anything that
> doesn't make immediate sense to us...

And that too.

> As for the comparison with HTML, web browsers have been written to accept
> random garbage and try and make something useful out of it because that's
> what the web consists of.

Correct... and that's what makes HTML successful. The whole "world wide web"
thing simply would not have happened if we started out with something as
strict and breakable as XML.

> While it would be theoretically possible to do a
> similar thing with XML, it's a lot harder because you can "guess" what to do
> with bad HTML because of the limited use-case of HTML -- describing a web
> page.  For XML it's a lot harder, because you can't make any assumptions
> about what the meaning of the data is that you're parsing.

Then we need to accept that XML is not particularly useful and we need to
start looking for something better. I'd like to coin the name "RML" which
stands for "Robust Markup Language" which should have the following
desirable properties:

  * stream-oriented construction

  * byte-oriented construction (no 16 bit encodings at all)

  * supports arbitrary tags

  * supports parametric tags

  * never allow tags inside a tag definition

  * NO guarantee of tags making a perfect tree (but parser can provide
    information about tree or partial-tree structures if they exist)

  * when tags are all next to one another, ordering is NOT important
    (thus italic/bold is the same as bold/italic)

  * at most one parameter per tag, and no named parameters
    (because named parameters bend your head, get very complex and
    require special syntax, and further because it is always better to
    introduce a new tag than to introduce a new named parameter)

  * supports guarantee of resynchronisation to tag boundary after an
    arbitrary seek into the file (scanning forwards or backwards) and
    something that "seems to be" a tag boundary always IS a tag boundary

  * case insensitive tag matching (for English at least plus any other
    language that sensibly defines mixed case)

  * damaged files can be recovered by an automatic process at least to
    the extent that lost data is proportional to the amount of damage

  * don't use closing tags at all, instead use the single parameter of
    the parametric tags to "update" that type of tag. e.g.:
 
    <bold> blah </bold> blurg

    is not good because knowing the font of "blurg" requires information
    going back an arbitrary number of tags earlier.

 
    <font-weight=bold> blah <font-weight=normal> blurg

    is much better because scanning backwards until you hit a <font-weight>
    tag will guarantee you have a full understanding of this parameter.
    In other words, you don't need to parse every document all the way
    from the beginning and thus large documents become manageable.

  * non-ascii encodings are passed cleanly up to the application
    level which can apply whatever translation it feels like doing
    (translation libraries might be an optional overlay after
    parsing is complete)

  * non-ascii encodings can never break the basic tag structure,
    so the parser can detect interesting encoding anomalies but can
    still continue to scan the file
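
To make the "scan backwards" idea concrete, here is a rough sketch in
Python. It is only an illustration under assumptions that are not part
of the wishlist itself: the bytes '<' and '>' are taken to appear ONLY
as tag delimiters (how literal angle brackets in text would be escaped
is left open), a tag is written <name=value> or <name>, and the function
name current_value is made up for this example.

```python
def current_value(data: bytes, pos: int, name: str, default=None):
    """Return the value of the nearest <name=value> tag before `pos`.

    Scans backwards byte by byte, so nothing before the matching tag is
    ever parsed. Assumes '<' and '>' occur only as tag delimiters.
    """
    want = name.lower().encode("ascii")   # case-insensitive tag matching
    end = pos
    while True:
        close = data.rfind(b">", 0, end)
        if close < 0:
            return default                # ran off the start of the file
        open_ = data.rfind(b"<", 0, close)
        if open_ < 0:
            return default
        body = data[open_ + 1:close]
        if b">" in body:
            # A stray '>' from damage: not a real tag boundary, skip it
            # and keep scanning from just before it.
            end = close
            continue
        tag, _, value = body.partition(b"=")
        if tag.strip().lower() == want:
            # The value stays as raw bytes; any character-set translation
            # is left to the application layer.
            return value.strip()
        end = open_                       # some other tag, keep going back


doc = b"<font-weight=bold> blah <FONT-WEIGHT=normal> blurg"
print(current_value(doc, len(doc), "font-weight"))          # b'normal'
print(current_value(doc, doc.index(b"blah"), "font-weight"))  # b'bold'
```

The cost of answering "what weight is this text?" depends only on the
distance back to the last <font-weight> tag, not on the size of the
document, and the starting offset can be any byte position reached by
an arbitrary seek.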



That's my wishlist... probably won't get done this afternoon but at least
it is down on record so that when everyone is old and grey and some young
guy says "I've invented this new tagging system that fixes the XML 
nightmare that has plagued the world for so long" I can give him a link 
to the Slug archives and say "told you so".

        - Tel  ( http://bespoke.homelinux.net/ )
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iQIVAwUBQqkYvcfOVl0KFTApAQLAQQ/9EyQ592wj1SpcJnaR9NG+6r5b3ZymJG7w
lbo1wmMSQurCIoVqv2xsI8tQGhRAR4E2TFscPJq4Dm0oOq6BcDuRXD1/2CGvEX4G
4nOpTkxfTEGxv/a3K+bpnlGorY1opvsGZXcyevvmrddMlkf8yN9K5HqRVG/l3QUW
JVcYslfUCib7ZTfr0XlAjljGWE4gmeSXDfCm4IclQKenXSpTZUN49uDCMM2ozN8j
eEWDEhZkPU8XOiGisO78ZQCnUgHPJeRnReBxtf5gOg4CJldANhHBo1nfD7kVVPC5
taacYm/QPON176ho+e64aI0LfXV84JeI0yRHoM0Os1xEsqBPXJ5Jk52aqyD/Y0yv
wrN1ztE2aWCrT8gghCiBr/9o8dyLetdGLbliKP0uH0EN5N0Q1W4tXFu8DQO9se3N
fdxAP17FCszjSu/Rt5HH7NEhG756pg3LVAydOPYoMkuHrw63/fX3VUAO6NtMhs+w
JgNDFAwgFDhBhAJOQfTJKGN0ivsCLJ0Vu7GfON6ZFnySyAPnHPa9nVfjB6Oqmqxv
jfYalhiBu8Y9kvr/tJfi95+azUsr1b0zESyiGSrtiGj7aZ/tsyuxG7d0qURBaBWB
2W5lJ1FvY9r6Cers8YowLRigZvnYElbADBCIVX4kxSKLyNjra7UbLzielqjXSVi5
1muBQIdO9AM=
=ucoz
-----END PGP SIGNATURE-----
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
