-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jun 10, 2005 at 11:04:07AM +1000, Matthew Palmer wrote:
> The problem with XML isn't that it's a crap language, it's that people are
> very poor at following instructions.  When a spec says "thou MUST do it this
> way", instead of doing it this way, people think "that's not important" and
> don't do it.

People are the constant part of the equation here. What you say is
correct and also immutable. People might mostly follow a spec if that
spec is dead simple and following it is also an easy thing to do.
The XML specification is overly complex and difficult to understand,
and there are lots of subtle ways to get things wrong. The end result
is that you can reasonably expect that most of the things you get with
a "*.xml" filename will not exactly conform to the specification.
If they do conform, then most likely at some random future date someone
will press the single tick key and what you thought was working will
fall in a heap.

> I'm not sure whether the problem is basic human nature,

No, the problem is a refusal to work within the confines of
basic human nature.

> or because we've
> been conditioned by so many really bong specs to ignore anything that
> doesn't make immediate sense to us...

And that too.

> As for the comparison with HTML, web browsers have been written to accept
> random garbage and try and make something useful out of it because that's
> what the web consists of.

Correct... and that's what makes HTML successful. The whole
"world wide web" thing simply would not have happened if we had started
out with something as strict and breakable as XML.

> While it would be theoretically possible to do a
> similar thing with XML, it's a lot harder because you can "guess" what to do
> with bad HTML because of the limited use-case of HTML -- describing a web
> page.  For XML it's a lot harder, because you can't make any assumptions
> about what the meaning of the data is that you're parsing.

Then we need to accept that XML is not particularly useful and
we need to start looking for something better.

I'd like to coin the name "RML", which stands for "Robust Markup
Language" and which should have the following desirable properties:

  * stream-oriented construction

  * byte-oriented construction (no 16 bit encodings at all)

  * supports arbitrary tags

  * supports parametric tags

  * never allow tags inside a tag definition

  * NO guarantee of tags making a perfect tree (but the parser can
    provide information about tree or partial-tree structures if
    they exist)

  * when tags are all next to one another, ordering is NOT important
    (thus italic/bold is the same as bold/italic)

  * at most one parameter per tag, and no named parameters
    (because named parameters bend your head, get very complex and
    require special syntax, and further because it is always better
    to introduce a new tag than to introduce a new named parameter)

  * supports guaranteed resynchronisation to a tag boundary after an
    arbitrary seek into the file (scanning forwards or backwards),
    and something that "seems to be" a tag boundary always IS
    a tag boundary

  * case-insensitive tag matching (for English at least, plus any
    other language that sensibly defines mixed case)

  * damaged files can be recovered by an automatic process, at least
    to the extent that lost data is proportional to the amount
    of damage

  * don't use closing tags at all; instead use the single parameter
    of the parametric tags to "update" that type of tag, e.g.:

        <bold> blah </bold> blurg

    is not good because knowing the font of "blurg" requires
    information going back an arbitrary number of tags earlier.

        <font-weight=bold> blah <font-weight=normal> blurg

    is much better because scanning backwards until you hit a
    <font-weight> tag will guarantee you have a full understanding of
    this parameter. In other words, you don't need to parse every
    document all the way from the beginning and thus large documents
    become manageable

  * non-ASCII encodings are passed cleanly up to the application level,
    which can apply whatever translation it feels like doing
    (translation libraries might be an optional overlay after parsing
    is complete)

  * non-ASCII encodings can never break the basic tag structure,
    so the parser can detect interesting encoding anomalies but
    can still continue to scan the file

That's my wishlist... probably won't get done this afternoon but at
least it is down on record so that when everyone is old and grey and
some young guy says "I've invented this new tagging system that fixes
the XML nightmare that has plagued the world for so long" I can give him
a link to the Slug archives and say "told you so".

	- Tel  ( http://bespoke.homelinux.net/ )
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iQIVAwUBQqkYvcfOVl0KFTApAQLAQQ/9EyQ592wj1SpcJnaR9NG+6r5b3ZymJG7w
lbo1wmMSQurCIoVqv2xsI8tQGhRAR4E2TFscPJq4Dm0oOq6BcDuRXD1/2CGvEX4G
4nOpTkxfTEGxv/a3K+bpnlGorY1opvsGZXcyevvmrddMlkf8yN9K5HqRVG/l3QUW
JVcYslfUCib7ZTfr0XlAjljGWE4gmeSXDfCm4IclQKenXSpTZUN49uDCMM2ozN8j
eEWDEhZkPU8XOiGisO78ZQCnUgHPJeRnReBxtf5gOg4CJldANhHBo1nfD7kVVPC5
taacYm/QPON176ho+e64aI0LfXV84JeI0yRHoM0Os1xEsqBPXJ5Jk52aqyD/Y0yv
wrN1ztE2aWCrT8gghCiBr/9o8dyLetdGLbliKP0uH0EN5N0Q1W4tXFu8DQO9se3N
fdxAP17FCszjSu/Rt5HH7NEhG756pg3LVAydOPYoMkuHrw63/fX3VUAO6NtMhs+w
JgNDFAwgFDhBhAJOQfTJKGN0ivsCLJ0Vu7GfON6ZFnySyAPnHPa9nVfjB6Oqmqxv
jfYalhiBu8Y9kvr/tJfi95+azUsr1b0zESyiGSrtiGj7aZ/tsyuxG7d0qURBaBWB
2W5lJ1FvY9r6Cers8YowLRigZvnYElbADBCIVX4kxSKLyNjra7UbLzielqjXSVi5
1muBQIdO9AM=
=ucoz
-----END PGP SIGNATURE-----
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
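
For what it's worth, the backward scan described in the "no closing
tags" item of the wishlist could be sketched roughly like this in
Python. RML has no actual specification, so everything here is an
assumption made up for illustration: the function name current_value,
the <name=param> byte syntax, and the rule that '<' and '>' never
appear outside a tag (which is one way to read "something that seems
to be a tag boundary always IS a tag boundary").

```python
from typing import Optional


def current_value(data: bytes, pos: int, name: bytes) -> Optional[bytes]:
    """Return the parameter of the nearest <name=...> tag ending before pos.

    Hypothetical RML reading: '<' and '>' never occur outside tags, so any
    '>' found while scanning backwards really is the end of a tag.
    """
    wanted = name.lower()                    # tag matching is case insensitive
    end = data.rfind(b">", 0, pos)           # nearest tag end before pos
    while end != -1:
        start = data.rfind(b"<", 0, end)     # start of that same tag
        if start == -1:
            break                            # damaged input: no opening '<'
        tag_name, _, param = data[start + 1:end].partition(b"=")
        if tag_name.strip().lower() == wanted:
            return param.strip()             # most recent update wins
        end = data.rfind(b">", 0, start)     # keep scanning backwards
    return None                              # tag never set before pos


doc = b"<font-weight=bold> blah <FONT-WEIGHT=normal> blurg"
print(current_value(doc, doc.index(b"blurg"), b"font-weight"))  # b'normal'
print(current_value(doc, doc.index(b"blah"), b"font-weight"))   # b'bold'
```

The point of the sketch is that the scan only looks at bytes before the
seek position and stops at the first matching tag, so the cost depends
on how far back the last update is, not on the size of the document.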