Re: Some random ideas around (broken) XML

Geoffrey Sneddon Wed, 18 Nov 2009 02:31:16 -0800

Julian Reschke wrote:

Karl Dubost wrote:

... # PRODUCING BROKEN XML


The fact is that many atom feeds are broken for many reasons.

* edited by hand
* created by templating tools which are not XML producers
* mixing content from different sources (html, db, xml) with
  different encodings

It means when designing an atom feed consumer, implementers are
forced to recover the broken content to be able to make it usable
by the crowd (social impact). This is the second part of Postel's
law: "Be liberal in what you accept". ...


Are you *really* sure about that? My understanding is that there are
 popular Atom consumers that require proper XML (except for the
RFC3023 issue), and that falling back to handle broken XML is
actually not needed (opposed to RSS).


Almost all violate (as it is needed for compatibility):

It is a fatal error if an XML entity is determined (via default,
encoding declaration, or higher-level protocol) to be in a certain
encoding but contains byte sequences that are not legal in that
encoding.
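
To illustrate the rule quoted above, here is a minimal Python sketch contrasting strict decoding with the lenient behaviour most consumers end up with. The function names and the sample bytes are made up for illustration and are not taken from any real feed reader.

```python
def decode_strict(data: bytes, encoding: str) -> str:
    """Fail on any byte sequence that is illegal in the declared encoding."""
    return data.decode(encoding, errors="strict")


def decode_lenient(data: bytes, encoding: str) -> str:
    """Substitute U+FFFD for illegal byte sequences instead of failing."""
    return data.decode(encoding, errors="replace")


# Hypothetical feed fragment: treated as UTF-8, but containing a stray
# Windows-1252 byte (0xE9), which is not a legal UTF-8 sequence here.
sample = b"<title>caf\xe9</title>"

try:
    decode_strict(sample, "utf-8")
except UnicodeDecodeError as exc:
    print("strict: fatal error:", exc.reason)

print("lenient:", decode_lenient(sample, "utf-8"))
```

The strict path is what the XML specification asks for; the lenient path trades correctness of one character for a document that can still be shown to the user.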

Quite a lot of feed readers use identical processors for both Atom and RSS though, and I imagine that a lot don't want to have one processor for each, so if you really want to be strict for Atom you probably have to convince people that it is in their interest to be strict for RSS (and for any commercial product, I expect the cost of poorer compatibility is greater than that gained by being strict).

Probably the only thing really needed for RSS but not needed for Atom is predefined entities (that were present in RSS 0.91 (Netscape)), which arguably should be solved just by increasing the number of predefined entities in XML.

Out of incidental interest, I did try shipping a release of SimplePie (which, combined with downstream users, has millions of users) which was strict with character encodings, but that turned out quite quickly to be unworkable in the real web. It, to this day, is strict with entities, and that causes around one bug report/support issue per month. I have on plenty of occasions been tempted to prefix all documents with a DOCTYPE containing the entities present in RSS 0.91 (Netscape), though always found some technical reason to not implement it due to implementation complexity.
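
For illustration only, the DOCTYPE-prefixing idea might look roughly like the Python sketch below. This is not SimplePie code. The entity table is a small hand-picked subset rather than the full RSS 0.91 (Netscape) list, and the function name, the "rss" DOCTYPE name, and the policy of leaving documents with their own DOCTYPE untouched are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative subset only; a real implementation would need the full list.
ENTITIES = {"nbsp": 160, "copy": 169, "eacute": 233}

# The XML declaration, if present, must stay at the very start of the document.
XML_DECL = re.compile(r"^<\?xml\s[^>]*\?>")


def with_entity_doctype(doc: str, root: str = "rss") -> str:
    """Return doc with an internal DTD subset declaring the extra entities."""
    if "<!DOCTYPE" in doc:
        # Assumed policy: never touch a document that brings its own DOCTYPE.
        return doc
    decls = "".join(f'<!ENTITY {name} "&#{cp};">' for name, cp in ENTITIES.items())
    doctype = f"<!DOCTYPE {root} [{decls}]>"
    match = XML_DECL.match(doc)
    if match:
        # Insert after the XML declaration rather than before it.
        return doc[:match.end()] + doctype + doc[match.end():]
    return doctype + doc


feed = '<?xml version="1.0"?><rss><title>caf&eacute;&nbsp;&copy;</title></rss>'

try:
    ET.fromstring(feed)
except ET.ParseError as exc:
    print("without DOCTYPE:", exc)

title = ET.fromstring(with_entity_doctype(feed)).findtext("title")
print("with DOCTYPE:", repr(title))
```

Even this small sketch has to special-case the XML declaration and pre-existing DOCTYPEs, which gives a flavour of the implementation complexity mentioned above.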


--
Geoffrey Sneddon — Opera Software
<http://gsnedders.com/>
<http://www.opera.com/>
