Julian Reschke wrote:
Karl Dubost wrote:
... # PRODUCING BROKEN XML

The fact is that many Atom feeds are broken for many reasons.

* edited by hand
* created by templating tools which are not XML producers
* mixing content from different sources (html, db, xml) with
  different encodings

It means that when designing an Atom feed consumer, implementers are
forced to recover the broken content to make it usable by the crowd
(social impact). This is the second part of Postel's law: "Be
liberal in what you accept". ...

Are you *really* sure about that? My understanding is that there are
popular Atom consumers that require proper XML (except for the
RFC3023 issue), and that falling back to handling broken XML is
actually not needed (as opposed to RSS).

Almost all of them violate the following rule from the XML specification (as doing so is needed for compatibility):

It is a fatal error if an XML entity is determined (via default,
encoding declaration, or higher-level protocol) to be in a certain
encoding but contains byte sequences that are not legal in that
encoding.
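
As a rough illustration of what being strict or liberal about that rule means in practice, here is a minimal Python sketch. The function names are hypothetical and this is not how any particular feed reader does it; it only contrasts rejecting illegal byte sequences with replacing them.

```python
def decode_strict(data: bytes, encoding: str) -> str:
    # Follows the rule above: any byte sequence that is illegal in the
    # declared encoding is treated as a fatal error (UnicodeDecodeError).
    return data.decode(encoding, errors="strict")


def decode_lenient(data: bytes, encoding: str) -> str:
    # What a compatibility-minded consumer might do instead: substitute
    # U+FFFD for each illegal sequence and carry on.
    return data.decode(encoding, errors="replace")


# Hypothetical input: Latin-1 bytes mislabelled as UTF-8.
mislabelled = b"caf\xe9"
```

With the strict variant the mislabelled bytes raise an error and the feed is dropped; with the lenient one the reader still sees most of the text.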

Quite a lot of feed readers use the same processor for both Atom and RSS, though, and I imagine that a lot of them don't want to maintain one processor for each. So if you really want Atom consumers to be strict, you probably have to convince people that it is in their interest to be strict for RSS as well (and for any commercial product, I expect the cost of poorer compatibility is greater than anything gained by being strict).

Probably the only thing really needed for RSS but not for Atom is support for the extra predefined entities that were present in RSS 0.91 (Netscape), which arguably should be solved just by increasing the number of predefined entities in XML.

Out of incidental interest, I did try shipping a release of SimplePie (which, combined with downstream users, has millions of users) that was strict with character encodings, but that turned out quite quickly to be unworkable on the real web. To this day it is strict with entities, and that causes around one bug report/support issue per month. I have on plenty of occasions been tempted to prefix all documents with a DOCTYPE containing the entities present in RSS 0.91 (Netscape), though I have always found some technical reason not to do so, given the implementation complexity.
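
The DOCTYPE-prefixing idea can be sketched roughly as follows in Python, using the standard library parser. This is not SimplePie's code (SimplePie is PHP). The helper names are hypothetical, and the entity table is a tiny illustrative subset rather than the actual RSS 0.91 (Netscape) list. The sketch also assumes the document has no DOCTYPE of its own, which is one of the cases that makes a real implementation more complex.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only (name -> code point); not the full RSS 0.91 list.
ENTITIES = {"nbsp": 160, "copy": 169, "eacute": 233}


def doctype_prefix(root_name: str) -> str:
    # Build an internal DTD subset that declares each entity.
    decls = "".join(
        f'<!ENTITY {name} "&#{cp};">' for name, cp in ENTITIES.items()
    )
    return f"<!DOCTYPE {root_name} [{decls}]>"


def parse_with_entities(text: str, root_name: str = "rss") -> ET.Element:
    prefix = doctype_prefix(root_name)
    if text.startswith("<?xml"):
        # The DOCTYPE has to come after the XML declaration, if there is one.
        end = text.index("?>") + 2
        text = text[:end] + prefix + text[end:]
    else:
        text = prefix + text
    return ET.fromstring(text)


feed = "<rss><title>caf&eacute;</title></rss>"
title = parse_with_entities(feed).findtext("title")
```

Without the prefix, a strict XML parser rejects `feed` because `&eacute;` is undefined; with it, the entity resolves to the intended character.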

--
Geoffrey Sneddon — Opera Software
<http://gsnedders.com/>
<http://www.opera.com/>
