On Mon, Jul 16, 2007 at 07:31:01PM +0200, Jakob Schroeter wrote:
> Hi,
>
> Apparently there are a number of software packages that generate invalid
> XMPP. I've seen at least unescaped ' and " in attribute values and
> character data, respectively.
>
> http://www.xmpp.org/rfcs/rfc3920.html#xml states that an XMPP implementation
> must not generate such unescaped characters, and when it "receives such
> restricted XML data, it MUST ignore the data".

You have to distinguish 'bad XML' from 'restricted XML' here. Unescaped or
bad XML should lead to a disconnect. 'Restricted XML', on the other hand,
means any of the following:

 * comments (as defined in Section 2.5 of [XML] (Bray, T., Paoli, J.,
   Sperberg-McQueen, C., and E. Maler, “Extensible Markup Language (XML)
   1.0 (2nd ed),” October 2000.))
 * processing instructions (Section 2.6 therein)
 * internal or external DTD subsets (Section 2.8 therein)
 * internal or external entity references (Section 4.2 therein), with the
   exception of predefined entities (Section 4.6 therein)
 * character data or attribute values containing unescaped characters
   that map to the predefined entities (Section 4.6 therein); such
   characters MUST be escaped

> So far I just throw a parse error and disconnect the stream (when I
> implemented that, I never thought this would actually happen), but people
> complain about that. Also, that makes the receiving client look bad.

If you see a component or server that sends bad XML, you should file a bug
report against it, or ask the administrator to upgrade.

Of course it looks bad to the user if the client bails out with "Got bad
XML, disconnecting you from your precious contacts and ruining your IM
flirt with that nice girl." But there is no other choice; otherwise your
client might display corrupted data to the user, which would be even worse.

Robin
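
P.S. A minimal sketch of the distinction above, assuming a Python client
built on the standard xml.parsers.expat module. The function name
inspect_xml and its return values are made up for illustration and are not
part of any XMPP library. It parses one complete fragment; a real client
would feed the long-lived stream to the parser incrementally. Note that an
unescaped ' or " in character data is still well-formed XML, so a stock
parser like this one will not flag it.

```python
import xml.parsers.expat


def inspect_xml(data):
    """Classify a complete XML fragment (hypothetical helper).

    Returns ("disconnect", reason) if the data is not well-formed, or
    ("ok", restricted) where `restricted` lists the restricted-XML
    constructs that were seen and should be ignored.
    """
    restricted = []
    parser = xml.parsers.expat.ParserCreate()

    # Restricted XML: well-formed, but not allowed on an XMPP stream.
    parser.CommentHandler = lambda text: restricted.append("comment")
    parser.ProcessingInstructionHandler = (
        lambda target, text: restricted.append("processing-instruction")
    )
    parser.StartDoctypeDeclHandler = (
        lambda name, sysid, pubid, has_internal: restricted.append("doctype")
    )

    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        # Bad XML: the parser cannot continue, so the stream must be closed.
        return ("disconnect", str(err))
    return ("ok", restricted)


if __name__ == "__main__":
    print(inspect_xml("<message><!-- hi --><body>x</body></message>"))
    print(inspect_xml("<message><body>x</message>"))
```

The first call reports a comment that the client can skip; the second hits
a mismatched tag and ends in a disconnect.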