On Mon, Jul 16, 2007 at 07:31:01PM +0200, Jakob Schroeter wrote:
> Hi,
> 
> Apparently there are a number of software packages that generate invalid
> XMPP.
> I've seen at least unescaped ' and " in attribute values and character data,
> respectively.
> 
> http://www.xmpp.org/rfcs/rfc3920.html#xml states that an XMPP implementation 
> must not generate such unescaped characters, and when it "receives such 
> restricted XML data, it MUST ignore the data".

You have to distinguish between 'bad XML' and 'restricted XML' here.
Unescaped or bad XML should lead to a disconnect. 'Restricted XML', the
part the RFC says to ignore, is XML that contains any of the following:

   * comments (as defined in Section 2.5 of [XML] (Bray, T., Paoli, J.,
     Sperberg-McQueen, C., and E. Maler, “Extensible Markup Language (XML)
     1.0 (2nd ed),” October 2000.))
   * processing instructions (Section 2.6 therein)
   * internal or external DTD subsets (Section 2.8 therein)
   * internal or external entity references (Section 4.2 therein) with the
     exception of predefined entities (Section 4.6 therein)
   * character data or attribute values containing unescaped characters that
     map to the predefined entities (Section 4.6 therein); such characters
     MUST be escaped
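
For the generating side, the last item in the list can be sketched
roughly as follows. This is only an illustration in Python, not taken
from any particular XMPP library; xmpp_escape and build_message are
made-up names, and the stanza is assembled by hand just to show where
the escaping happens.

```python
from xml.sax.saxutils import escape

# escape() already replaces &, < and >. The two quote characters are
# passed explicitly so that all five predefined entities are covered.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def xmpp_escape(text: str) -> str:
    """Escape text for use as character data or as an attribute value."""
    return escape(text, _QUOTES)


def build_message(to: str, body: str) -> str:
    # Hypothetical helper: a minimal <message/> stanza built by hand.
    return "<message to='{}'><body>{}</body></message>".format(
        xmpp_escape(to), xmpp_escape(body)
    )


if __name__ == "__main__":
    print(build_message("juliet@example.com", 'He said "it\'s < 5 & > 3"'))
```

A real implementation would more likely let an XML writer do the
serialization, which takes care of the escaping automatically.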

> So far I just throw a parse error and disconnect the stream (when I 
> implemented that, I never thought this would actually happen), but people 
> complain about that. Also, that makes the receiving client look bad.

If you see a component or server that sends bad XML, you should file a bug
report against it, or ask the administrator to upgrade.

Of course it looks bad to the user if the client bails out with "Got bad XML,
disconnecting you from your precious contacts and ruining your IM flirt with
that nice girl."

But there is no other choice. Otherwise your client might display corrupted
data to the user, which would be even worse.
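
A rough sketch of the "bail out" behaviour on the receiving side,
assuming Python and its bundled expat parser. StreamReader and the
on_fatal callback are invented names for illustration. Note that a
plain XML parser only reports input that is not well-formed (for
example a bare & in character data); it will not complain about an
unescaped ' or " in character data, so that check would need extra
code.

```python
import xml.parsers.expat


class StreamReader:
    """Hypothetical wrapper around an incremental expat parser."""

    def __init__(self, on_fatal):
        self._parser = xml.parsers.expat.ParserCreate()
        self._on_fatal = on_fatal  # callback that tears down the connection
        self.closed = False

    def feed(self, data: bytes) -> bool:
        """Parse one chunk; return False once the stream has been closed."""
        if self.closed:
            return False
        try:
            self._parser.Parse(data, False)
            return True
        except xml.parsers.expat.ExpatError as err:
            # Not well-formed: stop parsing and report, never pass the
            # damaged data on to the user interface.
            self.closed = True
            self._on_fatal(str(err))
            return False


if __name__ == "__main__":
    reader = StreamReader(on_fatal=lambda msg: print("disconnecting:", msg))
    reader.feed(b"<stream>")
    reader.feed(b"<body>fish & chips</body>")  # bare ampersand
```

After the first error the reader refuses further input, which matches
the idea that the stream cannot be trusted any more.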


Robin
