Re: ElementTree, XML and Unicode -- C0 Controls

2006-12-11 Thread Sébastien Boisgérault
On Dec 11, 4:51 pm, "Fredrik Lundh" wrote:
> Sébastien Boisgérault wrote:
> > Could anyone comment on the rationale behind
> > the current behavior? Is it a performance issue,
> > the search for non-valid unicode code points being
> > too expensive?
>
> the default serializer

Re: ElementTree, XML and Unicode -- C0 Controls

2006-12-11 Thread Fredrik Lundh
Sébastien Boisgérault wrote:
> Could anyone comment on the rationale behind
> the current behavior? Is it a performance issue,
> the search for non-valid unicode code points being
> too expensive?

the default serializer doesn't do any validation or well-formedness checks at all; it assumes tha

ElementTree, XML and Unicode -- C0 Controls

2006-12-11 Thread Sébastien Boisgérault
Hi all,

The Unicode code points in the 0000-001F range (except newline, tab and carriage return) are not legal XML 1.0 characters. Attempts to serialize and deserialize such strings with ElementTree will fail:

    >>> elt = Element("root", char=u"\u")
    >>> xml = tostring(elt)
    >>> xml
    ''
    >>> from
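The behaviour discussed in this thread can be checked with a short sketch. It assumes Python 3 and the standard-library xml.etree.ElementTree module; the helper names has_illegal_c0 and roundtrip_ok are hypothetical, and U+0001 is an arbitrarily chosen C0 control, not necessarily the character used in the original post.

```python
import re
import xml.etree.ElementTree as ET

# C0 controls that XML 1.0 forbids: everything below U+0020
# except tab (U+0009), newline (U+000A) and carriage return (U+000D).
_ILLEGAL_C0 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def has_illegal_c0(text):
    """Return True if text contains a C0 control that XML 1.0 disallows."""
    return _ILLEGAL_C0.search(text) is not None


def roundtrip_ok(value):
    """Serialize an element carrying value as an attribute, then parse it back.

    The serializer performs no character validation, so the failure,
    if any, only shows up when the parser reads the output.
    """
    elt = ET.Element("root", char=value)
    data = ET.tostring(elt)
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for sample in ("plain text", "tab\tand\nnewline", "bad\x01char"):
        print(repr(sample), has_illegal_c0(sample), roundtrip_ok(sample))
```

Checking strings with something like has_illegal_c0 before building the tree is one way to catch the problem early; whether to reject, strip or escape the offending characters is an application-level decision.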