Re: [Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Philipp von Weitershausen Sun, 14 Jan 2007 10:37:37 -0800

On 14 Jan 2007, at 18:37 , Dieter Maurer wrote:

Philipp von Weitershausen wrote at 2007-1-14 14:59 +0100:
...
Traditionally, you parse an 8bit string, figure out its encoding (e.g. from <?xml encoding="utf-8"?>) and return some representation of that XML with unicode data. That's why it's actually quite ok for XML parsers to
only accept string data.
Parsing usually means rebuilding the structure from a text string and *NOT*
encoding guessing or Unicode decoding.

Therefore, it is actually quite stupid for a parser
to try to encode an already decoded string (i.e. a Unicode string)
only so that it can guess the encoding ;-)
A halfway intelligent parser would accept Unicode when it gets it
and concentrate on the remaining part of its task: either reporting
structural events or building a parse tree.

Yes, I agree. Unfortunately, expat isn't smart enough, which caused this whole discussion.
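
For illustration, here is a minimal sketch of the usual workaround when a parser only wants bytes: encode the already decoded text to a known encoding and tell expat to use that encoding, so it does not have to guess. The helper name collect_text is hypothetical and not part of zope.tal. The sketch is written for current Python 3, whose expat binding already accepts str on its own, so it only makes the idea explicit.

```python
from xml.parsers import expat


def collect_text(source):
    """Parse `source` (str or bytes) and return its character data as one str.

    Hypothetical helper for illustration only; not part of zope.tal.
    """
    if isinstance(source, str):
        # Already decoded text: encode it to a known encoding and pass that
        # encoding to ParserCreate, which overrides whatever the XML
        # declaration claims.
        data = source.encode("utf-8")
        parser = expat.ParserCreate("UTF-8")
    else:
        # Raw bytes: let expat work out the encoding from the declaration.
        data = source
        parser = expat.ParserCreate()

    chunks = []
    # expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)
```

With this, collect_text(u'<a>\xe9</a>') and collect_text() on the same document encoded as ISO-8859-1 bytes (with a matching declaration) should both yield the same one-character string.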


