Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-31 Thread Paul Boddie
On 30 Jul, 20:15, Peter Otten <[EMAIL PROTECTED]> wrote:
> Paul Boddie wrote:
> > Who wants to be first to submit a patch? ;-)
>
> And where? The sourceforge page says
>
> "PyXML is no longer maintained."

The minidom code is in the standard library: http://svn.python.org/view/python/trunk/Lib/xml

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Peter Otten
Paul Boddie wrote:
> Who wants to be first to submit a patch? ;-)

And where? The sourceforge page says "PyXML is no longer maintained."

Peter
--
http://mail.python.org/mailman/listinfo/python-list

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Paul Boddie
On 30 Jul, 19:23, Peter Otten <[EMAIL PROTECTED]> wrote:
>
> I'm on Kubuntu 7.10 and see the same error as Simon. The problem is in the
> minidom.CharacterData class which has the following method
>
>     def __repr__(self):
>         data = self.data
>         if len(data) > 10:
>             dotd
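For illustration only: the interactive console displays a node by calling its `__repr__`, and in Python 2 a `__repr__` that returns non-ASCII unicode can fail when the result is encoded as ASCII. That is probably why saving the event to a variable avoids the error. Below is a minimal Python 3 sketch of a truncating repr that stays ASCII-only. `TextNodeSketch` is a hypothetical stand-in class, not minidom's actual code, and this is not the patch discussed in the thread.

```python
class TextNodeSketch:
    """Hypothetical stand-in for a DOM text node (not minidom's real class)."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        data = self.data
        # Show only the first 10 characters, as the quoted method does.
        dotdotdot = "..." if len(data) > 10 else ""
        # ascii() escapes non-ASCII characters, so the result is pure ASCII.
        return "<DOM %s node %s%s>" % (
            type(self).__name__,
            ascii(data[:10]),
            dotdotdot,
        )


node = TextNodeSketch("Simon\u2019s XML nightmare")
text = repr(node)
print(text)
```

Because the non-ASCII character is escaped rather than emitted, the string can be written to a console of any encoding without an encode error.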

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Stefan Behnel
Simon Willison wrote:
> Follow up question: what's the best way of incrementally consuming XML
> in Python that's character encoding aware?

iterparse(), as implemented in (c)ElementTree and lxml. Note that ElementTree and cElementTree are part of Python 2.5, in the xml.etree package.

> I have a
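A minimal sketch of the iterparse() approach suggested above, written for Python 3 with the standard-library `xml.etree.ElementTree`. The sample document and its `feed`, `entry`, and `title` element names are assumptions made up for illustration; for a large file you would pass a filename or an open binary file instead of the `BytesIO` object.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample data: UTF-8 encoded bytes with a non-ASCII apostrophe.
xml_utf8 = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<feed>"
    b"<entry><title>Simon\xe2\x80\x99s XML nightmare</title></entry>"
    b"<entry><title>Second entry</title></entry>"
    b"</feed>"
)

titles = []
# "end" events fire once an element has been parsed completely.
for event, elem in ET.iterparse(io.BytesIO(xml_utf8), events=("end",)):
    if elem.tag == "entry":
        titles.append(elem.findtext("title"))
        # Drop the entry's children so memory use stays bounded.
        elem.clear()

print(titles)
```

The parser decodes the bytes according to the XML declaration, so the collected titles are ordinary text strings.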

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Peter Otten
Paul Boddie wrote:
> On 30 Jul, 18:17, Simon Willison <[EMAIL PROTECTED]> wrote:
>>
>> Some very useful people in #python on Freenode pointed out that my bug
>> occurs because I'm trying to display things interactively in the
>> console. Saving to a variable instead fixes the problem.
>
> What's

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Paul Boddie
On 30 Jul, 18:17, Simon Willison <[EMAIL PROTECTED]> wrote:
>
> Some very useful people in #python on Freenode pointed out that my bug
> occurs because I'm trying to display things interactively in the
> console. Saving to a variable instead fixes the problem.

What's strange about that is how the

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Simon Willison
On Jul 30, 4:59 pm, Simon Willison <[EMAIL PROTECTED]> wrote:
> I just tried it out on Python 2.4.2 on an Ubuntu machine and it worked
> fine! I guess this must be an OS X Python bug. How absolutely
> infuriating.

Some very useful people in #python on Freenode pointed out that my bug occurs becaus

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Simon Willison
On Jul 30, 4:43 pm, Paul Boddie <[EMAIL PROTECTED]> wrote:
> I can't reproduce this on Python 2.3.6 or 2.4.4 on RHEL 4. Instead, I
> get the usual...
>
> ('CHARACTERS', )

I'm using Python 2.5.1 on OS X Leopard:

$ python
Python 2.5.1 (r251:54863, Feb 4 2008, 21:48:13)
[GCC 4.0.1 (Apple Inc. build

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Paul Boddie
On 30 Jul, 16:32, Simon Willison <[EMAIL PROTECTED]> wrote:
> I'm having a horrible time trying to get xml.dom.pulldom to consume a
> UTF8 encoded XML file. Here's what I've tried so far:
>
> >>> xml_utf8 = """
>
> Simon\xe2\x80\x99s XML nightmare
> """
> >>> from xml.dom import pulldom
> >>> parser =

Re: Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Simon Willison
Follow up question: what's the best way of incrementally consuming XML in Python that's character encoding aware? I have a very large file to consume but I'd rather not have to fall back to the raw SAX API.

Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

2008-07-30 Thread Simon Willison
I'm having a horrible time trying to get xml.dom.pulldom to consume a UTF8 encoded XML file. Here's what I've tried so far:

>>> xml_utf8 = """
Simon\xe2\x80\x99s XML nightmare
"""
>>> from xml.dom import pulldom
>>> parser = pulldom.parseString(xml_utf8)
>>> parser.next()
('START_DOCUMENT', )
>>>
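For comparison, here is a minimal sketch of the same kind of session on Python 3, where the events are consumed in a loop instead of being echoed at the prompt. The markup in the original message did not survive, so the `entry` and `title` element names are assumptions. The sketch passes UTF-8 bytes through `io.BytesIO` to `pulldom.parse()`, since Python 3's `pulldom.parseString()` expects a text string, not bytes.

```python
import io
from xml.dom import pulldom

# Assumed sample document: UTF-8 bytes with a right single quotation mark.
xml_utf8 = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<entry><title>Simon\xe2\x80\x99s XML nightmare</title></entry>"
)

chunks = []
for event, node in pulldom.parse(io.BytesIO(xml_utf8)):
    if event == pulldom.CHARACTERS:
        # A parser may deliver text in several pieces, so collect them all.
        chunks.append(node.data)

title = "".join(chunks)
print(title)
```

Reading `node.data` directly avoids relying on how the node object happens to be displayed.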