On Sat, Jan 04, 2014 at 11:57:20PM -0800, Alex Kleider wrote:

> Well, I've tried the xml approach which seems promising but still I get
> an encoding related error.
> Is there a bug in the xml.etree module (not very likely, me thinks) or
> am I doing something wrong?

I'm no expert on XML, but it looks to me like it is a bug in
ElementTree. It doesn't appear to handle unicode strings correctly
(although perhaps it doesn't promise to).

A simple demonstration using Python 2.7:

py> import xml.etree.ElementTree as ET
py> ET.fromstring(u'<xml>a</xml>')
<Element 'xml' at 0xb7ca982c>

But:

py> ET.fromstring(u'<xml>á</xml>')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/local/lib/python2.7/xml/etree/ElementTree.py", line 1622, in feed
    self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 5: ordinal not in range(128)

An easy work-around:

py> ET.fromstring(u'<xml>á</xml>'.encode('utf-8'))
<Element 'xml' at 0xb7ca9a8c>

although, as I said, I'm no expert on XML and this may lead to errors
later on.

> There's no denying that the whole encoding issue is still not completely
> clear to me in spite of having devoted a lot of time to trying to grasp
> all that's involved.

Have you read Joel On Software's explanation?

http://www.joelonsoftware.com/articles/Unicode.html

It's well worth reading. Start with that, and then ask if you have any
further questions.

> Here's what I've got:
>
> alex@x301:~/Python/Parse$ cat ip_xml.py
> #!/usr/bin/env python
> # -*- coding : utf -8 -*-
> # file: 'ip_xml.py'
[...]
> tree = ET.fromstring(xml)
> root = tree.getroot()  # Here's where it blows up!!!

I reckon that what you need is to change the first line to:

tree = ET.fromstring(xml.encode('latin-1'))

or whatever the encoding is meant to be.

-- 
Steven
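
For reference, the encode-before-parsing work-around can be wrapped in a small helper. This is only a sketch: `parse_xml_text` and its `encoding` parameter are names made up for illustration, not part of ElementTree. It also assumes the bytes handed to the parser match the encoding named in the XML declaration, or are UTF-8 when there is no declaration, since that is what an XML parser expects.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def parse_xml_text(text, encoding="utf-8"):
    """Parse XML held in a text (unicode) string.

    Hypothetical helper: encodes the text to bytes first, so the parser
    never has to fall back on an implicit ASCII conversion.
    Assumption: `encoding` agrees with the document's XML declaration
    (or is UTF-8 when the document has no declaration).
    """
    if not isinstance(text, bytes):
        text = text.encode(encoding)
    # ET.fromstring returns the root Element itself, not an ElementTree,
    # so there is no getroot() to call on the result.
    return ET.fromstring(text)


root = parse_xml_text(u"<xml>\u00e1</xml>")
print(root.tag)  # xml
```

If the data really is Latin-1, the declaration should say so (for example `<?xml version="1.0" encoding="ISO-8859-1"?>`). Otherwise the parser will most likely treat the bytes as UTF-8 and reject them.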