Re: ignoring chinese characters parsing xml file

2007-10-23 Thread Stefan Behnel
Fabian López wrote: Thanks Marc, the code is like this. The attrib name is the problem:
    from lxml import etree
    context = etree.iterparse("file.xml")
    for action, elem in context:
        if elem.tag == "weblog":
            print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
The problem is

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread limodou
On 10/23/07, Fabian López [EMAIL PROTECTED] wrote: Hi, I am parsing an XML file that includes Chinese characters, like ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like: UnicodeEncodeError: 'charmap' codec can't encode characters in position The thing is that I

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread limodou
On 10/23/07, Stefan Behnel [EMAIL PROTECTED] wrote: Fabian López wrote: Thanks Marc, the code is like this. The attrib name is the problem:
    from lxml import etree
    context = etree.iterparse("file.xml")
    for action, elem in context:
        if elem.tag == "weblog":
            print action,

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread Fabian López
Thanks, I have tried everything you suggested. The error was in the print statement. So I decided to catch the exception whenever I got a UnicodeEncodeError, that is, whenever I had Chinese/Japanese characters, because they don't interest me, and it worked. Ryan's strip_asian function didn't work well here, but
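The approach described in this message (catch UnicodeEncodeError and skip the values that trigger it) could look roughly like the sketch below. It is written for Python 3 rather than the Python 2 used in the thread, and the helper name encodable_only and the cp1252 target encoding are assumptions made for illustration; neither appears in the thread.

```python
def encodable_only(values, encoding="cp1252"):
    """Yield only the strings that can be encoded with `encoding`."""
    for value in values:
        try:
            value.encode(encoding)
        except UnicodeEncodeError:
            # The value contains characters outside the target code page
            # (for example CJK text), so it is skipped.
            continue
        yield value


# Hypothetical attribute values, one of them Japanese.
names = ["My weblog", "ヘアアイロン"]
print(list(encodable_only(names)))
```

Note that this drops the whole value, not just the offending characters, which matches the stated goal of ignoring such entries.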

ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
Hi, I am parsing an XML file that includes Chinese characters, like ^ uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like: UnicodeEncodeError: 'charmap' codec can't encode characters in position The thing is that I would like to ignore it and parse all the characters

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Marc 'BlackJack' Rintsch
On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote: I am parsing an XML file that includes Chinese characters, like ^ uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like: UnicodeEncodeError: 'charmap' codec can't encode characters in position..
You say you are *parsing*

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
Thanks Marc, the code is like this. The attrib name is the problem:
    from lxml import etree
    context = etree.iterparse("file.xml")
    for action, elem in context:
        if elem.tag == "weblog":
            print action, elem.tag, elem.attrib["name"], elem.attrib["url"], elem.attrib["rssUrl"]
And the xml file like:

RE: ignoring chinese characters parsing xml file

2007-10-22 Thread Ryan Ginstrom
On Behalf Of Fabian Lopez: like ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get
Just thought I'd point out here that the second string is Japanese, not Chinese. From your second post, it appears that you've parsed the text without problems -- it's when you go to print them out
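To illustrate the point that the failure comes from printing rather than parsing: one possible way to make already-parsed text safe to print is to encode it with a lenient error handler first. This is only a sketch in Python 3 (the thread itself uses Python 2); the helper name to_printable is hypothetical, and cp1252 is an assumed stand-in for whatever code page the console actually uses.

```python
def to_printable(text, encoding="cp1252"):
    """Replace every character `encoding` cannot represent with '?'."""
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


print(to_printable("weblog: ヘアアイロン"))
```

Unlike skipping the value entirely, this keeps the encodable part of each string and marks what was lost.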