Dear Nice people

I've been using Beautiful Soup to filter the BBC's RSS feed. However, the BBC recently changed the feed, and it is causing me problems with the pound (money) symbol. The initial error was "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'", which means that the default encoding can't process this (Unicode) character. I was having similar problems with HTML characters appearing, but I used a simple regex system to remove them or substitute something suitable. I tried applying the same approach by making a generic regex pattern (re.compile(u"""\u[A-Fa-f0-9]{4}""")), but this fails because it doesn't follow the standard pattern for ASCII. I'm not sure that I 100% understand the Unicode system, but is there a simple way to remove or substitute these non-ASCII characters?
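
To make the problem concrete, here is a minimal sketch of the kind of thing I am after. The headline is made up rather than taken from the real feed, and the variable names are only for illustration. It reproduces the error and then tries three ways of removing or substituting the offending character; I am not sure any of them is the right approach.

```python
import re

# Hypothetical headline standing in for text pulled from the feed.
headline = u"Fuel prices rise to \xa31.20 a litre"

# Reproduce the error: the ascii codec cannot encode u'\xa3'.
try:
    headline.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Attempt 1: silently drop anything the ascii codec cannot encode.
dropped = headline.encode("ascii", "ignore").decode("ascii")

# Attempt 2: substitute the pound sign explicitly, then drop whatever is left.
substituted = headline.replace(u"\xa3", u"GBP ")
substituted = substituted.encode("ascii", "ignore").decode("ascii")

# Attempt 3: a regex that matches the characters themselves (anything
# outside the ASCII range), rather than the text of a \uXXXX escape.
non_ascii = re.compile(r"[^\x00-\x7f]")
replaced = non_ascii.sub(u"?", headline)
```

The difference from my earlier pattern, as far as I can tell, is that the string holds the actual character and not the six characters of a "\uXXXX" escape, so the regex has to match a range of characters instead.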

Thanks for any help!

Andy
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
