[Tutor] Removing GB pound symbols from Beautiful soup output

Andy Fri, 16 Jul 2010 07:50:52 -0700

Dear Nice people

I've been using Beautiful Soup to filter the BBC's RSS feed. However, the BBC recently changed the feed, and it is causing me problems with the pound (money) symbol. The initial error was "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'", which means that the default encoding can't process this (Unicode) character. I was having similar problems with HTML characters appearing, but I used a simple regex system to remove/substitute them with something suitable. I tried applying the same approach and making a generic regex pattern (re.compile(u"""\u[A-Fa-f0-9]{4}""")), but this fails because it doesn't follow the standard pattern for ASCII. I'm not sure that I 100% understand the Unicode system, but is there a simple way to remove/substitute these non-ASCII strings?
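A minimal sketch of one way to do this, assuming the goal is plain-ASCII output. The pattern above tries to match the literal text "\u" followed by four hex digits, but the parsed feed contains the actual characters, so a character class covering everything outside the ASCII range (0 to 127) is one way to match them. The sketch is written for Python 3 (under Python 2 the string literals would need a u prefix). The names REPLACEMENTS, to_ascii and to_ascii_encode, and the choice of "GBP " as the stand-in for the pound sign, are hypothetical choices for illustration.

```python
import re

# Hypothetical substitution table: map specific non-ASCII characters
# to ASCII stand-ins before stripping whatever is left.
REPLACEMENTS = {
    "\u00a3": "GBP ",  # POUND SIGN, the u'\xa3' from the error message
}

# Matches any single character outside the 7-bit ASCII range (0-127).
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def to_ascii(text, replacements=REPLACEMENTS):
    """Substitute known characters, then remove any other non-ASCII ones."""
    for char, substitute in replacements.items():
        text = text.replace(char, substitute)
    return NON_ASCII.sub("", text)


def to_ascii_encode(text):
    """Alternative: let the ASCII codec silently drop what it cannot encode."""
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    headline = "Fuel duty rises by \u00a31.50 \u2013 report"
    print(to_ascii(headline))
    print(to_ascii_encode(headline))
```

The substitution table keeps meaning where it matters (a price without its currency is ambiguous), while the final regex or the "ignore" error handler guarantees the result can be encoded as ASCII. If the pound sign should be kept rather than removed, encoding the output as UTF-8 instead of the default ASCII also avoids the UnicodeEncodeError.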


Thanks for any help!

Andy
