On Wed, Mar 4, 2009 at 11:13 AM, Eric Dorsey <dors...@gmail.com> wrote:
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? I
Feedparser returns its output as HTML, so expect HTML tags and entities in it. What you want is to unescape the HTML entities (see http://effbot.org/zone/re-sub.htm#unescape-html). The helper below turns numeric references such as &#62; or &#x3E; and named entities such as &gt; back into the characters they stand for, and leaves anything it does not recognise unchanged:

    # Python 2 (uses unichr, htmlentitydefs and the print statement)
    import feedparser
    import re, htmlentitydefs

    def unescape(text):
        def fixup(m):
            text = m.group(0)
            if text[:2] == "&#":
                # numeric character reference: &#62; or &#x3E;
                try:
                    if text[:3] == "&#x":
                        return unichr(int(text[3:-1], 16))
                    else:
                        return unichr(int(text[2:-1]))
                except ValueError:
                    pass
            else:
                # named entity: &gt;, &amp;, ...
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
                except KeyError:
                    pass
            return text  # unknown reference: leave as is
        return re.sub(r"&#?\w+;", fixup, text)

    d = feedparser.parse('http://snipt.net/dorseye/feed')
    # iterate over the entries directly instead of keeping an index counter
    for entry in d['entries']:
        print unescape(entry.title)
        print unescape(entry.summary)
        print

HTH,
Senthil

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
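For anyone reading this on Python 3: unichr, htmlentitydefs and the print statement no longer exist there. Below is a minimal sketch of the same regex-based helper ported to Python 3 (assuming 3.4 or later), swapping in chr and html.entities.name2codepoint. It also accepts an uppercase &#X prefix, a small extension of the original. The standard library's html.unescape does the same job and is usually the better choice; the name stdlib_unescape is just a local alias for it, chosen here for illustration.

```python
import re
from html import unescape as stdlib_unescape  # alias for the stdlib helper
from html.entities import name2codepoint

# Matches named (&gt;), decimal (&#62;) and hex (&#x3E;) references.
ENTITY_RE = re.compile(r"&#?\w+;")


def unescape(text):
    """Python 3 port of the regex-based unescape helper."""
    def fixup(m):
        ref = m.group(0)
        if ref[:2] == "&#":
            # numeric character reference; accept &#x and &#X for hex
            try:
                if ref[:3].lower() == "&#x":
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except (ValueError, OverflowError):
                pass  # malformed or out-of-range number
        else:
            # named entity such as &gt; or &amp;
            try:
                return chr(name2codepoint[ref[1:-1]])
            except KeyError:
                pass  # unknown entity name
        return ref  # leave unrecognised references as they are

    return ENTITY_RE.sub(fixup, text)


if __name__ == "__main__":
    sample = "a &gt; b &#60; c &#x3D; d"
    print(unescape(sample))         # a > b < c = d
    print(stdlib_unescape(sample))  # a > b < c = d
```

In the feed loop this becomes something like `for entry in d.entries: print(unescape(entry.title))`. html.unescape additionally follows the HTML5 rules, so it also copes with some references that are missing their trailing semicolon, which the regex above deliberately does not match.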