Re: html escape sequences

2005-03-18 Thread Will McGugan
Leif K-Brooks wrote: Will McGugan wrote: I'd like to replace html escape sequences, like   and ' with single characters. Is there a dictionary defined somewhere I can use to replace these sequences? How about this? import re from htmlentitydefs import name2codepoint _entity_re = re

Re: html escape sequences

2005-03-18 Thread Leif K-Brooks
Will McGugan wrote: I'd like to replace html escape sequences, like   and ' with single characters. Is there a dictionary defined somewhere I can use to replace these sequences? How about this? import re from htmlentitydefs import name2codepoint _entity_re = re.compile(r&
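
The snippet above is cut off, so the following is only a minimal sketch of the general regex-plus-`name2codepoint` idea, not a reconstruction of the code in the post. It assumes Python 3, where `htmlentitydefs` was renamed to `html.entities`; apart from `_entity_re`, the names (`_replace_entity`, `unescape_entities`) are hypothetical.

```python
import re
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs

# Matches hex (&#xA0;), decimal (&#160;) and named (&nbsp;) references.
_entity_re = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|\w+);")


def _replace_entity(match):
    """Return the character for one reference, or the reference unchanged."""
    body = match.group(1)
    try:
        if body[:2].lower() == "#x":
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
        return chr(name2codepoint[body])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range number: leave the text as it was.
        return match.group(0)


def unescape_entities(text):
    """Replace every recognised HTML character reference in text."""
    return _entity_re.sub(_replace_entity, text)
```

Unknown names are left untouched rather than raising, which seems the safer default for text scraped from real pages. Note that `name2codepoint` covers only the HTML 4 names, so `&apos;` is not in it.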

html escape sequences

2005-03-18 Thread Will McGugan
Hi, I'd like to replace html escape sequences, like   and ' with single characters. Is there a dictionary defined somewhere I can use to replace these sequences? Thanks, Will McGugan -- http://mail.python.org/mailman/listinfo/python-list
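
For readers on current Python, a short sketch of what the standard library offers for this question. It assumes Python 3.4 or later, which is newer than this thread: `html.entities.name2codepoint` is the dictionary of named entities, and `html.unescape` does the whole replacement in one call. The sample string is made up for illustration.

```python
import html
from html.entities import name2codepoint

# The dictionary maps an entity name to its Unicode code point.
nbsp_codepoint = name2codepoint["nbsp"]

# html.unescape handles named, decimal and hexadecimal references.
sample = "fish&nbsp;&amp;&nbsp;chips, it&#39;s"
plain = html.unescape(sample)
```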

Re: converting html escape sequences to unicode characters

2004-12-10 Thread Craig Ringer
On Fri, 2004-12-10 at 16:09, Craig Ringer wrote: > On Fri, 2004-12-10 at 08:36, harrelson wrote: > > I have a list of about 2500 html escape sequences (decimal) that I need > > to convert to utf-8. Stuff like: > > I'm pretty sure this somewhat horrifying code doe

Re: converting html escape sequences to unicode characters

2004-12-10 Thread Craig Ringer
On Fri, 2004-12-10 at 08:36, harrelson wrote: > I have a list of about 2500 html escape sequences (decimal) that I need > to convert to utf-8. Stuff like: I'm pretty sure this somewhat horrifying code does it, but is probably an example of what not to do: >>> escapeseq = &#x

Re: converting html escape sequences to unicode characters

2004-12-09 Thread Kent Johnson
harrelson wrote: I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like: &#48708; &#54665; &#44592; &#47196; &#48372; &#45244; &#44144; &#50640; &#50836; &#45236; &#47732; &#44552; &#51060; &#50620; &#47560; &#51648; &#51104; Anyone know what the decimal is representing? It doesn't seem to equate to a unicode codepoint... In well-formed HTML (!) these shou
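
The number in a decimal character reference is the Unicode code point itself, written in base 10; for example `&#48708;` is U+BE44, the Hangul syllable 비. A minimal sketch of the conversion asked about in this thread, assuming Python 3 and well-formed references that end in a semicolon; the names `_decimal_ref_re` and `decode_decimal_refs` are hypothetical and do not come from any of the replies.

```python
import re

# Decimal references only, e.g. &#48708;
_decimal_ref_re = re.compile(r"&#([0-9]+);")


def _decode_one(match):
    """Turn one decimal reference into its character."""
    try:
        return chr(int(match.group(1)))
    except (ValueError, OverflowError):
        # Number is not a valid code point: keep the original text.
        return match.group(0)


def decode_decimal_refs(text):
    """Replace decimal character references with the characters they name."""
    return _decimal_ref_re.sub(_decode_one, text)


decoded = decode_decimal_refs("&#48708;&#54665;&#44592;")
utf8_bytes = decoded.encode("utf-8")  # encode to UTF-8 as a separate step
```

Decoding to a text string first and encoding to UTF-8 afterwards keeps the two steps separate, so the same function works whatever output encoding is needed.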

converting html escape sequences to unicode characters

2004-12-09 Thread harrelson
I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like: &#48708; &#54665; &#44592; &#47196; &#48372; &#45244; &#44144; &#50640; &#50836; &#45236; &#47732; &#44552; &#51060; &#50620; &#47560; &#51648; &#51104; Anyone know what the decimal is representing? It doesn't seem to equate to a unicode codepoint... culley -- http://mail.python.org/mailman/lis