[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

2009-08-01 Thread bones7456
bones7456 bones7...@gmail.com added the comment: Another way to fix it: add these three lines to the top of the file:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
-- nosy: +bones7456

[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

2009-08-01 Thread Liu DongMiao
Liu DongMiao liudongm...@gmail.com added the comment: I think this should not be considered a bug. Since we don't know the encoding of the str, we can't handle str and unicode together. In my example, the str is in UTF-8, so I need to convert the unicode to a UTF-8 str. I will take bones' suggestion.
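
To illustrate the point about not mixing encoded bytes with decoded text, here is a minimal sketch. It is written for Python 3 (not the Python 2.6.2 discussed in this thread), where the byte string is `bytes` and the unicode string is `str`. The helper name `unescape_utf8` is hypothetical, and the sketch assumes the input is known to be UTF-8. It is not the patch proposed in this issue.

```python
import html


def unescape_utf8(raw: bytes) -> str:
    """Decode UTF-8 bytes to text first, then resolve character references."""
    # Assumption: the caller knows the data is UTF-8. The parser cannot guess this.
    text = raw.decode("utf-8")
    # html.unescape turns references such as "&#25991;" into the matching character.
    return html.unescape(text)


# Non-ASCII data followed by a numeric character reference, as UTF-8 bytes.
raw = "\u4e2d &#25991;".encode("utf-8")

# Mixing the encoded bytes with decoded text directly fails in Python 3.
try:
    raw + "\u6587"
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Decoding first gives one consistent text type to work with.
result = unescape_utf8(raw)
```

Decoding at the boundary with an explicit encoding keeps the assumption visible in one place, instead of changing the process-wide default encoding.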

[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

2009-07-31 Thread Liu DongMiao
New submission from Liu DongMiao liudongm...@gmail.com: HTMLParser (Python 2.6.2) cannot deal with a mixture of arbitrary data and character references. In lines 365-373, replaceEntities(s) returns unichr(charref) as unicode, which cannot be mixed with arbitrary data in a str. One way to fix it: