[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-12-18 Thread Ezio Melotti
Changes by Ezio Melotti: resolution: -> fixed; stage: needs patch -> committed/rejected; status: open -> closed; type: behavior -> enhancement

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-12-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 978f45013c34 by Ezio Melotti in branch '2.7': #3932: suggest passing unicode to HTMLParser.feed(). http://hg.python.org/cpython/rev/978f45013c34 (nosy: +python-dev)
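The advice in that changeset (decode first, then feed text to the parser) can be illustrated with a minimal sketch. It is written for Python 3's `html.parser`, where `feed()` takes `str`, so it is an analogue of the 2.7 recommendation rather than the code from the changeset. The `AttrCollector` class and the sample markup are hypothetical, and the UTF-8 encoding is an assumption about the input.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every (name, value) attribute pair seen."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive with entity references already unescaped.
        self.attrs.extend(attrs)


# Raw bytes as they might come off the wire (assumed to be UTF-8 here).
raw = '<a title="caf\u00e9 &amp; bar">x</a>'.encode("utf-8")

parser = AttrCollector()
# Decode explicitly and hand the parser text, never undecoded bytes.
parser.feed(raw.decode("utf-8"))
parser.close()

print(parser.attrs)
```

Decoding at the boundary keeps the entity replacement and the non-ASCII characters in the same string type, which is the situation the original report shows breaking when a byte string is fed instead.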

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-28 Thread Ezio Melotti
Ezio Melotti added the comment: I'll change this in a doc issue then. Any suggestions about the wording? Adding "Passing unicode strings is suggested/advised/preferred." in the .feed() section is a bit vague, and mentioning the problem (with str it might break in some corner cases) while keep

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-22 Thread Éric Araujo
Éric Araujo added the comment: +1 on refusing the temptation to guess and to be half-working for some cases by accident.

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-14 Thread Ezio Melotti
Changes by Ezio Melotti: assignee: -> ezio.melotti

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-06 Thread Ezio Melotti
Changes by Ezio Melotti: stage: -> needs patch. Added file: http://bugs.python.org/file23621/issue3932-test.diff

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-06 Thread Ezio Melotti
Changes by Ezio Melotti: nosy: +eric.araujo

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2011-11-06 Thread Ezio Melotti
Ezio Melotti added the comment: I'm not sure what is the best solution here. unescape uses a regex with replaceEntities as callback to replace the entities in attribute values. The problem is that replaceEntities currently returns unicode, and if unescape receives a str, an automatic coercion

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2009-12-12 Thread Sérgio
Sérgio added the comment: The patch fixes parsing of a simple `a` tag whose title contains "?!" and accented characters. (nosy: +sergiomb2)

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2009-07-30 Thread Artur Frysiak
Changes by Artur Frysiak: nosy: +wiget

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2009-07-30 Thread Zbigniew Chyla
Zbigniew Chyla added the comment: Since `HTMLParser.unescape` in 2.5 returns `str` for `str` input, 2.6 should remain compatible. Therefore I propose the attached patch (`HTMLParser-unescape-fix.diff`). With this patch applied the result will have the same type as the input. -- keywords
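The "same type out as type in" idea can be sketched as follows. This is not the attached `HTMLParser-unescape-fix.diff`, whose contents are not shown here. It is a Python 3 illustration in which `bytes` stands in for the 2.x `str` type, and the function name `unescape_same_type`, the helper `_codepoint`, the simplified entity regex, and the explicit `encoding` parameter are all assumptions made for the example.

```python
import re
from html.entities import name2codepoint

# Simplified pattern: decimal, hexadecimal, or named references ending in ';'.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _codepoint(ref):
    """Return the code point for a reference body such as 'amp' or '#x41'."""
    if ref.startswith(("#x", "#X")):
        cp = int(ref[2:], 16)
    elif ref.startswith("#"):
        cp = int(ref[1:])
    else:
        cp = name2codepoint.get(ref)
    if cp is None or cp > 0x10FFFF:
        return None
    return cp


def unescape_same_type(s, encoding="utf-8"):
    """Replace entity references; the result has the same type as the input."""
    is_bytes = isinstance(s, bytes)
    # The caller states the encoding explicitly; nothing is guessed.
    text = s.decode(encoding) if is_bytes else s

    def replace(match):
        cp = _codepoint(match.group(1))
        # Unknown or out-of-range references are left untouched.
        return match.group(0) if cp is None else chr(cp)

    result = _ENTITY.sub(replace, text)
    return result.encode(encoding) if is_bytes else result


print(unescape_same_type("a &amp; b"))
print(unescape_same_type(b"a &amp; b"))
```

Requiring an explicit encoding for byte input is one way to keep the return type stable without relying on implicit coercion, which is the step that fails in the report when the input also contains non-ASCII bytes.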

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2008-10-03 Thread Simon Cross
Simon Cross added the comment: I've tracked down the cause to the .unescape(...) method in HTMLParser. The replaceEntities function passed to re.sub() always returns a unicode character, even when the string s being matched is a byte string. Changing line 383 to: return self.entityd

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2008-10-03 Thread yanne
yanne added the comment: It seems that I uploaded the wrong test file the first time. The attached test should fail; I tested it with Python 2.6 final on both Linux and Windows. Added file: http://bugs.python.org/file11690/test.py

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2008-10-03 Thread yanne
Changes by yanne: Removed file: http://bugs.python.org/file11557/test.py

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2008-09-26 Thread Simon Cross
Simon Cross added the comment: I can't reproduce this on current trunk (r66633, 27 Sep 2008). I checked sys.getdefaultencoding(), but that returned 'ascii' as expected, and I even tried running Python with "LANG=C ./python", but that didn't fail either. Perhaps this has been fix

[issue3932] HTMLParser cannot handle '&' and non-ascii characters in attribute names

2008-09-22 Thread yanne
New submission from yanne: It seems that HTMLParser.feed throws an exception whenever an attribute name contains both an ampersand '&' and non-ASCII characters. Running the attached test file with Python 2.5 succeeds, but with Python 2.6, the result is: C:\Python26>python.