[issue41748] HTMLParser: parsing error

2021-01-03 Thread karl
karl added the comment: Ezio, TL;DR: I tested this in browsers and added two tests for this issue. Should I create a PR just for the tests? https://github.com/python/cpython/blame/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/test/test_htmlparser.py#L479-L485 A: comma without spaces
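As a rough illustration of what standalone tests for this issue might look like, here is a sketch that uses only the public HTMLParser callbacks. The helper names (AttrCollector, parse_starttags, CommaInAttributesTest) and both inputs are hypothetical. They are not the cases from the linked test file, and the expected attribute lists assume an interpreter that already includes the fix for this issue.

```python
import unittest
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: collect (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


def parse_starttags(html):
    """Feed html to a fresh parser and return the recorded start tags."""
    parser = AttrCollector()
    parser.feed(html)
    parser.close()
    return parser.starttags


class CommaInAttributesTest(unittest.TestCase):
    def test_comma_without_spaces(self):
        # Unquoted value: the comma and everything up to the next
        # whitespace or ">" stay part of the value.
        self.assertEqual(parse_starttags("<div class=bar,baz=asd>"),
                         [("div", [("class", "bar,baz=asd")])])

    def test_comma_with_spaces(self):
        # After whitespace, the comma starts a new attribute name.
        self.assertEqual(parse_starttags("<div class=bar ,baz=asd>"),
                         [("div", [("class", "bar"), (",baz", "asd")])])
```

The file can be run with `python -m unittest <file>`; keeping the parsing in one helper means each test states only the input and the expected attributes.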

2020-09-09 Thread Ezio Melotti
Ezio Melotti added the comment: The html.parser module follows the HTML5 specs as closely as possible. There are a few corner cases where it behaves slightly differently, but only when dealing with invalid markup, and the differences should be trivial and generally not worth the extra complexity

2020-09-09 Thread STINNER Victor
STINNER Victor added the comment: Also, is there no warning about security in the html.parser documentation? Is this module mature and maintained enough to be considered reliable? Or should we warn users about possible issues in corner cases, and point to BeautifulSoup for a more mature

2020-09-09 Thread STINNER Victor
STINNER Victor added the comment: HTMLParser.check_for_whole_start_tag() uses the locatestarttagend_tolerant regular expression to find the end of the start tag. This regex cuts the string at the first comma (","), but not if the comma is the first character of an attribute name.
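To observe where the parser decided a start tag ends without relying on private regexes such as locatestarttagend_tolerant (module internals that may differ between versions), one option is the public get_starttag_text() method. The class name StartTagText and the sample markup are illustrative. On an interpreter that includes the fix for this issue, both tags are expected to come back whole.

```python
from html.parser import HTMLParser


class StartTagText(HTMLParser):
    """Collect the raw source text of every start tag the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.raw_tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the source text of the most recently
        # opened start tag, i.e. the span the parser treated as one tag.
        self.raw_tags.append(self.get_starttag_text())


collector = StartTagText()
# A comma-prefixed attribute name in second position, then in first position.
collector.feed('<div id="a" ,class="b"><span ,x="1">')
collector.close()
print(collector.raw_tags)
```

If the regex cut a tag short, the recorded text would stop before the closing ">", which makes the problem visible without reading the parser's source.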

2020-09-09 Thread Ademar Nowasky Junior
Ademar Nowasky Junior added the comment: Yes, I understand it the same way. Both are valid attribute names. It may be worth noting that JavaScript has no problem handling this.

2020-09-09 Thread STINNER Victor
STINNER Victor added the comment: The HTML 5.2 specification (https://www.w3.org/TR/html52/syntax.html#elements-attributes) says: "Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>)
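The quoted rule can be turned into a small checker to confirm that a name such as ",class" is allowed. This is a sketch, not a conformance checker: the helper names are hypothetical, the exclusions beyond the quoted text (U+002F SOLIDUS, U+003D EQUALS SIGN and the control characters) are assumed from the remainder of that sentence in the spec and should be verified against it, and code points not defined by Unicode are not checked.

```python
# Sketch of the HTML 5.2 attribute-name rule.
# ASSUMPTION: "/", "=" and the control characters come from the rest of the
# spec sentence; unassigned code points are NOT checked here.

# HTML "space characters": SPACE, TAB, LF, FF, CR.
HTML_SPACE_CHARACTERS = {" ", "\t", "\n", "\f", "\r"}
EXCLUDED = HTML_SPACE_CHARACTERS | {"\x00", '"', "'", ">", "/", "="}


def is_control(ch: str) -> bool:
    """C0 controls (U+0000-U+001F) and U+007F-U+009F."""
    cp = ord(ch)
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_valid_attribute_name(name: str) -> bool:
    """Return True if name is non-empty and avoids every excluded character."""
    return bool(name) and all(
        ch not in EXCLUDED and not is_control(ch) for ch in name
    )


for candidate in [",class", "class", "a=b", 'x"y', ""]:
    print(repr(candidate), is_valid_attribute_name(candidate))
```

A comma appears nowhere in the excluded set, which is why ",class" passes while "a=b" and 'x"y' do not.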

2020-09-09 Thread Ademar Nowasky Junior
Change by Ademar Nowasky Junior: type: security -> crash

2020-09-08 Thread Ademar Nowasky Junior
Change by Ademar Nowasky Junior: type: crash -> security

2020-09-08 Thread Ademar Nowasky Junior
New submission from Ademar Nowasky Junior: HTML tags that have an attribute name starting with a comma character aren't parsed, and they break future calls to feed(). The problem occurs when such an attribute is the second one or later in the HTML tag. It doesn't seem to affect the case where it's the first attribute
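A minimal reproduction along the lines of the report: the second attribute name starts with a comma, and a second feed() call checks whether later input is still parsed. The class name TagRecorder and the markup are illustrative. On an interpreter that includes the fix for this issue, the comma is expected to stay part of the attribute name (",class") and the later p element to be reported normally. On affected versions, per the report, the tag is not parsed and subsequent feed() calls break.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record start tags (with attributes), end tags and text, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = TagRecorder()
# Second attribute name starts with a comma, as in the report.
parser.feed('<div id="a" ,class="b">')
# A later, unrelated chunk: per the report, affected versions also fail
# to parse this one because the unparsed tag is still buffered.
parser.feed('<p>ok</p>')
parser.close()

for event in parser.events:
    print(event)
```

Splitting the input across two feed() calls mirrors the "break future calls to feed()" symptom: if the first tag stays stuck in the buffer, the p element never shows up in the recorded events.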