[issue14538] HTMLParser: parsing error

2012-04-09 Thread Michel Leunen
Changes by Michel Leunen: -- title: HTMLParser -> HTMLParser: parsing error

[issue14538] HTMLParser: parsing error

2012-04-09 Thread Jim Jewett
Jim Jewett added the comment: What do you think it should do? My thought is that meta tags may or may not be void, but certainly should not be nested. As XML, I would parse that as *missing closing tag But for html, there is more cleanup. The catch is that this module probabl

[issue14538] HTMLParser: parsing error

2012-04-09 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- nosy: +ezio.melotti

[issue14538] HTMLParser: parsing error

2012-04-09 Thread Ezio Melotti
Ezio Melotti added the comment: With Python 2.7.3rc2 and 3.3.0a0 (strict=False) I get:

Start tag: a
End tag : a
Start tag: script
End tag : script
Start tag: meta
Data :
Start tag: body
End tag : body

This is better, but still not 100% correct, the "" shouldn't be seen as data.
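A trace like the one above can be produced with a small HTMLParser subclass that records each callback. The following is only a sketch: the class name `EventLogger`, the helper `log_events`, and the `SAMPLE` markup are illustrative assumptions, not the markup from the original report, and the import path is the Python 3 one (`html.parser`).

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record each parser callback as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def log_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events


# Hypothetical input (an assumption, not taken from the report):
# a start tag with a stray "/" followed by whitespace before ">".
SAMPLE = "<a></a><meta / ><body></body>"

if __name__ == "__main__":
    for kind, value in log_events(SAMPLE):
        print(kind, value)
```

On a Python version that includes the fix discussed in this thread, the sloppy meta tag should be reported through the tag callbacks rather than as data.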

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Georg Brandl
Georg Brandl added the comment: ISTM that "" is neither valid HTML nor valid XHTML. -- nosy: +georg.brandl

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Ezio Melotti
Ezio Melotti added the comment: Here's a patch. -- keywords: +patch stage: test needed -> patch review Added file: http://bugs.python.org/file25188/issue14538.diff

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Jim Jewett
Jim Jewett added the comment: -1 on that particular patch. (with only whitespace between "/" and ">") strikes me as obviously intending to close the tag, and a reasonably common error. I can't think of any reason to support nested meta tags while not supporting sloppy self-closing tags.

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Jim Jewett
Jim Jewett added the comment: This issue is also marked for (bugfix-only) 2.7 and 3.2. Unless there is a specification somewhere (or at least an editor's draft), I can't really see any particular parse as a bugfix. Was the goal just to make the parse finish, as opposed to stopping part way

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: To be consistent, this patch should remove the references to http://www.w3.org/TR/html5/tokenization.html#tag-open-state and http://www.w3.org/TR/html5/tokenization.html#tag-open-state as irrelevant. -- nosy: +storchaka

[issue14538] HTMLParser: parsing error

2012-04-12 Thread R. David Murray
R. David Murray added the comment: Yes, after considerable discussion those of us working on this stuff decided that the goal should be that the parser be able to complete parsing, without error, anything the typical browsers can parse (which means, pretty much anything, though that says nothing
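The goal stated above (finish parsing without raising, even on sloppy markup) can be illustrated with a short sketch. The names `TagCollector`, `parse_leniently`, and `MALFORMED_SAMPLES` are hypothetical, the sample fragments are invented for illustration, and the behavior shown assumes a Python 3 release that already includes the tolerant parsing discussed in this thread.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of all start tags the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_leniently(markup):
    """Parse markup and return the start tags seen, without validating."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


# Invented examples of markup a browser would accept but a validator would not.
MALFORMED_SAMPLES = [
    "<p><b>never closed",                  # unclosed elements
    "</i>stray end tag<p>text",            # end tag with no matching start
    "<a href=x / >link</a>",               # unquoted value and a bare slash
    '<div class="a"class="b">text</div>',  # no space between attributes
]

if __name__ == "__main__":
    for sample in MALFORMED_SAMPLES:
        print(repr(sample), "->", parse_leniently(sample))
```

The parser reports what it can recognize and does not complain about the rest; checking validity is left to other tools.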

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I apologize for the misplaced sarcasm. After reading the source code more carefully, I found that the patch really does match the specification and the behavior of all the browsers I tested. Despite its awkward appearance, the patch fixes a flaw of the original c

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Ezio Melotti
Ezio Melotti added the comment: Attached a new patch with a few more tests and a simplification of the attrfind regex. > But it allows the passage of an invalid code in strict mode. HTMLParser is following the HTML5 specs, and doesn't do validation, so there's no strict mode (and the strict

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Jim Jewett
Jim Jewett added the comment: On Thu, Apr 12, 2012 at 7:26 PM, Ezio Melotti wrote: > If HTMLParser doesn't parse as the HTML5 specs say, > then it's considered a bug. I had thought that was the goal of the html5lib, and that HTMLParser was explicitly aiming at a much reduced model in order to r

[issue14538] HTMLParser: parsing error

2012-04-12 Thread Ezio Melotti
Ezio Melotti added the comment: HTMLParser is still simpler than html5lib, but if/when possible we are following the HTML5 standard rather than taking arbitrary decisions (like we used to do before HTML5). HTMLParser doesn't claim to be a fully compliant HTML5 parser (and probably never will

[issue14538] HTMLParser: parsing error

2012-04-13 Thread Jim Jewett
Jim Jewett added the comment: It sounds like this is a case where the docs should mention an external library; perhaps something like changing the intro of http://docs.python.org/dev/library/html.parser.html from: """ 19.2. html.parser — Simple HTML and XHTML parser Source code: Lib/html/pars

[issue14538] HTMLParser: parsing error

2012-04-13 Thread Éric Araujo
Éric Araujo added the comment: Sure, the docs should explain better that html.parser tries its best to parse stuff, is not a validating parser, and is actively developed, contrary to the popular belief that standard library modules never get improved. I’m less sure about links, there is more

[issue14538] HTMLParser: parsing error

2012-04-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 36c901fcfcda by Ezio Melotti in branch '2.7': #14538: HTMLParser can now parse correctly start tags that contain a bare /. http://hg.python.org/cpython/rev/36c901fcfcda New changeset ba4baaddac8d by Ezio Melotti in branch '3.2': #14538: HTMLParser
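As a rough illustration of what "a bare /" inside a start tag means, the sketch below parses a tag with a stray slash between two attributes. The names `AttrCollector` and `start_tags_with_attrs` and the `SAMPLE` markup are assumptions made for this example; they do not come from the changeset or its tests.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


def start_tags_with_attrs(markup):
    """Return the start tags found in markup together with their attributes."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Hypothetical markup: a lone "/" sits between the two attributes.
SAMPLE = '<img src="a.png" / alt="logo">'

if __name__ == "__main__":
    print(start_tags_with_attrs(SAMPLE))
```

With the fix in place, the stray slash should be skipped and both attributes should still be reported for the same tag.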

[issue14538] HTMLParser: parsing error

2012-04-18 Thread Ezio Melotti
Ezio Melotti added the comment: This is now fixed, thanks for the report! Regarding the documentation, feel free to open another issue, but at some point I'll probably update it anyway and/or write down somewhere the goals and future plans for HTMLParser. -- resolution: -> f

[issue14538] HTMLParser: parsing error

2012-04-19 Thread Michel Leunen
Michel Leunen added the comment: Thanks guys for your comments and for solving this issue. Great work!