[issue17410] Generator-based HTMLParser

2013-08-25 Thread Nick Coghlan
Nick Coghlan added the comment: The event generation API for ElementTree being discussed in issue 17741 is potentially relevant here. I think that style of API is preferable, as it doesn't alter how data is fed into the parser, just how it is extracted. -- nosy: +ncoghlan
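As a rough illustration of the style Nick describes, the sketch below keeps the standard feed() call for input and changes only how results are extracted. EventHTMLParser and its read_events() method are hypothetical names chosen for this example; they are not part of html.parser and are not taken from any patch on this issue.

```python
from collections import deque
from html.parser import HTMLParser


class EventHTMLParser(HTMLParser):
    """Hypothetical subclass: input still goes through feed(),
    but parse results are pulled out as events."""

    def __init__(self):
        super().__init__()
        self._events = deque()

    # The usual callbacks only record what happened.
    def handle_starttag(self, tag, attrs):
        self._events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self._events.append(("end", tag))

    def handle_data(self, data):
        self._events.append(("data", data))

    def read_events(self):
        """Yield and discard every event collected so far."""
        while self._events:
            yield self._events.popleft()


parser = EventHTMLParser()
parser.feed("<p>Hello, ")   # data arrives in arbitrary chunks
parser.feed("world</p>")
parser.close()
events = list(parser.read_events())
```

Text may be reported in more than one "data" event when it spans several chunks, so a consumer should join adjacent data events rather than rely on chunk boundaries.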

[issue17410] Generator-based HTMLParser

2013-08-24 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +scoder ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17410 ___ ___ Python-bugs-list

[issue17410] Generator-based HTMLParser

2013-03-13 Thread flying sheep
New submission from flying sheep: hi, i have an idea on how to make an internal change to html.parser.HTMLParser, which would expose a token generator interface. after that, we would be able to do e.g. list(HTMLParser().tokenize(data)) or even parser = HTMLParser() for chunk in
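The list(HTMLParser().tokenize(data)) usage mentioned in this submission can be approximated without touching the parser internals. The sketch below is not the attached patch: TokenizingHTMLParser, its tokenize() method, and the token tuple layout are assumptions made for illustration, layered on the existing handler callbacks.

```python
from html.parser import HTMLParser


class TokenizingHTMLParser(HTMLParser):
    """Hypothetical wrapper that turns handler callbacks into tokens."""

    def __init__(self):
        super().__init__()
        self._pending = []

    def handle_starttag(self, tag, attrs):
        self._pending.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self._pending.append(("endtag", tag))

    def handle_data(self, data):
        self._pending.append(("data", data))

    def tokenize(self, data):
        """Feed one chunk and yield the tokens that chunk completed."""
        self.feed(data)
        # Swap the buffer out so a later call starts from an empty list.
        pending, self._pending = self._pending, []
        yield from pending


tokens = list(TokenizingHTMLParser().tokenize("<b>hi</b>"))
```

Because tokenize() is a generator, the chunk is only fed to the parser once iteration starts.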

[issue17410] Generator-based HTMLParser

2013-03-13 Thread flying sheep
Changes by flying sheep flying-sh...@web.de: -- components: +XML type: - enhancement

[issue17410] Generator-based HTMLParser

2013-03-13 Thread Ezio Melotti
Ezio Melotti added the comment: If you have a patch you can post it, however new features are allowed only in Python 3.4, and they must be backward compatible (run python -m test test_htmlparser to check that). -- components: +Library (Lib) -XML nosy: +ezio.melotti versions: +Python

[issue17410] Generator-based HTMLParser

2013-03-13 Thread R. David Murray
R. David Murray added the comment: I think that in order to maintain backward compatibility the existing parse_ names should continue to have the same signature, but they could be re-implemented in terms of new versions that return the token. That way if an application overrides the methods

[issue17410] Generator-based HTMLParser

2013-03-13 Thread karl
karl added the comment: flying sheep: do you plan to make it easier to use the HTML5 algorithm? http://www.w3.org/TR/html5/syntax.html#parsing -- nosy: +karlcow

[issue17410] Generator-based HTMLParser

2013-03-13 Thread Ezio Melotti
Ezio Melotti added the comment: HTMLParser already parses HTML5, producing the correct result in most cases.

[issue17410] Generator-based HTMLParser

2013-03-13 Thread karl
karl added the comment: Ezio: I'm talking about the HTML5 parsing algorithm, not about parsing html* documents. :) The only Python parser I know of that is closer to the HTML5 parsing algorithm is https://code.google.com/p/html5lib/

[issue17410] Generator-based HTMLParser

2013-03-13 Thread Ezio Melotti
Ezio Melotti added the comment: Well, I'm not sure what's the point of implementing that specific algorithm if the end result is the same. HTMLParser implementation also has the advantage of being much simpler, and probably faster too. If for some reason you want that specific algorithm you

[issue17410] Generator-based HTMLParser

2013-03-13 Thread flying sheep
flying sheep added the comment: no, i didn’t change anything that didn’t have to be changed to expose the tokens. i kept the changes as minimal as possible. and the tests pass! i attached the patch. --- aside thoughts: i had to change _markupbase.py, too, but i wonder why it’s even a

[issue17410] Generator-based HTMLParser

2013-03-13 Thread flying sheep
flying sheep added the comment: whoops, left my editor modeline in. i knew that was going to happen. -- Added file: http://bugs.python.org/file29402/htmltokenizer.patch

[issue17410] Generator-based HTMLParser

2013-03-13 Thread flying sheep
Changes by flying sheep flying-sh...@web.de: Removed file: http://bugs.python.org/file29401/htmltokenizer.patch