Nick Coghlan added the comment:
The event generation API for ElementTree being discussed in issue 17741 is
potentially relevant here.
I think that style of API is preferable, as it doesn't alter how data is fed
into the parser, just how it is extracted.
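For illustration, a minimal sketch of that style using xml.etree.ElementTree.XMLPullParser, which ships with Python 3.4 and later. Data still goes in through feed(); only the way results come out changes, via read_events(). The helper name pull_events and the sample chunks are made up for this example.

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Feed chunks as usual and pull (event, tag) pairs out of the parser."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)  # input side: the same feed() call as always
        # output side: drain whatever events are ready so far
        seen.extend((event, elem.tag) for event, elem in parser.read_events())
    parser.close()
    # Some events may only become available once the stream is closed.
    seen.extend((event, elem.tag) for event, elem in parser.read_events())
    return seen


events = pull_events(["<root><a>te", "xt</a>", "</root>"])
```

The final read_events() after close() is deliberate: the parser is free to hold back events until it has seen enough input, so the sketch does not assume each feed() yields its events immediately.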
--
nosy: +ncoghlan
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +scoder
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17410
___
___
Python-bugs-list
New submission from flying sheep:
hi, i have an idea on how to make an internal change to html.parser.HTMLParser,
which would expose a token generator interface.
after that, we would be able to do e.g. list(HTMLParser().tokenize(data)) or
even
parser = HTMLParser()
for chunk in
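A rough sketch of what a call like list(HTMLParser().tokenize(data)) could look like, built on top of the existing handler callbacks rather than by changing the parser internals as the submission proposes. TokenizingParser, its tokenize() method and the tuple layout of the tokens are all hypothetical; none of them exist in html.parser.

```python
from html.parser import HTMLParser


class TokenizingParser(HTMLParser):
    """Hypothetical subclass that records callbacks as token tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._tokens = []

    def handle_starttag(self, tag, attrs):
        self._tokens.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self._tokens.append(("endtag", tag))

    def handle_data(self, data):
        self._tokens.append(("data", data))

    def tokenize(self, data):
        """Feed one chunk and yield the tokens that chunk produced."""
        self.feed(data)
        tokens, self._tokens = self._tokens, []
        yield from tokens


tokens = list(TokenizingParser().tokenize("<p class='x'>hi</p>"))
```

Because this only wraps feed(), text at the very end of the input stays buffered until more data arrives or close() is called, so it would not show up in the tokens of the last chunk.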
Changes by flying sheep flying-sh...@web.de:
--
components: +XML
type: -> enhancement
Ezio Melotti added the comment:
If you have a patch you can post it, however new features are allowed only in
Python 3.4, and they must be backward compatible (run python -m test
test_htmlparser to check that).
--
components: +Library (Lib) -XML
nosy: +ezio.melotti
versions: +Python
R. David Murray added the comment:
I think that in order to maintain backward compatibility the existing parse_
names should continue to have the same signature, but they could be
re-implemented in terms of new versions that return the token. That way if an
application overrides the methods
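A toy sketch of that layering, not based on the real HTMLParser internals. The old-style parse_comment keeps its signature and integer return value, and delegates to a new helper that returns the token as well. The class LegacyParser and the helper name _scan_comment are invented for this example.

```python
class LegacyParser:
    """Toy stand-in for a parser with a callback-style parse_ method."""

    def __init__(self, text):
        self.rawdata = text
        self.handled = []

    def _scan_comment(self, i):
        """Hypothetical new helper: return (end_index, token) or (-1, None)."""
        if not self.rawdata.startswith("<!--", i):
            return -1, None
        end = self.rawdata.find("-->", i + 4)
        if end == -1:
            return -1, None
        return end + 3, ("comment", self.rawdata[i + 4:end])

    def parse_comment(self, i):
        """Old entry point: same signature, still returns only the end index."""
        end, token = self._scan_comment(i)
        if token is not None:
            self.handle_comment(token[1])  # callbacks keep firing as before
        return end

    def handle_comment(self, data):
        self.handled.append(data)


parser = LegacyParser("<!-- hi -->x")
end_index = parser.parse_comment(0)
```

Code that calls or overrides parse_comment sees no difference, while new code can call _scan_comment directly to get the token itself.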
karl added the comment:
flying sheep: do you plan to make it easier to use the HTML5 algorithm?
http://www.w3.org/TR/html5/syntax.html#parsing
--
nosy: +karlcow
Ezio Melotti added the comment:
HTMLParser already parses HTML5, producing the correct result in most
cases.
karl added the comment:
Ezio: I'm talking about the HTML5 parsing algorithm, not about parsing
html* documents. :)
The only Python parser I know of that is closer to the HTML5 parsing algorithm is
https://code.google.com/p/html5lib/
Ezio Melotti added the comment:
Well, I'm not sure what the point of implementing that specific algorithm is if
the end result is the same. The HTMLParser implementation also has the advantage
of being much simpler, and probably faster too. If for some reason you want
that specific algorithm you
flying sheep added the comment:
no, i didn’t change anything that didn’t have to be changed to expose the
tokens. i kept the changes as minimal as possible.
and the tests pass! i attached the patch.
---
aside thoughts:
i had to change _markupbase.py, too, but i wonder why it’s even a
flying sheep added the comment:
whoops, left my editor modeline in. i knew that was going to happen.
--
Added file: http://bugs.python.org/file29402/htmltokenizer.patch
Changes by flying sheep flying-sh...@web.de:
Removed file: http://bugs.python.org/file29401/htmltokenizer.patch