Thanks for this. I've studied some algorithms, and depth-first search
looks like what I want.
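
For the archive, here is a rough sketch of the kind of non-recursive
depth-first walk I have in mind. It is not the lxml macro code, just my
own illustration. To keep it self-contained it uses a hypothetical
struct Node that only mimics the children/next/parent/name links of
libxml2's xmlNode; next_in_subtree() and count_tag() are names I made up.
With real libxml2 nodes one would presumably also check that
node->type == XML_ELEMENT_NODE and compare names with xmlStrcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlNode: only the links matter here. */
struct Node {
    const char  *name;
    struct Node *parent;
    struct Node *children;  /* first child */
    struct Node *next;      /* next sibling */
};

/*
 * Return the node that follows `cur` in depth-first (document) order,
 * never leaving the subtree rooted at `root`. Returns NULL when the
 * walk is finished. No recursion and no explicit stack: the parent
 * pointers are used to climb back up.
 */
static struct Node *next_in_subtree(const struct Node *root, struct Node *cur)
{
    if (cur->children != NULL)
        return cur->children;          /* go down first */
    while (cur != root) {
        if (cur->next != NULL)
            return cur->next;          /* then sideways */
        cur = cur->parent;             /* otherwise climb up */
    }
    return NULL;                       /* back at the root: done */
}

/* Count the nodes in the subtree of `root` whose name equals `tag`. */
static size_t count_tag(struct Node *root, const char *tag)
{
    size_t hits = 0;

    assert(tag != NULL);
    for (struct Node *cur = root; cur != NULL; cur = next_in_subtree(root, cur)) {
        if (cur->name != NULL && strcmp(cur->name, tag) == 0)
            hits++;
    }
    return hits;
}
```

Calling count_tag(root, "a") on a small hand-built tree visits each node
exactly once, parents before their children.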

2010/6/25 Stefan Behnel <[email protected]>:
> Michael Ludwig, 23.06.2010 23:29:
>>
>> Oliver Kindernay schrieb am 23.06.2010 um 18:39 (+0200):
>>>
>>> I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages.
>>> I want to find specific tags (e.g. a), so I have to walk through the
>>> tree of parsed document. And I don't want to use recursion like in
>>> this example http://xmlsoft.org/examples/tree1.c. Is there some
>>> mechanism in libxml which provides parsed nodes in some queue?
>>
>> Sounds like you should be using a high-level approach such as XPath
>> or XSLT. Forgoing the benefits provided by these technologies is like
>> deliberately using flint to make fire.
>
> Not necessarily. lxml.etree (Pythonic Python bindings for libxml2) has a
> pair of macros for an iterative tree traversal loop. When I introduced it,
> it gave me a 30% speed-up compared to my original recursive traversal code,
> and it was almost 10% faster than plain XPath at the time. See the
> bench_lxml_xpath() and bench_lxml_getiterator() functions here:
>
> http://codespeak.net/lxml/performance.html#a-longer-example
>
> The code is near the end of this file (look for a long comment starting with
> "depth first tree walker"):
>
> http://codespeak.net/svn/lxml/trunk/src/lxml/etree_defs.h
>
> These macros are the main reason why tree iteration is so blazingly fast in
> lxml.etree. Just look at these numbers:
>
> http://codespeak.net/lxml/performance.html#tree-traversal
>
> When searching for a specific tag (and when XML-ID is not an option), a
> well-crafted loop can be a lot faster than a generic XPath implementation.
>
> Stefan
> _______________________________________________
> xml mailing list, project page  http://xmlsoft.org/
> [email protected]
> http://mail.gnome.org/mailman/listinfo/xml
>