Re: [xml] confusing xpath performance characteristics

Stefan Behnel Mon, 09 Nov 2009 10:24:12 -0800

Stefan Behnel, 09.11.2009 09:57:
> It's the last operation, merging and sorting large sets of results, that
> makes this extremely slow - it takes 92% of the evaluation time in my tests
> (using libxml2 2.7.5). It's much faster to traverse the document in a
> single step, and just select single attributes from it, that can quickly be
> appended to the node set.
> 
> I imagine that this step could actually be optimised away in many cases
> (like the case above, where results are guaranteed to be found in doc
> order), so I guess it's just in there to avoid too much special casing. But
> it seriously kills the performance here.


Would it be ok to add a new "int isInDocumentOrder" field to the xmlNodeSet
struct that would be true for node sets that are known to be in document order?

That would make it easy to skip the sorting step in all cases where
building the node set follows document order anyway. Given that node
comparison is horribly expensive (more than 90% of the sort time in my
tests), I think it's absolutely worth avoiding the sorting step whenever
possible.

Also, for sorted node sets, xmlXPathNodeSetAdd() could compare the new node
to the last node in the node-set and clear the flag if the new node breaks
the document order. That way, only N-1 comparisons would be required for a
sorted set of N nodes, instead of something like O(N^2) currently.

Stefan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] confusing xpath performance characteristics

Reply via email to