On Thu, 2012-10-18 at 19:25 -0700, Zhigang Chen wrote:
> Thanks Liam
>
> We are building a platform to which codes containing xpaths are
> submitted by external users. Manual optimization of xpaths are
> infeasible. Do you know about any tools that can automate it?
Setting aside the security considerations, such as XPath's ability to access external files...

If the XML data is constant, I'd use XQuery with an index - in C or C++ you could use dbxml or sedna, or if Java is OK basex or qizx, or pay for a commercial XQuery implementation. The difference in throughput is partly because the indexed implementation doesn't need to parse the XML each time, and partly because the class of optimizations that can be done is larger. XQuery is a superset of XPath.

Since I seem to say this quite often on this list I should say, I don't have anything against libxml: it's awesome work and I use it myself, too. But I use other tools as well, when I think they make more sense. So take a look at the whole picture.

If you stay with libxml you could maybe work on the optimizer. As Daniel mentioned, there was recently a patch that helped with performance. It might also be possible to store the parsed data structures, or to write an "xpath server" that reads xpath queries and runs them without reloading documents. But libxml's optimizer doesn't build indexes on the document, so there will still be some limits on performance.

If you have a new XML document with each XPath expression, XQuery engines might be less of a help, although if the document is over a megabyte or so (say), XQuery implementations that build an index on the fly as the document is read will win out with some sorts of query and maybe lose with others (because of the extra work in building the index).

It's the same with SQL - it's possible to write queries that take hours to run on even a small database, and you can do the same with XPath. So the timeout approach is probably part of a solution in any case.
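To make the "xpath server" and timeout ideas a bit more concrete, here is a rough sketch, not a libxml implementation. It assumes Python's standard xml.etree.ElementTree as a stand-in (it only supports a small subset of XPath), and the names XPathServer, QueryTimeout and budget_seconds are made up for illustration. Each document is parsed once and kept in memory, and each query gets a time budget.

```python
import time
import xml.etree.ElementTree as ET


class QueryTimeout(Exception):
    """Raised when a query runs past its time budget (hypothetical name)."""


class XPathServer:
    """Hypothetical helper: parse each document once, then run many queries."""

    def __init__(self):
        self._docs = {}  # document name -> parsed root element

    def load(self, name, xml_text):
        # Parsing happens here, once per document, not once per query.
        self._docs[name] = ET.fromstring(xml_text)

    def query(self, name, path, budget_seconds=1.0):
        root = self._docs[name]
        deadline = time.monotonic() + budget_seconds
        results = []
        # The deadline is only checked between matches, so this is a
        # cooperative limit. A hard limit would need the query to run in
        # a separate process that can be killed.
        for element in root.iterfind(path):
            if time.monotonic() > deadline:
                raise QueryTimeout(path)
            results.append(element)
        return results


if __name__ == "__main__":
    server = XPathServer()
    server.load("books", "<lib><book id='1'/><book id='2'/></lib>")
    # Both queries reuse the tree parsed by load().
    print(len(server.query("books", ".//book")))
    print(len(server.query("books", "./book[@id='2']")))
```

This only removes the repeated parsing cost. It does not build any index, so a query that has to walk the whole tree still walks the whole tree every time.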
Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml