I updated the code so that each NodePushable/ScalarEvaluator instance reuses its own XMLParser. The query time did not change much, but heap memory usage is now steadier and no longer grows significantly while the query is processed.
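
For anyone following along, here is a minimal sketch of the reuse pattern, not the actual VXQuery change. It assumes the JDK's built-in SAX implementation, and `ReusableXmlParser`, `CountingHandler`, and `countElements` are hypothetical names made up for illustration. The idea is that the XMLReader and its content handler are allocated once per owning instance and then reused for every document, with the handler resetting its own state at the start of each parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a per-instance parser wrapper; not VXQuery's XMLParser.
class ReusableXmlParser {
    // Content handler that is allocated once and reset for every document.
    private static final class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startDocument() {
            elements = 0; // clear state left over from the previous parse
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    private final XMLReader reader;
    private final CountingHandler handler = new CountingHandler();

    ReusableXmlParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Allocate the reader once; every countElements() call reuses it.
            reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create SAX reader", e);
        }
    }

    // Parses one document with the shared reader and returns its element count.
    int countElements(String xml) {
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

An XMLReader is not safe to share between threads, which is why the sketch keeps one reader per owning object rather than a single global one.
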
Yes, a few remaining functions can be removed.

On Tue, Feb 11, 2014 at 11:51 PM, Till Westmann <[email protected]> wrote:

> I agree that we should re-use the XML parser. While we generally are
> careful to keep per-tuple memory allocation minimal, we are generously
> allocating XMLReaders and SAXContentHandlers. I'm not sure that this will
> account for the difference (do you have more details on the time spent
> during parsing?), but this would certainly be a better approach.
> To make sure that we're not stepping on each other's feet, we should
> however have one XMLParser object for each NodePushable/ScalarEvaluator
> instance.
>
> Wrt. the cost of the select expression I guess that we still have a number
> of functions in there that are not strictly necessary. Is that right?
>
> In any case I think that we should now try to focus on parallelization and
> parallel performance and not necessarily on single thread performance.
>
> Does this make sense?
>
> Cheers,
> Till
>
> On Feb 11, 2014, at 8:46 AM, Eldon Carman <[email protected]> wrote:
>
> > The compiling and parsing for both Saxon and VXQuery consume a large
> > amount of the query time. Saxon definitely has improved their parsing
> > efficiency and later query processing. Take a look at these numbers:
> >
> > VXQuery compile time 700 to 1600ms (<1% of total query time)
> > Saxon compile time 230 to 260ms (<3% of total query time)
> >
> > Using a profiler...
> > VXQuery parsing time 335,000ms (43% of total query time)
> > Saxon parsing time 17,500ms (88% of total query time)
> >
> > VXQuery remaining time (56% of total query time)
> > Saxon remaining time (11% of total query time)
> >
> > Notice the huge difference in time dedicated to parsing for VXQuery. Also
> > note the time not outlined as the rest of the query time. Most of that
> > time for VXQuery is in the select expression (50% of total query time)
> > while Saxon has really no noticeable time spent on evaluating the select
> > expression.
> >
> > Seeing the difference in parsing, I found these two articles about
> > improving the XML Parser:
> > http://www.ibm.com/developerworks/xml/library/x-perfap1/index.html
> > http://www.ibm.com/developerworks/library/x-perfap2/
> >
> > I think the section on reusing the parser would be a big help for us.
