Hey,

On Fri, Nov 4, 2016 at 5:50 PM, Nils Breunese <[email protected]> wrote:
> Hello,
>
> I just joined this mailing list and this is my first post.
>
> We are having some performance issues and believe some of them can be traced
> into Jackrabbit's org.apache.jackrabbit.core.query.lucene.QueryResultImpl
> class. We have updated to Jackrabbit 2.10.3 to be able to enable the
> 'sizeEstimate' option [0] and got some performance improvement out of that,
> but we still have an issue with queries with large offset values. An offset
> of 12000 causes QueryResultImpl to build an offsetNodes list with 12000
> entries, which when using sizeEstimate is immediately discarded afterwards.
> We'd love to see the performance difference with an implementation that just
> does skip(offset) before getting the resultNodes from the query hits.

I've already replied in the JIRA issue as well, but the last sentence above is
exactly where the reasoning fails: skipping hits that way would bypass
authorization completely. A search hit from Lucene does not mean that the
current JCR session has read access to the node, so each hit has to be
access-checked before it can count towards the offset.

HTH,

Regards Ard

> Would it make sense to have that as the default implementation, at least with
> sizeEstimate enabled? Should I create a JIRA issue for this?
>
> I have created a gist [1] which demonstrates this issue. You'll need to set a
> breakpoint though, since the issue is in internal state of the
> QueryResultImpl class.
>
> I'd like to modify the QueryResultImpl class to see if there is indeed a big
> performance gain to be had there for us. It seems the use of QueryResultImpl
> is buried pretty deeply though. QueryResultImpl is an abstract class with two
> concrete implementations:
> org.apache.jackrabbit.core.query.lucene.SingleColumnQueryResult and
> org.apache.jackrabbit.core.query.lucene.MultiColumnQueryResult, of which only
> the first seems to be used and it is explicitly created by
> org.apache.jackrabbit.core.query.lucene.QueryImpl#execute. So, the use of
> SingleColumnQueryResult, and therefore also its parent class QueryResultImpl,
> is hardcoded in org.apache.jackrabbit.core.query.lucene.QueryImpl.
>
> Instances of org.apache.jackrabbit.core.query.lucene.QueryImpl (which
> implement ExecutableQuery) are created by SearchIndex#createExecutableQuery
> (SearchIndex is the only implementation of the QueryHandler interface
> shipping in Jackrabbit), which is a member of the SearchManager class. The
> SearchManager constructor gets its handler by calling
> QueryHandlerFactory#getQueryHandler. The two implementations of
> QueryHandlerFactory are WorkspaceConfig and RepositoryConfig, both of which
> have a QueryHandlerFactory as a member...?!
>
> Our workspace and repository XML files currently have
> org.apache.jackrabbit.core.query.lucene.SearchIndex configured as the
> SearchIndex, with org.apache.jackrabbit.core.query.QueryImpl as the
> queryClass. So, we'd have to change the SearchIndex in the configuration to a
> class which doesn't create instances of
> org.apache.jackrabbit.core.query.lucene.QueryImpl, because those create
> instances of SingleColumnQueryResult, which are QueryResultImpl
> implementations. That's a lot of classes to redo for just this one change in
> QueryResultImpl.
>
> Is our best bet to either put a patched QueryResultImpl on the classpath or
> make a custom Jackrabbit build if we'd quickly like to evaluate the
> performance difference for our setup?
>
> Thanks, Nils.
>
> [0] https://issues.apache.org/jira/browse/JCR-3858
> [1] https://gist.github.com/breun/7d2072b3b6ae8c2a66e3057a603ebcdc

--
Hippo Netherlands, Oosteinde 11, 1017 WT Amsterdam, Netherlands
Hippo USA, Inc. 71 Summer Street, 2nd Floor Boston, MA 02110, United States of America.

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
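
PS: To illustrate the authorization point, here is a minimal, self-contained
sketch. It does not use any Jackrabbit or Lucene classes; the class name, both
method names, the paths and the canRead predicate are all made up for the
example and only stand in for "index hits" and "the session's read check". It
compares skipping raw hits with counting only readable hits towards the
offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these names exist in Jackrabbit.
public class OffsetAfterAccessCheck {

    /** Skips raw index hits first and checks access afterwards (the shortcut). */
    static List<String> skipThenFilter(List<String> hits, Predicate<String> canRead,
                                       int offset, int limit) {
        List<String> page = new ArrayList<>();
        for (int i = offset; i < hits.size() && page.size() < limit; i++) {
            if (canRead.test(hits.get(i))) {
                page.add(hits.get(i));
            }
        }
        return page;
    }

    /** Checks access first, so only readable hits count towards the offset. */
    static List<String> filterThenSkip(List<String> hits, Predicate<String> canRead,
                                       int offset, int limit) {
        List<String> page = new ArrayList<>();
        int skipped = 0;
        for (String hit : hits) {
            if (!canRead.test(hit)) {
                continue; // unreadable hits never count, neither as offset nor as result
            }
            if (skipped < offset) {
                skipped++;
                continue;
            }
            if (page.size() == limit) {
                break;
            }
            page.add(hit);
        }
        return page;
    }

    public static void main(String[] args) {
        // Made-up hits in index order; two of them are not readable by the session.
        List<String> hits = List.of("/a", "/secret1", "/b", "/secret2", "/c", "/d");
        Set<String> readable = Set.of("/a", "/b", "/c", "/d");
        Predicate<String> canRead = readable::contains;

        System.out.println(skipThenFilter(hits, canRead, 2, 2)); // [/b, /c]
        System.out.println(filterThenSkip(hits, canRead, 2, 2)); // [/c, /d]
    }
}
```

With offset 2 the session has already seen /a and /b on the first page, so the
second page should start at /c. The shortcut starts at /b instead, because it
counted a hit the session is not allowed to read. The cost of doing it
correctly is that every skipped hit still needs its access check, which is the
overhead described in the quoted mail.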
