Hello, I just joined this mailing list and this is my first post.

We are having some performance issues and believe some of them can be traced to Jackrabbit's org.apache.jackrabbit.core.query.lucene.QueryResultImpl class. We have updated to Jackrabbit 2.10.3 to be able to enable the 'sizeEstimate' option [0] and got some performance improvement out of that, but we still have an issue with queries with large offset values. An offset of 12000 causes QueryResultImpl to build an offsetNodes list with 12000 entries, which, when using sizeEstimate, is immediately discarded afterwards.

We'd love to see the performance difference with an implementation that just does skip(offset) before getting the resultNodes from the query hits. Would it make sense to have that as the default implementation, at least with sizeEstimate enabled? Should I create a JIRA issue for this?

I have created a gist [1] which demonstrates this issue. You'll need to set a breakpoint though, since the issue is in the internal state of the QueryResultImpl class.

I'd like to modify the QueryResultImpl class to see if there is indeed a big performance gain to be had there for us. It seems the use of QueryResultImpl is buried pretty deeply though.

QueryResultImpl is an abstract class with two concrete implementations: org.apache.jackrabbit.core.query.lucene.SingleColumnQueryResult and org.apache.jackrabbit.core.query.lucene.MultiColumnQueryResult, of which only the first seems to be used, and it is explicitly created by org.apache.jackrabbit.core.query.lucene.QueryImpl#execute. So, the use of SingleColumnQueryResult, and therefore also its parent class QueryResultImpl, is hardcoded in org.apache.jackrabbit.core.query.lucene.QueryImpl. Instances of org.apache.jackrabbit.core.query.lucene.QueryImpl (which implement ExecutableQuery) are created by SearchIndex#createExecutableQuery (SearchIndex is the only implementation of the QueryHandler interface shipping in Jackrabbit), which is a member of the SearchManager class.
The SearchManager constructor gets its handler by calling QueryHandlerFactory#getQueryHandler. The two implementations of QueryHandlerFactory are WorkspaceConfig and RepositoryConfig, both of which have a QueryHandlerFactory as a member...?!

Our workspace and repository XML files currently have org.apache.jackrabbit.core.query.lucene.SearchIndex configured as the SearchIndex, with org.apache.jackrabbit.core.query.QueryImpl as the queryClass. So, we'd have to change the SearchIndex in the configuration to a class which doesn't create instances of org.apache.jackrabbit.core.query.lucene.QueryImpl, because those create instances of SingleColumnQueryResult, which are QueryResultImpl implementations. That's a lot of classes to redo for just this one change in QueryResultImpl.

Is our best bet to either put a patched QueryResultImpl on the classpath or make a custom Jackrabbit build if we'd quickly like to evaluate the performance difference for our setup?

Thanks,
Nils

[0] https://issues.apache.org/jira/browse/JCR-3858
[1] https://gist.github.com/breun/7d2072b3b6ae8c2a66e3057a603ebcdc
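
P.S. To make the skip(offset) idea concrete, here is a toy sketch in plain Java. It is not Jackrabbit code: OffsetSketch, pageViaOffsetList, pageViaSkip and readPage are names I made up for illustration, and a plain Iterator<Integer> stands in for the query hits. It only shows that both approaches return the same page, while the first one fills a list of `offset` entries that is never looked at again. It ignores anything else the real class may do while walking the hits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

class OffsetSketch {

    // Roughly the behaviour I observe: the first `offset` hits are collected
    // into a list (like offsetNodes) that is discarded once the page is read.
    public static List<Integer> pageViaOffsetList(Iterator<Integer> hits, int offset, int limit) {
        List<Integer> offsetNodes = new ArrayList<>();
        while (offsetNodes.size() < offset && hits.hasNext()) {
            offsetNodes.add(hits.next());
        }
        // offsetNodes is not used after this point.
        return readPage(hits, limit);
    }

    // The alternative: advance past the first `offset` hits without storing them.
    public static List<Integer> pageViaSkip(Iterator<Integer> hits, int offset, int limit) {
        for (int i = 0; i < offset && hits.hasNext(); i++) {
            hits.next();
        }
        return readPage(hits, limit);
    }

    // Reads up to `limit` hits from the current iterator position.
    public static List<Integer> readPage(Iterator<Integer> hits, int limit) {
        List<Integer> page = new ArrayList<>();
        while (page.size() < limit && hits.hasNext()) {
            page.add(hits.next());
        }
        return page;
    }

    public static void main(String[] args) {
        int offset = 12000;
        int limit = 3;
        // 20000 fake hits, numbered 0..19999.
        List<Integer> a = pageViaOffsetList(IntStream.range(0, 20000).iterator(), offset, limit);
        List<Integer> b = pageViaSkip(IntStream.range(0, 20000).iterator(), offset, limit);
        System.out.println("offset list: " + a);
        System.out.println("skip:        " + b);
        System.out.println("same page:   " + a.equals(b));
    }
}
```

Whether the real hits can be skipped this cheaply is exactly what I'd like to measure with a patched QueryResultImpl.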
