Performance issues with QueryResultImpl with larger offset values

Nils Breunese Fri, 04 Nov 2016 09:51:38 -0700

Hello,

I just joined this mailing list and this is my first post.


We are having some performance issues and believe some of them can be traced 
to Jackrabbit's org.apache.jackrabbit.core.query.lucene.QueryResultImpl 
class. We have updated to Jackrabbit 2.10.3 to be able to enable the 
'sizeEstimate' option [0] and got some performance improvement out of that, but 
we still have an issue with queries with large offset values. An offset of 
12000 causes QueryResultImpl to build an offsetNodes list with 12000 entries, 
which is discarded immediately afterwards when sizeEstimate is enabled. We'd love to 
see the performance difference with an implementation that just does 
skip(offset) before getting the resultNodes from the query hits. Would it make 
sense to have that as the default implementation, at least with sizeEstimate 
enabled? Should I create a JIRA issue for this?
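
To illustrate the idea, here is a minimal sketch in plain Java. It does not use 
any Jackrabbit classes: the class name OffsetSkipSketch, the method names and 
the Iterator<Integer> standing in for the query hits are all made up for this 
example, and it says nothing about the actual per-hit cost inside Jackrabbit. 
It only shows that collecting the first `offset` hits into a list and then 
discarding it returns the same page as simply skipping them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical illustration only; none of these names exist in Jackrabbit.
public class OffsetSkipSketch {

    // Mirrors the behaviour described above: the first `offset` hits are
    // collected into a list that is never used again.
    public static List<Integer> pageByCollecting(Iterator<Integer> hits, int offset, int limit) {
        List<Integer> offsetNodes = new ArrayList<>();
        while (offsetNodes.size() < offset && hits.hasNext()) {
            offsetNodes.add(hits.next());
        }
        // offsetNodes is discarded here; only the following hits are returned.
        return take(hits, limit);
    }

    // The proposed alternative: advance past the first `offset` hits
    // without keeping them around.
    public static List<Integer> pageBySkipping(Iterator<Integer> hits, int offset, int limit) {
        for (int i = 0; i < offset && hits.hasNext(); i++) {
            hits.next();
        }
        return take(hits, limit);
    }

    // Reads at most `limit` hits from the current iterator position.
    private static List<Integer> take(Iterator<Integer> hits, int limit) {
        List<Integer> page = new ArrayList<>();
        while (page.size() < limit && hits.hasNext()) {
            page.add(hits.next());
        }
        return page;
    }

    public static void main(String[] args) {
        int offset = 12000;
        int limit = 3;
        List<Integer> collected =
                pageByCollecting(IntStream.range(0, 20000).iterator(), offset, limit);
        List<Integer> skipped =
                pageBySkipping(IntStream.range(0, 20000).iterator(), offset, limit);
        System.out.println(collected.equals(skipped)); // prints "true"
        System.out.println(skipped);                   // prints "[12000, 12001, 12002]"
    }
}
```

Both methods return the same page; the difference is only the intermediate 
list with `offset` entries that the first one allocates and throws away.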

I have created a gist [1] which demonstrates this issue. You'll need to set a 
breakpoint to see it though, since the issue is in the internal state of the 
QueryResultImpl class.

I'd like to modify the QueryResultImpl class to see if there is indeed a big 
performance gain to be had there for us. It seems the use of QueryResultImpl is 
buried pretty deeply though. QueryResultImpl is an abstract class with two 
concrete implementations: 
org.apache.jackrabbit.core.query.lucene.SingleColumnQueryResult and 
org.apache.jackrabbit.core.query.lucene.MultiColumnQueryResult, of which only 
the first seems to be used and it is explicitly created by 
org.apache.jackrabbit.core.query.lucene.QueryImpl#execute. So, the use of 
SingleColumnQueryResult, and therefore also its parent class QueryResultImpl, is 
hardcoded in org.apache.jackrabbit.core.query.lucene.QueryImpl.

Instances of org.apache.jackrabbit.core.query.lucene.QueryImpl (which implement 
ExecutableQuery) are created by SearchIndex#createExecutableQuery. SearchIndex 
is the only implementation of the QueryHandler interface shipping in 
Jackrabbit, and the SearchManager class holds its handler as a member. The 
SearchManager constructor gets its handler by calling 
QueryHandlerFactory#getQueryHandler. 
The two implementations of QueryHandlerFactory are WorkspaceConfig and 
RepositoryConfig, both of which have a QueryHandlerFactory as a member...?!

Our workspace and repository XML files currently have 
org.apache.jackrabbit.core.query.lucene.SearchIndex configured as the 
SearchIndex, with org.apache.jackrabbit.core.query.QueryImpl as the queryClass. 
So, we'd have to change the SearchIndex in the configuration to a class which 
doesn't create instances of org.apache.jackrabbit.core.query.lucene.QueryImpl, 
because those create instances of SingleColumnQueryResult, which are 
QueryResultImpl implementations. That's a lot of classes to redo for just this 
one change in QueryResultImpl.

If we'd like to quickly evaluate the performance difference for our setup, is 
our best bet to either put a patched QueryResultImpl on the classpath or make 
a custom Jackrabbit build?

Thanks, Nils.

[0] https://issues.apache.org/jira/browse/JCR-3858
[1] https://gist.github.com/breun/7d2072b3b6ae8c2a66e3057a603ebcdc
