Re: Performance issues with QueryResultImpl with larger offset values

Ard Schrijvers Fri, 11 Nov 2016 04:22:08 -0800

Hey,

On Fri, Nov 4, 2016 at 5:50 PM, Nils Breunese <[email protected]> wrote:
> Hello,
>
> I just joined this mailing list and this is my first post.
>
> We are having some performance issues and believe some of them can be traced
> to Jackrabbit's org.apache.jackrabbit.core.query.lucene.QueryResultImpl
> class. We have updated to Jackrabbit 2.10.3 to be able to enable the 
> 'sizeEstimate' option [0] and got some performance improvement out of that, 
> but we still have an issue with queries with large offset values. An offset 
> of 12000 causes QueryResultImpl to build an offsetNodes list with 12000 
> entries, which when using sizeEstimate is immediately discarded afterwards. 
> We'd love to see the performance difference with an implementation that just 
> does skip(offset) before getting the resultNodes from the query hits.


I've already replied in the JIRA issue as well, but the line above is
exactly where the reasoning fails: a plain skip(offset) on the query hits
would bypass authorization completely (a search hit from Lucene does not
mean the current JCR session has read access to the node).
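
To make the difference concrete, here is a minimal, self-contained sketch.
It is not Jackrabbit code: the list of paths stands in for the Lucene hits,
and the canRead predicate is a hypothetical stand-in for the session's
read-access check. It only illustrates why the offset has to be counted
against readable hits rather than raw hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustration only: none of these names are Jackrabbit APIs.
class OffsetSketch {

    // Naive paging: skips raw index hits without any access check,
    // so unreadable hits both count toward the offset and can end up in the page.
    public static List<String> naivePage(List<String> hits, int offset, int limit) {
        int from = Math.min(offset, hits.size());
        int to = Math.min(from + limit, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    // Access-checked paging: only hits the session may read count toward the offset.
    // Skipped hits are counted, not collected.
    public static List<String> accessCheckedPage(List<String> hits, Predicate<String> canRead,
                                                 int offset, int limit) {
        List<String> page = new ArrayList<>();
        int skipped = 0;
        for (String hit : hits) {
            if (!canRead.test(hit)) {
                continue; // unreadable hit: neither skipped nor returned
            }
            if (skipped < offset) {
                skipped++;
                continue;
            }
            if (page.size() >= limit) {
                break;
            }
            page.add(hit);
        }
        return page;
    }

    public static void main(String[] args) {
        // Hypothetical hits; the "/secret" ones are not readable by this session.
        List<String> hits = List.of("/a", "/secret1", "/b", "/secret2", "/c", "/d");
        Predicate<String> canRead = path -> !path.startsWith("/secret");

        System.out.println(naivePage(hits, 2, 2));
        System.out.println(accessCheckedPage(hits, canRead, 2, 2));
    }
}
```

With these made-up hits, the naive variant returns a page that contains an
unreadable node, while the access-checked variant returns the third and
fourth readable nodes. Note that the check still has to run for every hit
before the offset, which is why a large offset cannot simply be skipped.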

HTH,

Regards Ard


> Would it make sense to have that as the default implementation, at least with 
> sizeEstimate enabled? Should I create a JIRA issue for this?
>
> I have created a gist [1] which demonstrates this issue. You'll need to set a 
> breakpoint though, since the issue is in internal state of the 
> QueryResultImpl class.
>
> I'd like to modify the QueryResultImpl class to see if there is indeed a big 
> performance gain to be had there for us. It seems the use of QueryResultImpl 
> is buried pretty deeply though. QueryResultImpl is an abstract class with two 
> concrete implementations: 
> org.apache.jackrabbit.core.query.lucene.SingleColumnQueryResult and 
> org.apache.jackrabbit.core.query.lucene.MultiColumnQueryResult, of which only 
> the first seems to be used and it is explicitly created by 
> org.apache.jackrabbit.core.query.lucene.QueryImpl#execute. So, the use of 
> SingleColumnQueryResult, and therefore also its parent class QueryResultImpl,
> is hardcoded in org.apache.jackrabbit.core.query.lucene.QueryImpl.
>
> Instances of org.apache.jackrabbit.core.query.lucene.QueryImpl (which 
> implement ExecutableQuery) are created by SearchIndex#createExecutableQuery 
> (SearchIndex is the only implementation of the QueryHandler interface 
> shipping in Jackrabbit), which is a member of the SearchManager class. The 
> SearchManager constructor gets its handler by calling 
> QueryHandlerFactory#getQueryHandler. The two implementations of 
> QueryHandlerFactory are WorkspaceConfig and RepositoryConfig, both of which 
> have a QueryHandlerFactory as a member...?!
>
> Our workspace and repository XML files currently have 
> org.apache.jackrabbit.core.query.lucene.SearchIndex configured as the 
> SearchIndex, with org.apache.jackrabbit.core.query.QueryImpl as the 
> queryClass. So, we'd have to change the SearchIndex in the configuration to a 
> class which doesn't create instances of 
> org.apache.jackrabbit.core.query.lucene.QueryImpl, because those create 
> instances of SingleColumnQueryResult, which are QueryResultImpl 
> implementations. That's a lot of classes to redo for just this one change in 
> QueryResultImpl.
>
> Is our best bet to either put a patched QueryResultImpl on the classpath or 
> make a custom Jackrabbit build if we'd quickly like to evaluate the 
> performance difference for our setup?
>
> Thanks, Nils.
>
> [0] https://issues.apache.org/jira/browse/JCR-3858
> [1] https://gist.github.com/breun/7d2072b3b6ae8c2a66e3057a603ebcdc



-- 
Hippo Netherlands, Oosteinde 11, 1017 WT Amsterdam, Netherlands
Hippo USA, Inc. 71 Summer Street, 2nd Floor Boston, MA 02110, United
States of America.

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
