On Fri, Jan 30, 2015 at 10:03 AM, cfalletta wrote:
> Hello Thomas,
>
> Thanks for your answer.
>
> I'm using version 2.6.5 of Jackrabbit.
>
> We're loading 300,000+ documents in production and it takes 3-5 minutes to
> load them all. Two queries are run: the select * with a limit, and the select *
> without a limit. I'll attach the source file source_jackrabbit.txt
> <http://jackrabbit.510166.n4.nabble.com/file/n4661929/source_jackrabbit.txt>
>
> In the development environment, I set the Jackrabbit logging level to DEBUG,
> and it appeared that the first query was taking a lot of time. However,
> setting the logging level to DEBUG seriously decreased the overall
> performance. I'll run another test on a large set of documents, without the
> count and without debug mode, to be sure. Thanks for the advice.
>
> By the way, I've heard of another implementation of QueryResult that would
> return the totalSize of the query without "limit":
> org.apache.jackrabbit.core.query.lucene.QueryResultImpl. But
> org.apache.jackrabbit.core.query.lucene.QueryResult only works with
> SingleColumnQueryResult.
> -> Any idea how to use QueryResultImpl, and whether it is a viable solution?
>
> Is Jackrabbit able to properly handle queries on millions of documents, as
> long as we have a limit in the query?

In general, yes.

A bit more detailed: the problem is usually not the query itself, but
the authorization of the results. If you set a limit of, say, 100, then
the authorization step can stop as soon as read access has been granted
to 100 nodes. A limit will still result in bad performance if your user
has read access to only, say, 0.1% of the nodes, because then, on
average, 100,000 nodes must be checked to obtain 100 granted results
(100 / 0.001). Performance also depends on your bundle caches: if all
nodes are in memory, checking 100,000 nodes won't be blisteringly fast,
but not really slow either. If you exceed your caches, so that nodes
have to be fetched from the backing database, performance will drop
dramatically.
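To make the "exceeding your caches" effect concrete, here is a minimal Java sketch of a bounded LRU cache in front of a slow store. It is not Jackrabbit's actual bundle cache; the class BundleCacheSketch, its fetchFromDatabase stand-in and the sizes are hypothetical. It scans the same 100,000 nodes twice: when the cache can hold them all, the second pass is served entirely from memory; when it cannot, plain LRU evicts each node just before it is needed again, so every access on the second pass goes to the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU cache in front of a slow backing store.
// Not Jackrabbit's bundle cache; it only illustrates hit/miss behaviour.
public class BundleCacheSketch {

    private final Map<Integer, String> cache;
    public long hits = 0;
    public long misses = 0; // each miss stands for a round trip to the database

    public BundleCacheSketch(int capacity) {
        // accessOrder = true makes LinkedHashMap iterate in LRU order,
        // and removeEldestEntry evicts the least recently used entry.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public String load(int nodeId) {
        String bundle = cache.get(nodeId);
        if (bundle != null) {
            hits++;
            return bundle;
        }
        misses++;
        bundle = fetchFromDatabase(nodeId);
        cache.put(nodeId, bundle);
        return bundle;
    }

    // Hypothetical stand-in for reading a node bundle from the database.
    private static String fetchFromDatabase(int nodeId) {
        return "bundle-" + nodeId;
    }

    // Loads node ids 0..nodes-1 twice in a row (e.g. the same query run twice).
    public static BundleCacheSketch scanTwice(int capacity, int nodes) {
        BundleCacheSketch c = new BundleCacheSketch(capacity);
        for (int pass = 0; pass < 2; pass++) {
            for (int id = 0; id < nodes; id++) {
                c.load(id);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        BundleCacheSketch fits = scanTwice(200_000, 100_000);
        BundleCacheSketch overflows = scanTwice(50_000, 100_000);
        System.out.println("fits:      hits=" + fits.hits + " misses=" + fits.misses);
        System.out.println("overflows: hits=" + overflows.hits + " misses=" + overflows.misses);
    }
}
```

In the "fits" case half of the 200,000 loads are hits; in the "overflows" case all 200,000 loads are misses, which is the pattern that makes a large scan suddenly database-bound.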

Please realize that if you want to compare Jackrabbit searches with
something like Solr or Elasticsearch, a fair comparison would be to
check every result from Solr or Elasticsearch separately for read
access, for example against some external system. There is a reason
that Solr and ES do hardly anything for fine-grained (really
fine-grained!) ACL-aware indexing: that is a really complex part.
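Whether the hits come from Jackrabbit's Lucene index or from Solr/Elasticsearch, checking each result for read access under a limit comes down to a loop like the following minimal Java sketch. The class LimitedAclFilter and the canRead predicate (standing in for Jackrabbit's access check or for an external ACL system) are hypothetical. With a user who can read 1 node in 1000 (0.1%) and a limit of 100, it performs 100,000 checks; with full read access it stops after 100; without a limit it checks every hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: raw query hits are checked for read access one by one,
// and the loop stops as soon as `limit` hits have been granted.
public class LimitedAclFilter {

    // Granted node ids plus the number of access checks it took to find them.
    public static final class Result {
        public final List<Integer> granted;
        public final long checks;

        Result(List<Integer> granted, long checks) {
            this.granted = granted;
            this.checks = checks;
        }
    }

    // Walks hits 0..totalHits-1 in order; canRead stands in for the ACL check.
    public static Result filter(int totalHits, int limit, IntPredicate canRead) {
        List<Integer> granted = new ArrayList<>();
        long checks = 0;
        for (int id = 0; id < totalHits && granted.size() < limit; id++) {
            checks++; // every hit examined costs one access check
            if (canRead.test(id)) {
                granted.add(id);
            }
        }
        return new Result(granted, checks);
    }

    public static void main(String[] args) {
        // Hypothetical ACL: the user can read only 1 node in 1000 (0.1%).
        IntPredicate onePerThousand = id -> id % 1000 == 999;
        Result r = filter(300_000, 100, onePerThousand);
        // prints "100 granted after 100000 checks"
        System.out.println(r.granted.size() + " granted after " + r.checks + " checks");
    }
}
```

The cost of a limited query is therefore driven by limit / (fraction readable), not by the limit alone; each of those checks may also trigger a node load, which is where the bundle caches come in.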

Hope this helps

Last thing: some queries, mainly queries with hierarchical constraints,
do not perform well for millions of nodes. Again, good performance
there is hard to achieve with Lucene.

Regards Ard

>
> Kind Regards,
> Cédric
>
>
>
> --
> View this message in context: 
> http://jackrabbit.510166.n4.nabble.com/really-poor-search-performance-tp4661920p4661929.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
