Re: Query that sorts a large result set.

Marcel Reutegger Wed, 17 Jun 2009 01:14:22 -0700

Hi,

The sorting is pretty well optimized; it basically uses the underlying
Lucene functionality for that. There are two other important points
that will influence performance:

1) workspace configuration

With the default workspace configuration, the entire result set is
fetched up front. You can change this behavior by setting the
resultFetchSize parameter. See [0].
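
As a rough sketch of what that could look like: resultFetchSize is set
as a param on the SearchIndex element of the workspace configuration
(workspace.xml, or the Workspace template in repository.xml). The value
100 below is only an illustration taken from Ian's "100 items" and not a
recommendation; the path param is the usual default and may differ in
your setup.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- illustrative value only: initially fetch the first 100 results
       instead of the whole result set -->
  <param name="resultFetchSize" value="100"/>
</SearchIndex>
```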

2) Ian wrote: "I only want to see a small number of items eg 100 after
a particular date."

That might actually become a problem. It will result in a range query
that potentially selects lots (millions?) of nodes with distinct date
properties. This case is not optimized. There's a new indexing
technique in Lucene called trie range queries [1], which was
specifically built to perform such queries efficiently, but it is
not yet integrated with Jackrabbit.
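
To give an idea of why the trie approach helps, here is a minimal sketch
of the general technique: each numeric value (for example a date as
milliseconds) is indexed at several precisions, and a range is then
covered by a few low-precision prefix terms in the middle plus
full-precision terms only at the two edges. This is a simplified
illustration restricted to non-negative values. The class and method
names (TrieRangeSketch, SubRange, split) are made up for this example;
it is neither Lucene's nor Jackrabbit's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of trie-style range splitting (hypothetical names). */
final class TrieRangeSketch {

    /** Covers values min..max by matching the prefixes (value >>> shift). */
    static final class SubRange {
        final long min;
        final long max;
        final int shift;

        SubRange(long min, long max, int shift) {
            this.min = min;
            this.max = max;
            this.shift = shift;
        }

        /** Number of index terms that have to be visited for this sub-range. */
        long termCount() {
            return (max >>> shift) - (min >>> shift) + 1;
        }
    }

    /** Splits the inclusive range lo..hi into sub-ranges of increasing shift. */
    static List<SubRange> split(long lo, long hi, int precisionStep) {
        if (lo < 0 || hi < lo || precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("expects 0 <= lo <= hi and 1 <= precisionStep <= 32");
        }
        List<SubRange> out = new ArrayList<>();
        long min = lo;
        long max = hi;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            // Does the lower/upper bound cut into a block at this precision?
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                // Nothing left for a coarser level: emit the remainder and stop.
                add(out, min, max, shift);
                break;
            }
            if (hasLower) {
                add(out, min, min | mask, shift);   // lower edge at this precision
            }
            if (hasUpper) {
                add(out, max & ~mask, max, shift);  // upper edge at this precision
            }
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    private static void add(List<SubRange> out, long min, long max, int shift) {
        // A prefix at this shift stands for every value sharing it, so the
        // low bits of the upper bound are filled with ones.
        out.add(new SubRange(min, max | ((1L << shift) - 1L), shift));
    }

    public static void main(String[] args) {
        for (SubRange r : split(3L, 1000L, 4)) {
            System.out.println("shift=" + r.shift + " values " + r.min + ".." + r.max
                    + " terms=" + r.termCount());
        }
    }
}
```

The point of the sketch is that the number of terms to visit grows with
the number of precision levels rather than with the number of distinct
values in the range, which is what makes a "everything after date X"
query over millions of distinct dates feasible.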

I've created a JIRA issue to discuss and keep track of such an
enhancement in Jackrabbit: [2]

regards
 marcel

[0] http://issues.apache.org/jira/browse/JCR-651
[1] http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
[2] https://issues.apache.org/jira/browse/JCR-2151

On Wed, Jun 17, 2009 at 01:50, Ian Boston<[email protected]> wrote:
> Hi,
>
> I want to perform a query where the full result set could be millions of
> items. That set needs to be sorted by the lastModified attribute on the
> node, and I only want to see a small number of items eg 100 after a
> particular date.
>
> If I do this, will there be scalability issues, or is the sorting of a date
> field optimized in the query engine ?
>
> Thanks
> Ian
>
