Re: How to speed up field collapsing on large number of groups

Alessandro Benedetti Mon, 04 Jul 2016 06:21:08 -0700

Have you tried using docValues for the fields involved in the collapse
group-head selection?


With docValues, group-head selection via "min", "max" and "sort" should work
quite well. Of course, it depends on your formula.
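To make the three selection modes concrete, here is a minimal Python sketch that builds the collapse filter for each of them. `collapse_fq` is a hypothetical helper, and `a`, `b` and `price` are placeholder field names. docValues itself is enabled in the schema (docValues="true" on the field), so it does not appear in the query. Also check that your Solr version supports `sort` in the collapse parser; it may not be available in 4.10.

```python
def collapse_fq(field, min_expr=None, max_expr=None, sort=None):
    """Return a {!collapse ...} filter query with at most one group-head rule.

    Hypothetical helper for illustration: min_expr/max_expr may be a field
    name or a function query, sort a sort spec such as "price desc".
    """
    rules = [
        (name, value)
        for name, value in (("min", min_expr), ("max", max_expr), ("sort", sort))
        if value is not None
    ]
    if len(rules) > 1:
        raise ValueError("use only one of min, max or sort")
    parts = [f"field={field}"]
    for name, value in rules:
        # Local-param values containing spaces must be quoted.
        parts.append(f"{name}='{value}'" if " " in value else f"{name}={value}")
    return "{!collapse " + " ".join(parts) + "}"


print(collapse_fq("groupId"))                       # default head: highest score
print(collapse_fq("groupId", max_expr="sum(a,b)"))  # head: max of a function
print(collapse_fq("groupId", sort="price desc"))    # head: first doc by sort
```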

Does your index change often?
If warming time is not a problem, you could try:

hint

Currently there is only one hint available "top_fc", which stands for top
level FieldCache. The top_fc hint is only available when collapsing on
String fields. top_fc provides the best query time speed but takes the
longest to warm on startup or following a commit. top_fc also will result
in having the collapsed field cached in memory twice if it's used for
faceting or sorting.
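If you want to try it, a sketch of the request with the hint enabled could look like the one below. It assumes `hint=top_fc` is passed as a local param of the collapse filter and that `groupId` is a String field (the only case where top_fc applies); `collapse_request` and its arguments are illustrative names, not a Solr client API. `urlencode` takes care of escaping the braces, spaces and `=` signs inside the fq.

```python
from urllib.parse import parse_qs, urlencode


def collapse_request(q, field, max_expr, rows=30, use_top_fc=False):
    """Build a /select query string with a collapse filter (illustrative only).

    hint=top_fc only works when `field` is a String field; this sketch
    trusts the caller to know that.
    """
    local_params = f"field={field} max={max_expr}"
    if use_top_fc:
        # Top-level FieldCache: fastest at query time, slowest to warm.
        local_params += " hint=top_fc"
    params = {"q": q, "fq": "{!collapse " + local_params + "}", "rows": rows}
    # urlencode percent-escapes '{', '!', '=', spaces, etc., so characters
    # inside the fq cannot be confused with parameter separators.
    return urlencode(params)


if __name__ == "__main__":
    qs = collapse_request("*:*", "groupId", "price", use_top_fc=True)
    print(qs)
    print(parse_qs(qs)["fq"])  # decoded back to the original local params
```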

Cheers

On Wed, Jun 29, 2016 at 1:59 AM, Jichi Guo <jichi...@gmail.com> wrote:

> Thanks for the quick response, Joel!
>
> I am hoping to delay sharding if possible, since it might involve more
> things to consider :)
>
> 1) What is the size of the result set before the collapse?
>
> When searching with q=*:*, for example, numFound is around 5 million before
> the collapse and 2 million after it.
>
> I only return about the top 30 documents in the result.
>
> 2) Have you tested without the long formula, just using a field for the
> min/max? It would be good to understand the impact of the formula on
> performance.
>
> The performance seems to be affected by the number of fields appearing in
> the max formula.
>
> For example, the expensive query matching 5 million documents takes 4.4 sec.
>
> With either {!collapse field=productGroupId} or
> {!collapse field=productGroupId max=only_one_field}, the query time drops to
> around 2.4 sec.
>
> If I remove the collapse fq entirely, the query takes only 1.3 sec.
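Splitting these three timings apart suggests that the formula, not the grouping, is the larger cost. This is plain subtraction on the reported wall-clock numbers, so it assumes the runs are comparable (same q, similarly warm caches):

```python
# Wall-clock timings reported above, in seconds.
no_collapse = 1.3        # collapse fq removed
collapse_simple = 2.4    # {!collapse field=...} or max=only_one_field
collapse_formula = 4.4   # max=<long formula>

collapse_cost = collapse_simple - no_collapse      # grouping itself
formula_cost = collapse_formula - collapse_simple  # extra cost of the formula
formula_share = formula_cost / (collapse_formula - no_collapse)

print(f"collapse overhead: {collapse_cost:.1f} s")             # 1.1 s
print(f"formula overhead:  {formula_cost:.1f} s")              # 2.0 s
print(f"formula share of collapse cost: {formula_share:.0%}")  # 65%
```

Roughly two thirds of the extra time comes from the formula, which fits with the number of fields in it mattering.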
>
> 3) How much memory do you have on the server and for the heap? Memory use
> rises with the cardinality of the collapse field. So you'll want to be sure
> there is enough memory to comfortably perform the collapse.
>
> I am setting Xmx to 24G. The total index size on disk is 50G.
>
> In solrconfig.xml, I use solr.FastLRUCache for filterCache with size 2048,
> solr.LRUCache for documentCache with size 32768, and solr.LRUCache for
> queryResultCache with size 4096. I am using the default fieldValueCache.
>
> I found that CollapsingQParserPlugin explicitly uses Lucene's FieldCache.
>
> Maybe increasing the FieldCache would help? But I am not sure how to
> increase it in Solr.
>
> On Jun 28 2016, at 4:48 pm, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Sharding will help, but you'll need to co-locate documents by group ID. A
> > few questions / suggestions:
> >
> > 1) What is the size of the result set before the collapse?
> >
> > 2) Have you tested without the long formula, just using a field for the
> > min/max? It would be good to understand the impact of the formula on
> > performance.
> >
> > 3) How much memory do you have on the server and for the heap? Memory use
> > rises with the cardinality of the collapse field. So you'll want to be sure
> > there is enough memory to comfortably perform the collapse.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
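On the sharding route: with SolrCloud's compositeId router, an id of the form `groupId!docId` makes the router use a hash of the `groupId` prefix to choose the shard, so all documents of a group land on the same shard and each shard can collapse locally. The Python sketch below only imitates that idea. `composite_id` and `route` are hypothetical names, and the hash is a stand-in (`zlib.crc32` modulo the shard count); Solr itself uses MurmurHash3 with a hash range per shard, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict


def composite_id(group_id: str, doc_id: str) -> str:
    """Build a compositeId-style key: the part before '!' is the shard key."""
    return f"{group_id}!{doc_id}"


def route(doc_key: str, num_shards: int) -> int:
    """Stand-in router: hash only the shard key, then take it modulo num_shards."""
    shard_key = doc_key.split("!", 1)[0]
    return zlib.crc32(shard_key.encode("utf-8")) % num_shards


# Toy documents as (groupId, docId) pairs.
docs = [("g1", "a"), ("g1", "b"), ("g2", "c"), ("g3", "d"), ("g2", "e")]

groups_per_shard = defaultdict(set)
for group_id, doc_id in docs:
    shard = route(composite_id(group_id, doc_id), num_shards=4)
    groups_per_shard[shard].add(group_id)

# Every group lands on exactly one shard, so per-shard collapsing is complete.
all_groups = [g for groups in groups_per_shard.values() for g in groups]
assert len(all_groups) == len(set(all_groups))
print(dict(groups_per_shard))
```

Since collapse memory grows with the cardinality of the collapse field, each shard should also need less memory for its share of the groups.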
> > On Tue, Jun 28, 2016 at 4:08 PM, jichi <jichi...@gmail.com> wrote:
> >
> >  Hi everyone,
> >
> >  I am using Solr 4.10 to index 20 million documents without sharding.
> >  Each document has a groupId field, and there are about 2 million groups.
> >  I found that searches collapsing on groupId are significantly slower than
> >  searches without collapsing, especially when combined with facet
> >  queries.
> >
> >  I am wondering what the general approach would be to speed up field
> >  collapsing by 2-4 times.
> >  Would sharding the index help?
> >  Is it possible to optimize collapsing without sharding?
> >
> >  The filter parameter for collapsing looks like this:
> >
> >      q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
> >
> >  I also put this fq into the warmup queries in the XML config to warm up
> >  the caches. Still, when q changes and more fq are added, the collapsing
> >  search takes about 3-5 seconds. Without collapsing, the search finishes
> >  within 2 seconds.
> >
> >  I am thinking of manually optimizing CollapsingQParserPlugin through
> >  parallelization or extra caching.
> >  For example, is it possible to parallelize the collapsing collector
> >  across different Lucene index segments?
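On parallelizing by segment: the idea can be modeled as a map/merge, where each segment picks its best document per group and a final pass keeps the global best. This is a simplified Python sketch (max-score heads only, no nullPolicy, no expand or facets on the collapsed set), not how CollapsingQParserPlugin works internally; segments are modeled as hypothetical lists of (doc_id, group, score) tuples. One caveat it makes visible: Lucene ordinals are per segment, so the merge needs a group key that is comparable across segments, such as the group value itself or a global-ordinal mapping. Python threads will not speed up CPU-bound work because of the GIL; the thread pool only shows the structure, which in Java would be one task per segment.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_heads(segment):
    """Return {group: (doc_id, score)} with the top-scoring doc per group."""
    heads = {}
    for doc_id, group, score in segment:
        best = heads.get(group)
        if best is None or score > best[1]:
            heads[group] = (doc_id, score)
    return heads


def merge_heads(partials):
    """Combine per-segment heads, keeping the highest score for each group."""
    merged = {}
    for heads in partials:
        for group, (doc_id, score) in heads.items():
            best = merged.get(group)
            if best is None or score > best[1]:
                merged[group] = (doc_id, score)
    return merged


def parallel_collapse(segments, workers=4):
    """Map: one task per segment. Merge: single pass over the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(segment_heads, segments))
    return merge_heads(partials)


segments = [
    [(1, "g1", 0.5), (2, "g2", 0.9)],
    [(3, "g1", 0.8), (4, "g3", 0.1)],
    [(5, "g2", 0.3)],
]
print(parallel_collapse(segments))  # {'g1': (3, 0.8), 'g2': (2, 0.9), 'g3': (4, 0.1)}
```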
> >
> >  Thanks!
> >
> >  --
> >  jichi


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
