Thanks for the quick response, Joel!

I am hoping to delay sharding if possible, which might involve more things to
consider :)  

  

1) What is the size of the result set before the collapse?

  

When searching with q=*:*, for example, numFound is around 5 million before
the collapse and around 2 million after it.  

I only return about the top 30 documents in the result.  

  

2) Have you tested without the long formula, just using a field for the
min/max. It would be good to understand the impact of the formula on
performance.

  

The performance seems to be affected by the number of fields appearing in the
max formula.

  

For example, that expensive query matching 5 million documents takes 4.4 sec
with the long max formula.

For both {!collapse field=productGroupId} and {!collapse field=productGroupId
max=only_one_field}, the query time drops to around 2.4 sec.

If I remove the entire collapse fq, the query takes only 1.3 sec.
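
As a rough breakdown of those three timings, here is a quick sketch in Python. It assumes the costs simply add up, which is only an assumption; none of the components were measured separately, and the variable names are just illustrative.

```python
# Timings reported above, in seconds.
full_formula = 4.4      # collapse with the long max formula
simple_collapse = 2.4   # collapse with no max, or max on a single field
no_collapse = 1.3       # collapse fq removed entirely

# Rough attribution, ASSUMING the costs are additive (not measured separately).
formula_cost = round(full_formula - simple_collapse, 1)
collapse_cost = round(simple_collapse - no_collapse, 1)

print(f"formula evaluation: ~{formula_cost} sec")
print(f"collapse itself:    ~{collapse_cost} sec")
print(f"base query:         ~{no_collapse} sec")
```

Under that assumption, the long formula accounts for roughly 2 sec of the 4.4 sec total, and the collapse itself for a bit over 1 sec.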

  

3) How much memory do you have on the server and for the heap. Memory use
rises with the cardinality of the collapse field. So you'll want to be sure
there is enough memory to comfortably perform the collapse.

  

I am setting Xmx to 24G. The total index size on disk is 50G.

In solrconfig.xml, I use solr.FastLRUCache for filterCache with cache size
2048, solr.LRUCache for documentCache with cache size 32768, and solr.LRUCache
for queryResultCache with cache size 4096. I am using the default
fieldValueCache.

  

I found that CollapsingQParserPlugin explicitly uses Lucene's field cache.  

Maybe increasing the fieldCache would help? But I am not sure how to increase
it in Solr.

  

On Jun 28 2016, at 4:48 pm, Joel Bernstein <joels...@gmail.com> wrote:  

> Sharding will help, but you'll need to co-locate documents by group ID. A
> few questions / suggestions:
>
> 1) What is the size of the result set before the collapse?
>
> 2) Have you tested without the long formula, just using a field for the
> min/max. It would be good to understand the impact of the formula on
> performance.
>
> 3) How much memory do you have on the server and for the heap. Memory use
> rises with the cardinality of the collapse field. So you'll want to be sure
> there is enough memory to comfortably perform the collapse.
>
> Joel Bernstein  
> [http://joelsolr.blogspot.com/](http://joelsolr.blogspot.com/)
>
> On Tue, Jun 28, 2016 at 4:08 PM, jichi
<[jichi...@gmail.com](mailto:jichi...@gmail.com)> wrote:  
>
>> Hi everyone,
>>
>> I am using Solr 4.10 to index 20 million documents without sharding.
>> Each document has a groupId field, and there are about 2 million groups.
>> I found the search with collapsing on groupId significantly slower
>> comparing to without collapsing, especially when combined with facet
>> queries.
>>
>> I am wondering what would be the general approach to speedup field
>> collapsing by 2~4 times?
>> Would sharding the index help?
>> Is it possible to optimize collapsing without sharding?
>>
>> The filter parameter for collapsing is like this:
>>
>>     q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
>>
>> I also put this fq into warmup queries xml to warmup caches. But still,
>> when q changes and more fq are added, the collapsing search would take
>> about 3~5 seconds. Without collapsing, the search can finish within 2
>> seconds.
>>
>> I am thinking to manually optimize CollapsingQParserPlugin through
>> parallelization or extra caching.
>> For example, is it possible to parallelize collapsing collector by
>> different lucene index segments?
>>
>> Thanks!
>>
>> \--  
>> jichi