Have you tried using docValues for the fields involved in the collapse group-head selection?
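A minimal sketch of that suggestion, assuming a Python client: it assembles the collapse filter with a single field for the group-head selection instead of the long formula, and can optionally append the `top_fc` hint discussed in this reply. The field name `popularity` and the helper `collapse_fq` are hypothetical; check that your Solr version supports `hint` before relying on it.

```python
from urllib.parse import parse_qs, urlencode


def collapse_fq(field, max_expr=None, min_expr=None, hint=None):
    """Build a {!collapse ...} filter query in Solr local-params syntax.

    Hypothetical helper: values containing whitespace are wrapped in single
    quotes (assumes the value itself contains no single quote).
    """
    if max_expr is not None and min_expr is not None:
        raise ValueError("choose either max or min for the group head, not both")

    def quote(value):
        return f"'{value}'" if any(c.isspace() for c in value) else value

    params = [f"field={field}"]
    if max_expr is not None:
        params.append(f"max={quote(max_expr)}")
    if min_expr is not None:
        params.append(f"min={quote(min_expr)}")
    if hint is not None:
        params.append(f"hint={hint}")
    return "{!collapse " + " ".join(params) + "}"


# Group head chosen by one (ideally docValues-backed) field instead of a formula.
fq = collapse_fq("groupId", max_expr="popularity", hint="top_fc")
print(fq)  # {!collapse field=groupId max=popularity hint=top_fc}

# URL-encode the request parameters; parse_qs recovers the original fq.
query_string = urlencode({"q": "*:*", "fq": fq, "rows": 30})
assert parse_qs(query_string)["fq"] == [fq]
```

Dropping `hint` from the call gives the plain `{!collapse field=groupId max=popularity}` variant, which is the one to compare against first.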
With a group-head selection based on "min", "max" or "sort", docValues should work quite well. Of course, it depends on your formula.
Does your index change often? If the warming time is not a problem, you could try the hint parameter:

hint: Currently there is only one hint available, "top_fc", which stands for top-level FieldCache. The top_fc hint is only available when collapsing on String fields. top_fc provides the best query-time speed but takes the longest to warm on startup or following a commit. top_fc will also result in the collapsed field being cached in memory twice if it's used for faceting or sorting.

Cheers

On Wed, Jun 29, 2016 at 1:59 AM, Jichi Guo <jichi...@gmail.com> wrote:

> Thanks for the quick response, Joel!
>
> I am hoping to delay sharding if possible, since it might involve more things to consider :)
>
> 1) What is the size of the result set before the collapse?
>
> When searching with q=*:*, for example, numFound is around 5 million before the collapse and 2 million after it. I only return about the top 30 documents in the result.
>
> 2) Have you tested without the long formula, just using a field for the min/max? It would be good to understand the impact of the formula on performance.
>
> The performance seems to be affected by the number of fields appearing in the max formula.
>
> For example, that expensive 5-million-hit query takes 4.4 sec. With either {!collapse field=productGroupId} or {!collapse field=productGroupId max=only_one_field}, the query time drops to around 2.4 sec. If I remove the entire collapse fq, the query takes only 1.3 sec.
>
> 3) How much memory do you have on the server and for the heap? Memory use rises with the cardinality of the collapse field, so you'll want to be sure there is enough memory to comfortably perform the collapse.
>
> I am setting Xmx to 24G. The total index size on disk is 50G.
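Joel's second question was about how much of the cost comes from the formula. A quick breakdown of jichi's reported timings, as a sketch that assumes the three costs are roughly additive (they may not be, since caching and faceting interact); the helper name is hypothetical:

```python
def collapse_cost_breakdown(no_collapse_ms, plain_collapse_ms, formula_collapse_ms):
    """Split total query time into base search, collapse, and formula cost.

    Assumes the three costs simply add up, which is only an approximation.
    """
    return {
        "search without collapse": no_collapse_ms,
        "collapse itself": plain_collapse_ms - no_collapse_ms,
        "max formula": formula_collapse_ms - plain_collapse_ms,
    }


# Timings reported above for the ~5 million hit query, in milliseconds.
parts = collapse_cost_breakdown(1300, 2400, 4400)
total = sum(parts.values())
for name, ms in parts.items():
    print(f"{name:>24}: {ms:5d} ms ({ms / total:.0%})")
```

Under that assumption the formula accounts for about 2.0 of the 4.4 seconds, almost twice the cost of the collapse itself (1.1 seconds), so selecting the head with a single field is the cheapest win to try first.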
>
> In solrconfig.xml, I use solr.FastLRUCache for the filterCache with size 2048, solr.LRUCache for the documentCache with size 32768, and solr.LRUCache for the queryResultCache with size 4096. I am using the default fieldValueCache.
>
> I found that the CollapsingQParserPlugin explicitly uses Lucene's FieldCache. Maybe increasing the fieldCache would help? But I am not sure how to increase it in Solr.
>
> On Jun 28 2016, at 4:48 pm, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Sharding will help, but you'll need to co-locate documents by group ID. A few questions / suggestions:
> >
> > 1) What is the size of the result set before the collapse?
> >
> > 2) Have you tested without the long formula, just using a field for the min/max? It would be good to understand the impact of the formula on performance.
> >
> > 3) How much memory do you have on the server and for the heap? Memory use rises with the cardinality of the collapse field, so you'll want to be sure there is enough memory to comfortably perform the collapse.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jun 28, 2016 at 4:08 PM, jichi <jichi...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I am using Solr 4.10 to index 20 million documents without sharding. Each document has a groupId field, and there are about 2 million groups.
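Joel's point about co-locating documents by group ID matters because each shard collapses only its own documents, so a group split across shards would produce one head per shard. In SolrCloud, the compositeId router routes on the part of the document id before `!`, so ids such as `groupId!docId` keep a group on one shard. The sketch below illustrates the idea with a toy router; the helpers are hypothetical, and `zlib.crc32` modulo the shard count stands in for Solr's actual MurmurHash3-based hash ranges.

```python
import zlib
from collections import defaultdict


def composite_id(group_id: str, doc_id: str) -> str:
    """compositeId-style key: the prefix before '!' is used for routing."""
    return f"{group_id}!{doc_id}"


def toy_shard_for(doc_key: str, num_shards: int) -> int:
    """Toy router: hash only the routing prefix (NOT Solr's real hash ranges)."""
    route_key = doc_key.split("!", 1)[0]
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


# Hypothetical (groupId, docId) pairs; g1 has several documents.
docs = [("g1", "d1"), ("g1", "d2"), ("g2", "d3"), ("g1", "d4"), ("g3", "d5")]

shards_per_group = defaultdict(set)
for group_id, doc_id in docs:
    shards_per_group[group_id].add(toy_shard_for(composite_id(group_id, doc_id), 4))

# Every group lands on exactly one shard, so per-shard collapsing sees whole groups.
assert all(len(shards) == 1 for shards in shards_per_group.values())
print(dict(shards_per_group))
```

Because only the prefix is hashed, adding more documents to `g1` never spreads that group across shards, whatever the document ids are.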
> > >
> > > I found that searches with collapsing on groupId are significantly slower than without collapsing, especially when combined with facet queries.
> > >
> > > I am wondering what the general approach would be to speed up field collapsing by 2~4 times. Would sharding the index help? Is it possible to optimize collapsing without sharding?
> > >
> > > The filter parameter for collapsing is like this:
> > >
> > > q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
> > >
> > > I also put this fq into the warmup queries XML to warm up the caches. But still, when q changes and more fq are added, the collapsing search takes about 3~5 seconds. Without collapsing, the search finishes within 2 seconds.
> > >
> > > I am thinking of manually optimizing the CollapsingQParserPlugin through parallelization or extra caching. For example, is it possible to parallelize the collapsing collector across different Lucene index segments?
> > >
> > > Thanks!
> > >
> > > --
> > > jichi

--
--------------------------
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
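On jichi's question about parallelizing the collapse across segments: a per-segment pass can only pick a head per group within that segment, because the same group can appear in several segments (and, in Lucene, segment-level ordinals differ between segments). The partial results therefore need a merge keyed by group value. The sketch below shows that map-then-merge shape in plain Python; it is a hypothetical illustration, not how CollapsingQParserPlugin is implemented, and in CPython the thread pool shows the structure rather than real CPU parallelism because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

# A document is (doc_id, group_id, head_value); a segment is a list of them.
Doc = tuple[int, str, float]


def segment_heads(segment: list[Doc]) -> dict[str, Doc]:
    """Pick the max-value document per group within one segment."""
    heads: dict[str, Doc] = {}
    for doc in segment:
        best = heads.get(doc[1])
        if best is None or doc[2] > best[2]:  # ties keep the earlier document
            heads[doc[1]] = doc
    return heads


def parallel_collapse(segments: list[list[Doc]], workers: int = 4) -> dict[str, Doc]:
    """Collapse each segment independently, then merge heads by group value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(segment_heads, segments))

    merged: dict[str, Doc] = {}
    for heads in partials:
        for group, doc in heads.items():
            best = merged.get(group)
            if best is None or doc[2] > best[2]:
                merged[group] = doc
    return merged


segments = [
    [(0, "g1", 1.0), (1, "g2", 5.0)],
    [(2, "g1", 3.0), (3, "g3", 2.0)],  # g1 also appears in segment 0
    [(4, "g2", 4.0)],
]
result = parallel_collapse(segments)
print({group: doc[0] for group, doc in result.items()})  # {'g1': 2, 'g2': 1, 'g3': 3}
```

Without the merge step, `g1` and `g2` would each come back twice, which is the same problem that sharding without co-location causes at the shard level.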