search engn dev [sachinyadav0...@gmail.com] wrote:
> out of 700 million documents, 95-97% of values are unique approx.
That's quite a lot. If you are not already using DocValues for that field, you should do so. So each shard handles ~175M documents. Even with DocValues there is an overhead just for holding the base facet structure in memory, but that should be doable with 8GB of heap. It helps if your field is single-valued and marked as such in the schema. If your indexes are rarely updated, you can consider optimizing down to a single segment: single-valued field + DocValues + single segment = very low memory overhead for the base facet structures.

However, as Eric states, each call temporarily allocates ~175M buckets (as ints), which means 700MB of heap allocation per call. When you get the call to work, you should severely limit the number of concurrent searches. Remember to do the limiting outside of SolrCloud, to avoid deadlocks.

Performance of high-cardinality faceting with stock Solr is not great (though not that bad either, considering what it does), but if you really need it you can work around some of the issues. SOLR-5894 (https://issues.apache.org/jira/browse/SOLR-5894 - I am the author) makes it possible to get results faster for small result sets and to limit the amount of memory used by the buckets: if the maximum possible count for any term in your user_digest field is 20 (just an example), the buckets need only be 5 bits each (2^5 = 32 > 20) and the temporary memory allocation for each call will be 175M * 5 bits / 8 bits/byte ≈ 110MByte. Read http://sbdevel.wordpress.com/2014/04/04/sparse-facet-counting-without-the-downsides/ for a detailed description.

- Toke Eskildsen
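PS: The bucket-size arithmetic above can be sketched as follows. This is just an illustration of the memory math, not the SOLR-5894 implementation; `bucket_memory_bytes` is a made-up helper name:

```python
import math

def bucket_memory_bytes(num_docs, max_count):
    # Bits needed to represent any count in the range 0..max_count
    bits = max(1, math.ceil(math.log2(max_count + 1)))
    # One packed bucket per document in the shard
    return num_docs * bits / 8

# 175M docs per shard, maximum term count of 20 -> 5 bits per bucket
mem = bucket_memory_bytes(175_000_000, 20)
print(f"{mem / 1_000_000:.0f} MB")  # prints 109 MB, vs 700 MB for 32-bit int buckets
```

Compare with plain int counters: 175M buckets * 4 bytes = 700MB, which is where the per-call allocation figure above comes from.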