search engn dev [sachinyadav0...@gmail.com] wrote:
> out of 700 million documents 95-97% values are unique approx.

That's quite a lot. If you are not already using DocValues for that, you should 
do so.

So, each shard handles ~175M documents. Even with DocValues, there is an 
overhead of just having the base facet structure in memory, but that should be 
doable with 8GB of heap. It helps if your field is single-valued and marked as 
such in the schema. If your indexes are rarely updated, you can consider 
optimizing down to a single segment: single-valued field + DocValues + 
single segment = very low memory overhead for the base facet structures.
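As a sketch, such a field could be declared like this in schema.xml (the name 
user_digest is taken from this thread; the type and defaults depend on your 
schema):

  <!-- Single-valued string field with DocValues enabled; stored="false"
       is fine if the field is only used for faceting -->
  <field name="user_digest" type="string" indexed="true" stored="false"
         docValues="true" multiValued="false"/>

Once updates have settled, the single segment can be forced with optimize=true 
on the update handler.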

However, as Eric states, each call temporarily allocates ~175M counter buckets 
(one int each), which at 4 bytes/int means ~700MB of heap allocation. When you 
get the call working, you should limit the number of concurrent searches 
severely. Remember to do the throttling outside of SolrCloud, to avoid 
deadlocks.
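A minimal client-side sketch of such a throttle, assuming SolrJ (the class 
name and the permit count here are made up, not from any patch):

  import java.util.concurrent.Semaphore;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  /** Caps concurrent facet calls on the client side, outside of SolrCloud. */
  public class ThrottledFacetClient {
    private static final int MAX_CONCURRENT = 2; // tune to your heap headroom
    private final Semaphore permits = new Semaphore(MAX_CONCURRENT);
    private final HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");

    public QueryResponse facetOnUserDigest(String queryString)
        throws SolrServerException, InterruptedException {
      permits.acquire(); // callers beyond the limit block here
      try {
        SolrQuery query = new SolrQuery(queryString);
        query.setRows(0);              // only the facet counts are wanted
        query.setFacet(true);
        query.addFacetField("user_digest");
        query.setFacetLimit(10);
        return server.query(query);
      } finally {
        permits.release();
      }
    }
  }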

Performance of high-cardinality faceting with stock Solr is not great (but not 
that bad either, considering what it does), but if you really need it you can 
work around some of the issues. SOLR-5894 
(https://issues.apache.org/jira/browse/SOLR-5894 - I am the author) makes it 
possible to get the results faster for small result sets and to limit the 
amount of memory used by the buckets: if the maximum possible count for any 
term in your user_digest field is 20 (just an example), the buckets need only 
be 6 bits each (2^6 = 64 > 20) and the temporary memory allocation for each 
call will be 175M * 6 bits / 8 bits/byte ≈ 132MB. Read 
http://sbdevel.wordpress.com/2014/04/04/sparse-facet-counting-without-the-downsides/
for a detailed description.
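For illustration only, here is roughly what the bit-packed counters amount to, 
using Lucene's PackedInts (a sketch of the arithmetic above, not the actual 
SOLR-5894 code):

  import org.apache.lucene.util.packed.PackedInts;

  /** Toy illustration of the bit-packed counter arithmetic above. */
  public class PackedCounterDemo {
    public static void main(String[] args) {
      final int terms = 175_000_000; // unique terms per shard, from this thread
      final int bits = 6;            // enough for counts up to 20, as above

      // Lucene's packed array stores each counter in 6 bits instead of 32.
      PackedInts.Mutable counters =
          PackedInts.getMutable(terms, bits, PackedInts.COMPACT);
      counters.set(42, 7); // counting would be get(i) + 1 followed by set

      System.out.println("Expected heap: "
          + ((long) terms * bits / 8 / 1_000_000) + " MB"); // ~131 MB
    }
  }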

- Toke Eskildsen
