On 2/20/2018 4:44 AM, Alfonso Muñoz-Pomer Fuentes wrote:
We have a query that we can resolve using either facet or search with rollup. 
In the Stream Source Reference section of Solr’s Reference Guide 
(https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet) 
it says “To support high cardinality aggregations see the rollup function”. I 
was wondering what is considered “high cardinality”. If it helps, our query 
returns up to 60k results. I haven’t gotten around to doing any benchmarking to 
see if there’s any difference, though, because facet has performed very well so 
far, but I don’t know if I’m near the “tipping point”. Any feedback would be appreciated.

There's no hard and fast rule for this.  The tipping point is going to be different for every use case.  With a little bit of information about your setup, experienced users can make an educated guess about whether or not performance will be good, but cannot say with absolute certainty what you're going to run into.

Let's start with some definitions, which you may or may not already know:

https://en.wikipedia.org/wiki/Cardinality_(data_modeling)
https://en.wikipedia.org/wiki/Cardinality

You haven't said how many unique values are in your field.  The only information I have from you is 60K results from your queries, which may or may not have any bearing on the total number of documents in your index, or the total number of unique values in the field you're using for faceting.  So the next paragraph may or may not apply to your index.

In general, 60,000 unique values in a field would be considered very low cardinality, because computers can typically operate on 60,000 values *very* quickly, unless the size of each value is enormous.  But if the index only has 60,000 total documents and the field has 60,000 unique values, then *relative to the size of the index* the cardinality is very high, even though most people would call the absolute number low.  Either way, sixty thousand documents or unique values is almost always a very small index, not prone to performance issues.
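To make the absolute-vs-relative distinction concrete, here's a small standalone Python sketch (illustration only, not Solr code) that computes both numbers for a made-up field:

```python
# Toy illustration: absolute vs. relative cardinality of a field.
import random

random.seed(42)

# Hypothetical index: 60,000 documents, each with one value in a
# "category" field drawn from 500 possible values.
docs = [f"cat_{random.randrange(500)}" for _ in range(60_000)]

absolute_cardinality = len(set(docs))                     # unique values in the field
relative_cardinality = absolute_cardinality / len(docs)   # unique values per document

print(absolute_cardinality)            # 500 -> low absolute cardinality
print(f"{relative_cardinality:.4f}")   # ~0.0083 -> also low relative to 60k docs
```

If instead every document held a unique ID-like value, absolute cardinality would be 60,000 and relative cardinality would be 1.0 -- low in absolute terms for a computer, but the highest possible relative to the index.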

The warnings about cardinality in the Solr documentation mostly refer to *absolute* cardinality -- how many unique values there are in a field, regardless of the total number of documents.  If there are millions or billions of unique values, then operations like faceting, grouping, and sorting are probably going to be slow.  If there are far fewer, such as thousands or only a handful, those operations are likely to be very fast, because the computer has less information to process.
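Mechanically, a facet is just a count per unique value of the field, so the memory and sorting work scale with the number of buckets -- the field's cardinality -- not only with the number of matching documents.  A minimal Python sketch of that kind of aggregation (illustration only, not how Solr actually implements it):

```python
# Facet-style aggregation: one bucket per unique field value.
from collections import Counter

# Hypothetical field values taken from the matching documents.
values = ["red", "blue", "red", "green", "blue", "red"]

facet_counts = Counter(values)

# The bucket table has one entry per unique value; with billions of
# unique values, building and sorting it is what gets expensive.
print(facet_counts.most_common())
# -> [('red', 3), ('blue', 2), ('green', 1)]
```

With three unique values the bucket table is trivial; with billions of unique IDs the same aggregation would need a bucket per ID, which is the case the documentation steers toward rollup.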

Thanks,
Shawn