I think what you're seeing might be a result of the overrequesting done
in phase #1 of a distributed facet query.

The purpose of overrequesting is to mitigate the possibility of a 
constraint which should be in the topN for the collection as a whole, but 
falls just outside the topN on every shard, so it never makes it to the 
second phase of the distributed calculation.

The amount of overrequest is, by default, a multiplicative function of the 
user-specified facet.limit plus a fudge factor (IIRC: 10+(1.5*facet.limit)).
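To make the arithmetic concrete, here is a small sketch of the per-shard request size, assuming the formula recalled above (ratio 1.5, count 10, product truncated to an int). The function name is made up for illustration and is not a Solr API; it only covers a positive facet.limit.

```python
def shard_facet_limit(facet_limit: int, ratio: float = 1.5, count: int = 10) -> int:
    """Hypothetical helper: approximate number of constraints asked of each shard.

    Assumes the (IIRC) default formula: int(facet.limit * ratio) + count.
    Only meant for a positive facet.limit.
    """
    if facet_limit <= 0:
        raise ValueError("sketch only covers a positive facet.limit")
    return int(facet_limit * ratio) + count


# facet.limit=100 with the assumed defaults asks each shard for 160 constraints;
# ratio=1.0 / count=0 asks each shard for exactly facet.limit.
print(shard_facet_limit(100), shard_facet_limit(100, ratio=1.0, count=0))
```

With a large explicit facet.limit, that 1.5x plus 10 per shard adds up quickly, which is why the overrequest can be noticeable.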

If you're using an explicitly high facet.limit, you can try setting the 
overrequest ratio/count (facet.overrequest.ratio / facet.overrequest.count) 
to 1.0/0 respectively to force Solr to request only the # of constraints 
you've specified from each shard, and then aggregate them...

https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT
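As a rough sketch, the request params might look something like this. The host, collection name ("mycollection"), and field name ("category") are hypothetical placeholders; the only point is where the two overrequest params go. It just builds the URL with the Python stdlib and doesn't talk to Solr.

```python
from urllib.parse import urlencode


def facet_params(field: str, limit: int, disable_overrequest: bool = True) -> dict:
    """Build facet request params; field/limit values are illustrative only."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }
    if disable_overrequest:
        # ratio=1.0 and count=0: ask each shard for exactly facet.limit constraints
        params["facet.overrequest.ratio"] = 1.0
        params["facet.overrequest.count"] = 0
    return params


params = facet_params("category", 5000)
# Hypothetical endpoint; substitute your own host and collection.
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```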



One side note related to the workaround you suggested...

: One simple solution, in my case would be, now just thinking of it, run 
: the query with no facets and no rows, get the numFound, and set that as 
: facet.limit for the actual query.

...that assumes that the number of facet constraints returned is limited 
by the total number of documents matching the query. In general there is 
no such guarantee, because of multivalued fields (or faceting on tokenized 
fields), so this type of approach isn't a good idea as a generalized 
solution.
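A toy illustration of why numFound doesn't bound the number of constraints: with a multivalued field, two matching docs can produce five distinct facet values. The docs and the "tags" field are made up for the example, and the counting is done client-side in plain Python rather than by Solr.

```python
from collections import Counter

# Hypothetical result set: two matching docs with a multivalued "tags" field.
docs = [
    {"id": "1", "tags": ["solr", "lucene", "search"]},
    {"id": "2", "tags": ["faceting", "sharding"]},
]

num_found = len(docs)  # what the rows=0 query would report
# Count each tag value across all matching docs, like a facet on "tags".
counts = Counter(tag for doc in docs for tag in doc["tags"])

# Using facet.limit=num_found (2) here would silently drop 3 of the 5 constraints.
print(num_found, len(counts))
```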



-Hoss
http://www.lucidworks.com/
