I do notice you are using hll (HyperLogLog), which is a distributed cardinality *estimate*: https://en.wikipedia.org/wiki/HyperLogLog
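To show why an hll() number should not be treated as an exact count, here is a minimal textbook HyperLogLog sketch in Python. It is an illustration under stated assumptions, not Solr's implementation: the SHA-1 based 64-bit hash and the precision p=10 (1024 registers) are arbitrary choices made here. Each value is hashed, the top p bits pick a register, and each register keeps the longest run of leading zeros it has seen. The estimate is computed from those registers and carries a standard error of roughly 1.04/sqrt(m), about 3% for m = 1024.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    # 64-bit hash from the first 8 bytes of SHA-1.
    # Illustrative assumption only; Solr uses its own hashing.
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, p: int = 10) -> float:
    """Estimate the number of DISTINCT values using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    tail_bits = 64 - p
    for v in values:
        h = _hash64(str(v))
        idx = h >> tail_bits                  # top p bits choose a register
        tail = h & ((1 << tail_bits) - 1)     # remaining 64 - p bits
        # rank = 1 + number of leading zeros in the tail
        rank = tail_bits - tail.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction (linear counting).
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    est = hll_estimate(str(i) for i in range(100_000))
    # Close to 100000, but in general not exactly 100000.
    print(round(est))
```

Note that duplicates do not change the estimate (hashing the same value always hits the same register with the same rank), so hll() answers "how many distinct values", and even that answer is only approximate.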
-Yonik

On Fri, Nov 10, 2017 at 11:32 AM, kenny <ke...@ontoforce.com> wrote:
> Hi all,
>
> We are doing some tests in Solr 6.6 with the JSON Facet API, and we get
> completely wrong counts for some combinations of facets.
>
> Setting: We have a set of fields for 376k documents in our query (120M
> documents in total). We work with 2 shards. We first facet over the
> first facet and keep these numbers, then we do a nested faceting
> over both facets.
>
> Then we add up the sub-facet counts and expect to get (approximately)
> the same numbers back. Sometimes we get rounding errors of about 1%
> difference, but on other occasions the numbers seem to be way off.
>
> For example:
>
> Gender (3 values) vs Country (211 values)
> 16226 - 18424 = -2198 (-13.5461604832%)
> 282854 - 464387 = -181533 (-64.1790464338%)
> 40489 - 47902 = -7413 (-18.3086764306%)
> 36672 - 49749 = -13077 (-35.6593586387%)
>
> Gender (3 values) vs Status (17 values)
> 16226 - 16273 = -47 (-0.289658572661%)
> 282854 - 435974 = -153120 (-54.1339348215%)
> 40489 - 49925 = -9436 (-23.305095211%)
> 36672 - 54019 = -17347 (-47.3031195462%)
>
> ...
>
> These are the typical requests we submit. Note that we use refine and
> overrequest, but in the case of Gender vs Request we should query all
> the buckets anyway.
>
> {"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}
>
> {"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}","q":"*:*","fq":["type:\"something\""]}
>
> Is this a known bug? Would switching to the old facet API resolve this?
> Are there other parameters we are missing?
>
> Thanks,
>
> kenny
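The consistency check described in the quoted message (take a parent bucket count, add up the nested sub-facet bucket counts, and compare) can be sketched in Python as below. This is a hypothetical helper, not part of Solr: it assumes the usual JSON Facet response shape, where each parent bucket has "val" and "count" plus one key per nested facet holding a "buckets" list, and a "missing" object when missing:true is requested. The difference and percentage are computed the same way as in the figures quoted above, relative to the parent count.

```python
from typing import Any, Dict, Tuple


def nested_total(bucket: Dict[str, Any], sub_facet: str) -> int:
    """Sum the sub-facet bucket counts inside one parent bucket.

    Assumed (hypothetical) bucket shape:
    {"val": ..., "count": ...,
     sub_facet: {"buckets": [{"val": ..., "count": ...}, ...],
                 "missing": {"count": ...}}}
    The "missing" bucket is included when present (missing:true).
    """
    sub = bucket.get(sub_facet, {})
    total = sum(b["count"] for b in sub.get("buckets", []))
    total += sub.get("missing", {}).get("count", 0)
    return total


def gap(parent_count: int, child_total: int) -> Tuple[int, float]:
    """Absolute and relative (% of parent) difference between counts."""
    diff = parent_count - child_total
    return diff, 100.0 * diff / parent_count
```

For example, `gap(16226, 18424)` returns `(-2198, -13.546...)`, matching the first Gender vs Country row. Only terms-facet bucket counts should be fed into this check; an hll() value is a distinct-value estimate and is not comparable to a document count.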