Hi,

I have a problem with facet counts in distributed grouping. It appears only
when I run a query that returns almost all of the documents.

My Solr setup has 4 shards, and my queries look like this:

http://host:port/select?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
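
To rule out typos in the parameters (a misplaced `?` or a missing `&`), the same request can be built programmatically. This is a minimal sketch using Python's standard `urllib.parse.urlencode`; `host:port` and `shard1`..`shard4` are the placeholders from the URL above, not real endpoints, and the parameter list is copied unchanged from that URL.

```python
from urllib.parse import urlencode

# Parameters copied from the distributed query above.
# "host:port" and shard1..shard4 are placeholders, not real endpoints.
params = [
    ("q", "*:*"),
    ("shards", "shard1,shard2,shard3,shard4"),
    ("group", "true"),
    ("group.field", "id"),
    ("group.facet", "true"),
    ("group.ngroups", "true"),
    ("facet.field", "category1"),
    ("facet.missing", "false"),
    ("facet.mincount", "1"),
]

# Keep '*', ':' and ',' unescaped so the URL stays readable;
# urlencode inserts the '&' separators between parameters itself.
query = urlencode(params, safe="*:,")
url = "http://host:port/select?" + query
print(url)
```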

With a query like the one above I get strange counts for the field category1.
The counts for the values are very large:
<int name="val1">9659</int>
<int name="val2">7015</int>
<int name="val3">5676</int>
<int name="val4">1180</int>
<int name="val5">1105</int>
<int name="val6">979</int>
<int name="val7">770</int>
<int name="val8">701</int>
<int name="">612</int>
<int name="val9">422</int>
<int name="val10">358</int>

When I narrow the results by adding fq=category1:"val1" (and so on) to the
query, the counts differ from what the category1 facet shows for the first
few values:

fq=category1:"val1" - counts: 22
fq=category1:"val2" - counts: 22
fq=category1:"val3" - counts: 21
fq=category1:"val4" - counts: 19
fq=category1:"val5" - counts: 19
fq=category1:"val6" - counts: 20
fq=category1:"val7" - counts: 20
fq=category1:"val8" - counts: 25
fq=category1:"val9" - counts: 422
fq=category1:"val10" - counts: 358

From val9 onward the counts are correct.
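
To make the comparison explicit, here is a small sketch that puts the two sets of numbers from above side by side and lists the values whose facet count disagrees with the filtered count. The empty-name bucket (612) is left out because no filtered count was taken for it.

```python
# Counts reported by the distributed category1 facet.
facet_counts = {
    "val1": 9659, "val2": 7015, "val3": 5676, "val4": 1180, "val5": 1105,
    "val6": 979, "val7": 770, "val8": 701, "val9": 422, "val10": 358,
}

# Counts returned when the query is narrowed with fq=category1:"<value>".
fq_counts = {
    "val1": 22, "val2": 22, "val3": 21, "val4": 19, "val5": 19,
    "val6": 20, "val7": 20, "val8": 25, "val9": 422, "val10": 358,
}

# Values where the facet count and the filtered count disagree.
mismatched = [v for v in facet_counts if facet_counts[v] != fq_counts[v]]
print(mismatched)
```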

At first I thought that for some values in the category1 facet the group
counting did not work, so the facet returned the count of all documents
instead of the number of groups by the field id. But the total number of
documents matching fq=category1:"val1" is 45468, so that does not explain the
numbers either.

I checked the queries on each shard for val1, and the results are:

shard1:
query:
http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

<lst name="fcategory">
<int name="val1">11</int>

query:
http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"

shard2:
query:
http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

There is no value "val1" in the category1 facet.

query:
http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"

<int name="ngroups">7</int>

shard3:
query:
http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

There is no value "val1" in the category1 facet.

query:
http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"

<int name="ngroups">4</int>
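
The per-shard checks above can be generated for any facet value, which makes it easier to repeat them for val2..val8. This is a minimal sketch; `shard_check_urls` is a hypothetical helper, the shard host names are the placeholders used above, and the parameters are the same as in the per-shard queries (no `shards` parameter, so each request stays local to one shard).

```python
from urllib.parse import urlencode

# Same parameters as the per-shard queries above (no "shards" parameter).
BASE_PARAMS = [
    ("q", "*:*"),
    ("group", "true"),
    ("group.field", "id"),
    ("group.facet", "true"),
    ("group.ngroups", "true"),
    ("facet.field", "category1"),
    ("facet.missing", "false"),
    ("facet.mincount", "1"),
]


def shard_check_urls(shard, value):
    """Hypothetical helper: return (unfiltered URL, fq-filtered URL) for one shard."""
    base = urlencode(BASE_PARAMS, safe="*:,")
    # The quotes around the value are percent-encoded as %22.
    fq = urlencode([("fq", f'category1:"{value}"')], safe="*:,")
    plain = f"http://{shard}/select/?{base}"
    return plain, f"{plain}&{fq}"


for shard in ["shard1", "shard2", "shard3"]:
    for u in shard_check_urls(shard, "val1"):
        print(u)
```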

So it looks like the detailed query with fq=category1:"val1" returns the
correct results, but Solr has a problem with facet counts when one of the
shards does not return a facet value (in this case "val1") that exists on
other shards.
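
The per-shard numbers for val1 do add up to the filtered distributed count. A small sketch of that arithmetic, assuming that shard1's filtered group count equals its facet count of 11 (its filtered result is not shown above) and that shard4 contributes no val1 groups (it was not checked):

```python
# Per-shard group counts for val1 as reported above.
# shard1: its facet count; shard2/shard3: ngroups with fq=category1:"val1".
# Assumptions: shard1's filtered ngroups equals 11, shard4 has no val1 groups.
val1_groups = {"shard1": 11, "shard2": 7, "shard3": 4, "shard4": 0}

expected_total = sum(val1_groups.values())
print(expected_total)  # 22, same as the distributed fq=category1:"val1" count

# How far the distributed facet count (9659) is from that total.
gap = 9659 - expected_total
print(gap)  # 9637
```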

I checked the shards for "val10" and got:

shard1: count for val10 - 142
shard2: count for val10 - 131
shard3: count for val10 - 149
Sum of counts: 422, which is correct.
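
The check above treats the distributed facet count as the sum of the per-shard counts. Here is a simplified model of that merge, as a sketch only: `merge_shard_facets` is a hypothetical helper, not Solr's actual merge code, and it ignores facet.limit and refinement. A value that a shard does not return simply contributes 0.

```python
from collections import Counter


def merge_shard_facets(shard_counts):
    """Hypothetical helper: sum per-value counts across shards.

    A value missing from a shard's response counts as 0. This is a
    simplified model, not Solr's merge code (no facet.limit/refinement).
    """
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # adds counts value by value
    return dict(total)


# val10 counts reported by shard1, shard2 and shard3.
merged = merge_shard_facets([{"val10": 142}, {"val10": 131}, {"val10": 149}])
print(merged)  # {'val10': 422}

# val1 facet counts: only shard1 returned the value in its facet.
print(merge_shard_facets([{"val1": 11}, {}, {}]))  # {'val1': 11}
```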

I'm not sure how to resolve this situation. The counts for val1 to val9
should surely be different, and these values should not be at the top of the
category1 facet, because this is very confusing. Do you have any idea how to
fix this problem?

Best regards
Agnieszka
