(Yonik?)

I want to do a distributed grouping query over multiple shards, using 
group.ngroups to find the total number of groups. It seems to be giving me the 
sum of ngroups for each shard, rather than the count of the union of the groups 
from each shard. Is this a bug or the “expected” behavior? I read the wiki 
carefully and didn’t see any disclaimer about ngroups for distributed search, 
so that suggests that it is a “bug”, but wikis tend to be unreliable 
“contracts.” There have been several recent Jira issues in this area 
(SOLR-3109, SOLR-3316, SOLR-3436), but none seemed specific to my scenario.

In my test case, my first shard has 4 groups and the second shard has 5 groups, 
with some groups overlapping. The total number of groups is 6, but Solr reports 
an ngroups value of 9.

Over-simplifying, my first shard has c1, c2, c3, c5 and my second shard has c1, 
c2, c3, c4, c6.
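
To make the arithmetic concrete, here is a small sketch in plain Python (not Solr code; the variable names are mine and only model the simplified example above). It contrasts the sum of the per-shard group counts, which is what I appear to be getting, with the count of the union of the groups, which is what I expected:

```python
# Group values on each shard, from the simplified example above.
shard1_groups = {"c1", "c2", "c3", "c5"}
shard2_groups = {"c1", "c2", "c3", "c4", "c6"}

# What Solr appears to report: the per-shard ngroups values added together.
summed_ngroups = len(shard1_groups) + len(shard2_groups)

# What I expected: the number of distinct groups across both shards.
distinct_ngroups = len(shard1_groups | shard2_groups)

print(summed_ngroups)    # 9
print(distinct_ngroups)  # 6
```

The two numbers agree only when no group appears on more than one shard.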

Note: The actual groups returned by the query are correct and as expected, 6 of 
them: c1, c2, c3, c5, c4, c6 when the query is sent to the first node and c1, 
c2, c3, c4, c6, c5 when the query is sent to the second node.

I did find SOLR-2066 (Search Grouping: support distributed search) which has 
this comment: “It is important that all documents of one group are in the same 
shard. Otherwise the groupCount will be incorrect”, which seems to describe 
what I am seeing. But, a random comment in a Jira does not constitute a 
contract.

See:
https://issues.apache.org/jira/browse/SOLR-2066

I’ll file a Jira (bug), but only if nobody can convince me that 9 is the 
correct answer for my ngroups scenario. But if the comment in 2066 is 
“correct”, maybe it will be an “improvement” issue.

-- Jack Krupansky
