(Yonick?) I want to do a distributed grouping query over multiple shards, using group.ngroups to find the total number of groups. It seems to be giving me the sum of ngroups for each shard, rather than the count of the union of the groups from each shard. Is this a bug or the “expected” behavior? I read the wiki carefully and didn’t see any disclaimer about ngroups for distributed search, so that suggests that it is a “bug”, but wikis tend to be unreliable “contracts.” There have been several recent Jira issues in this area (SOLR-3109, SOLR-3316, SOLR-3436), but none seemed specific to my scenario.
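To make the difference concrete, here is a minimal Python sketch of the two ways of counting. It is not Solr code, only set arithmetic. The variable names are made up for illustration, and the group keys are the simplified ones from my test case described below.

```python
# Hypothetical group keys held by each shard (simplified test case).
shard1 = {"c1", "c2", "c3", "c5"}
shard2 = {"c1", "c2", "c3", "c4", "c6"}

# What Solr appears to report: the per-shard group counts added together.
summed_ngroups = len(shard1) + len(shard2)

# What I expected: the number of distinct groups across all shards.
distinct_ngroups = len(shard1 | shard2)

print(summed_ngroups)    # 9
print(distinct_ngroups)  # 6
```

The two numbers agree only when no group key appears on more than one shard.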
In my test case, my first shard has 4 groups and the second shard has 5 groups, with some groups overlapping. The total number of distinct groups is 6, but Solr reports an ngroups value of 9. Over-simplifying, my first shard has c1, c2, c3, c5 and my second shard has c1, c2, c3, c4, c6.

Note: The actual groups returned by the query are correct and as expected, 6 of them: c1, c2, c3, c5, c4, c6 when the query is sent to the first node and c1, c2, c3, c4, c6, c5 when the query is sent to the second node.

I did find SOLR-2066 (Search Grouping: support distributed search), which has this comment: “It is important that all documents of one group are in the same shard. Otherwise the groupCount will be incorrect”, which seems to describe what I am seeing. But a random comment in a Jira does not constitute a contract.

See: https://issues.apache.org/jira/browse/SOLR-2066

I’ll file a Jira (bug), but only if nobody can convince me that 9 is the correct answer for my ngroups scenario. But if the comment in SOLR-2066 is “correct”, maybe it will be an “improvement” issue.

-- Jack Krupansky