Thanks for your response, Cody.

  First, I am using distributed grouping across 2 shards, and I am sure that
all documents of each group are on the same shard.

I took a look at the JIRA issue and it seems really similar.  There is the same
problem with group.ngroups.  The count is calculated in the second pass, so it
only includes results from the "useful" shards.  That is why I get the right
count when I increase the rows limit (the query must then use all my shards).
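
To illustrate what I think is happening, here is a toy Python model.  It is
not Solr's actual code: the function name, the shard names and the scores are
all made up, and it simply assumes that the second pass only sums the group
counts of shards that own at least one of the top groups.

```python
def ngroups_second_pass(shards, rows):
    """Toy model: sum per-shard group counts, but only for 'useful' shards."""
    # First pass (assumed): gather every group with its score and owning shard.
    candidates = [
        (score, group, name)
        for name, groups in shards.items()
        for group, score in groups.items()
    ]
    # The coordinator keeps only the top `rows` groups overall.
    top = sorted(candidates, reverse=True)[:rows]
    # Second pass (assumed): only shards owning a top group are asked again.
    useful = {name for _, _, name in top}
    return sum(len(shards[name]) for name in useful)


# Hypothetical data: each group lives on exactly one shard.
shards = {
    "shard1": {f"a{i}": 100 - i for i in range(5)},  # 5 high-scoring groups
    "shard2": {f"b{i}": 50 - i for i in range(3)},   # 3 low-scoring groups
}

print(ngroups_second_pass(shards, rows=2))  # 5: only shard1 is "useful"
print(ngroups_second_pass(shards, rows=8))  # 8: both shards contribute
```

With a small rows value only one shard is counted; with a large one the count
matches the real number of groups, which is the behaviour I see.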

Unless it's a feature (I hope not), I will create a new JIRA issue for this.

Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

> Hello,
> 
> When you say 2 slices, do you mean 2 shards? As in, you're doing a 
> distributed query?
> 
> If you're doing a distributed query, then for group.ngroups to work you need 
> to ensure that all documents for a group exist on a single shard.
> 
> However, what you're describing sounds an awful lot like this JIRA issue that 
> I entered a while ago for distributed grouping. I found that the hit count 
> was coming only from the shards that ended up having results in the documents 
> that were returned. I didn't test group.ngroups at the time.
> 
> https://issues.apache.org/jira/browse/SOLR-3316
> 
> If this is a similar issue then you should make a new Jira issue.
> 
> Cody
> 
> -----Original Message-----
> From: Francois Perron [mailto:francois.per...@wantedanalytics.com] 
> Sent: Tuesday, May 01, 2012 6:47 AM
> To: solr-user@lucene.apache.org
> Subject: Grouping ngroups count
> 
> Hello all,
> 
>  I tried to use grouping with 2 slices on an index of 35K documents.  When I 
> ask for the top 10 rows, grouped by field A, it gave me about 16K groups.  
> But if I ask for the top 20K rows, the ngroups property is now at 30K.  
> 
> Do you know why and, of course, how to fix it?
> 
> Thanks.
