Re: Accuracy issue of aggregation results

2014-12-17 Yifan Wang
Just in case anyone is interested: weighted collection (collecting more terms on shards that hold more documents) does not necessarily improve accuracy if the documents are distributed by the default hash algorithm. There is no such correlation. On Tuesday, September 16, 2014 5:09:51 PM UTC-4, Yifan

Accuracy issue of aggregation results

2014-09-16 Yifan Wang
It seems to be a common problem that the top N results returned by an aggregation query are inaccurate when matching documents are unevenly distributed across shards, because ES collects the top N buckets from each shard regardless of how many hits each shard actually has. It is very
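To make the problem concrete, here is a toy simulation in plain Python (not Elasticsearch code). The three shards and their term counts are made up for illustration, and `merged_top` / `exact_top` are hypothetical helpers. Each shard returns only its own top `shard_size` terms, and the coordinator sums those partial counts and keeps the top `size`.

```python
from collections import Counter

# Hypothetical per-shard term counts for the documents matching a query.
SHARDS = [
    Counter({"x": 12, "y": 9, "z": 8}),
    Counter({"w": 11, "z": 8, "y": 2}),
    Counter({"v": 10, "z": 7, "y": 2}),
]

def merged_top(shards, size, shard_size):
    """Sum each shard's local top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        # A term that misses a shard's local cut contributes nothing from that shard.
        merged.update(dict(shard.most_common(shard_size)))
    return merged.most_common(size)

def exact_top(shards, size):
    """Ground truth: total counts over all shards, then the top `size`."""
    return sum(shards, Counter()).most_common(size)

print(merged_top(SHARDS, 2, 2))  # per-shard cut equals the requested size
print(merged_top(SHARDS, 2, 3))  # collect more terms per shard
print(exact_top(SHARDS, 2))
```

With a per-shard cut of 2, "y" (13 in total) never reaches the final top 2, and the count for "z" is under-reported (15 instead of 23). Collecting 3 terms per shard, which covers every term in this toy data, gives the exact answer.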

Re: Accuracy issue of aggregation results

2014-09-16 Matt Weber
Hi Yifan, Nothing dynamic, but you can increase the number of terms collected on each shard to improve accuracy [1]. You might also want to experiment with the shard_min_doc_count value if you know certain shards have a low hit count and are throwing off the aggregations [2]. [1]
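As a rough sketch, the two knobs map to the `shard_size` and `shard_min_doc_count` parameters of a `terms` aggregation. The helper name `terms_agg_body`, the aggregation name `top_terms`, the field `category`, and the numbers below are all hypothetical. The function only builds the JSON request body and does not contact a cluster.

```python
import json

def terms_agg_body(field, size, shard_size, shard_min_doc_count=1):
    """Hypothetical helper: build a search body with a single terms aggregation."""
    return {
        "size": 0,  # return only the aggregation, not the hits themselves
        "aggs": {
            "top_terms": {  # arbitrary aggregation name
                "terms": {
                    "field": field,
                    "size": size,  # buckets returned after the merge
                    # Never ask each shard for fewer buckets than the final size.
                    "shard_size": max(shard_size, size),
                    # Drop terms below this count on a shard before it reports back.
                    "shard_min_doc_count": shard_min_doc_count,
                }
            }
        },
    }

body = terms_agg_body("category", size=10, shard_size=50, shard_min_doc_count=2)
print(json.dumps(body, indent=2))
```

The ratio of `shard_size` to `size` here (5x) is an arbitrary illustrative choice, not an Elasticsearch default. Larger values trade memory and network traffic for accuracy.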