Thanks Shawn and Erick.

This is what I ended up finding too: I noticed the issue as the number of
buckets increased.

Zheng: I am using Solr 7, but this was only an experiment on the hash, i.e.,
to see what distribution I should expect from it (as the above gist shows). I
didn't actually index into Solr 7, but I would expect it to behave like the
above if I had actually indexed these ids into Solr with these partitions.
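
For anyone who wants to reproduce this kind of experiment without a Solr
cluster, here is a minimal sketch. It is not the gist mentioned above and it
does not call Solr. It assumes a stand-in hash (the first 4 bytes of MD5) and
assumes the 32-bit hash space is split into equal contiguous ranges, one per
bucket. The names bucket_for, distribution and expected_empty are
hypothetical helpers made up for this illustration.

```python
import hashlib
from collections import Counter


def bucket_for(doc_id: str, num_buckets: int) -> int:
    """Map an id to a bucket by splitting a 32-bit hash space into equal ranges."""
    # Assumption: MD5 is only a stand-in here. Solr's compositeId router uses
    # its own hash function, so exact per-bucket counts will differ.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")  # unsigned 32-bit value
    return (h * num_buckets) >> 32         # always in [0, num_buckets)


def distribution(num_docs: int, num_buckets: int) -> Counter:
    """Count how many of the ids "0".."num_docs-1" land in each bucket."""
    return Counter(bucket_for(str(i), num_buckets) for i in range(num_docs))


def expected_empty(num_docs: int, num_buckets: int) -> float:
    """Expected number of empty buckets if every id picked a bucket uniformly at random."""
    return num_buckets * (1 - 1 / num_buckets) ** num_docs


if __name__ == "__main__":
    buckets = 117
    for num_docs in (117, 117_000):
        counts = distribution(num_docs, buckets)
        print(
            f"{num_docs} docs: {buckets - len(counts)} empty buckets, "
            f"largest bucket has {max(counts.values())} docs, "
            f"expected empty ~ {expected_empty(num_docs, buckets):.1f}"
        )
```

For 117 ids in 117 buckets, expected_empty gives roughly 43, so seeing around
38 empty shards is in line with what a uniform hash would produce on such a
small sample.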

On Fri, Mar 16, 2018 at 9:24 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> What Shawn said. 117 shards and 116 docs tells you absolutely nothing
> useful. I've never seen the number of docs on various shards be off by
> more than 2-3% when enough docs are indexed to be statistically valid.
>
> Best,
> Erick
>
> On Fri, Mar 16, 2018 at 5:34 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:
> >>
> >> I have 117 shards and I tried to use document ids from zero to 116. I find
> >> that the distribution is very uneven, e.g., the largest bucket receives
> >> total 5 documents; and around 38 shards will be empty.  Is it expected?
> >
> >
> > With such a small data set, this fits what I would expect.
> >
> > Choosing buckets by hashing (which is what compositeId does) is not perfect,
> > but if you send it thousands or millions of documents, it will be
> > *generally* balanced.
> >
> > Thanks,
> > Shawn
> >
>
