Thanks Shawn and Erick. This is what I also ended up finding: as the number of buckets increased, I noticed the issue.
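For anyone who wants to reproduce the effect without a cluster, here is a minimal sketch in Python. It is not Solr's routing code: it assumes MD5 as a stand-in hash and a plain modulo to pick the bucket, whereas compositeId uses its own hash and per-shard hash ranges, so the exact counts will differ. The function names (bucket_for, distribution, expected_empty) are hypothetical and exist only for this illustration.

```python
import hashlib
from collections import Counter


def bucket_for(doc_id: str, num_buckets: int) -> int:
    """Map a document id to a bucket.

    ASSUMPTION: MD5 plus modulo is only a stand-in for a well-mixed hash.
    It is NOT the hash or range assignment that Solr's compositeId uses.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def distribution(num_docs: int, num_buckets: int) -> list:
    """Return the number of ids (0 .. num_docs - 1) landing in each bucket."""
    counts = Counter(bucket_for(str(i), num_buckets) for i in range(num_docs))
    return [counts.get(b, 0) for b in range(num_buckets)]


def expected_empty(num_docs: int, num_buckets: int) -> float:
    """Expected number of empty buckets under a perfectly uniform hash."""
    return num_buckets * (1 - 1 / num_buckets) ** num_docs


if __name__ == "__main__":
    num_buckets = 117
    for num_docs in (117, 11_700, 1_170_000):
        counts = distribution(num_docs, num_buckets)
        mean = num_docs / num_buckets
        # Relative spread shrinks as the number of documents grows.
        print(
            f"docs={num_docs:>9} empty={counts.count(0):>3} "
            f"min={min(counts)} max={max(counts)} "
            f"spread={(max(counts) - min(counts)) / mean:.2%}"
        )
    print(f"expected empty for 117 docs: {expected_empty(117, 117):.1f}")
```

Under a uniform hash, 117 ids in 117 buckets should leave about 117 * (1 - 1/117)^117, or roughly 43, buckets empty, so seeing around 38 empty shards in the experiment is in line with that. The relative spread between the fullest and emptiest bucket should shrink as the document count grows.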
Zheng: I am using Solr 7, but this was only an experiment on the hash, i.e., to see what distribution I should expect from it (as the above gist shows). I didn't actually index into Solr 7, but I would expect it to do something like the above if I had actually indexed into Solr with these partitions and IDs.

On Fri, Mar 16, 2018 at 9:24 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> What Shawn said. 117 shards and 116 docs tells you absolutely nothing
> useful. I've never seen the number of docs on various shards be off by
> more than 2-3% when enough docs are indexed to be statistically valid.
>
> Best,
> Erick
>
> On Fri, Mar 16, 2018 at 5:34 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:
> >>
> >> I have 117 shards and i tried to use document ids from zero to 116. I find
> >> that the distribution is very uneven, e.g., the largest bucket receives
> >> total 5 documents; and around 38 shards will be empty. Is it expected?
> >
> > With such a small data set, this fits what I would expect.
> >
> > Choosing buckets by hashing (which is what compositeId does) is not perfect,
> > but if you send it thousands or millions of documents, it will be
> > *generally* balanced.
> >
> > Thanks,
> > Shawn