bq: We will not be able to limit the documents per shard in Solr
Cloud. As Solr will accept all the documents as long as space is there
for it to index.

True, end of story ;).

How does Solr know it will run out of space? It doesn't; it just hits
an exception. There's really no "this doesn't look like it will fit,
so let's not index it" check. But that's not really a problem: you
need at least as much free space on your disk as the index size to
handle merges, so you'll run into many, many other problems before you
fill up your disk.
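
To make the merge-headroom rule of thumb concrete, here is a minimal
sketch of an application-side check. It is not a Solr feature; the
function names and the idea of pointing it at an index directory are
assumptions for illustration only.

```python
import os
import shutil


def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def has_merge_headroom(index_bytes: int, free_bytes: int) -> bool:
    """Rule of thumb from above: free space should be at least the index size."""
    return free_bytes >= index_bytes


def check_index_dir(index_dir: str) -> bool:
    """Hypothetical helper: compare an index directory with its disk's free space."""
    free = shutil.disk_usage(index_dir).free
    return has_merge_headroom(index_size_bytes(index_dir), free)
```

Something like check_index_dir() could run from a monitoring job and
alert well before indexing starts failing with exceptions.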

The hashing function that's used to distribute the documents across
the shards has not had any reports of significantly uneven
distribution that I know of. So simply dividing the number of docs by
the number of shards and assuming that number (+/- a very small
amount, < 1%) of docs will land on each shard is usually good enough.
If you see something different, it would be good to know.
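
A rough way to convince yourself of this is to simulate hash-based
routing. The sketch below is an assumption-laden stand-in: it uses MD5
from the Python standard library and a simple modulo, whereas Solr's
default router uses its own hash and assigns each shard a range of
hash values. The doc IDs ("doc-0", "doc-1", ...) are made up.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard via a hash (stand-in, not Solr's hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_counts(num_docs: int, num_shards: int) -> Counter:
    """Count how many synthetic doc IDs land on each shard."""
    return Counter(shard_for(f"doc-{i}", num_shards) for i in range(num_docs))


def max_relative_deviation(counts: Counter, num_docs: int, num_shards: int) -> float:
    """Largest per-shard deviation from num_docs / num_shards, as a fraction."""
    expected = num_docs / num_shards
    return max(abs(counts.get(s, 0) - expected) / expected
               for s in range(num_shards))


if __name__ == "__main__":
    docs, shards = 200_000, 4
    counts = shard_counts(docs, shards)
    print(dict(counts))
    print(f"max deviation: {max_relative_deviation(counts, docs, shards):.2%}")
```

The relative deviation shrinks as the document count grows, so at
millions of docs per shard the spread should be well inside the
"docs / shards" estimate.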

Best,
Erick

On Thu, May 7, 2015 at 12:45 AM, Jilani Shaik <jilani24...@gmail.com> wrote:
> Hi Daniel,
>
> Thanks for the detailed explanation.
>
> My understanding is similar to yours: we should not impose a limit on the
> number of documents a shard can index. Usually it will depend on the shard
> routing provided by Solr, and I am not expecting any change to the document
> routing process.
>
> My team needs that option if at all possible. Before telling them it is
> "not possible at the Solr end to limit the documents per shard", I just want
> to get confirmation or some details of this. So I dropped a question here to
> get answers.
>
> You mentioned that "as long as it has sufficient space to do index"
>      - How will Solr know or estimate whether it has sufficient space
> to index on a particular shard or on the entire cloud?
>
> Conclusion of my understanding:
> We will not be able to limit the documents per shard in Solr Cloud. As Solr
> will accept all the documents as long as space is there for it to index.
>
> Please suggest.
>
> Thanks,
> Jilani
>
> On Thu, May 7, 2015 at 12:41 PM, Daniel Collins <danwcoll...@gmail.com>
> wrote:
>
>> Not sure I understand your problem.  If you have 20m documents, and 8
>> shards, then each shard is (broadly speaking) only going to have 2.5m docs
>> each, so I don't follow the 5m limit? That is with the default
>> routing/hashing, obviously you can write your own hash algorithm or you can
>> shard at your application level.
>>
>> In terms of limiting documents in a shard, I'm not sure what purpose that
>> would serve.  If for argument's sake you only had 2 shards, and a limit of
>> 5m docs per shard, what happens when you hit that limit?  If you have
>> indexed 10m docs, and now you try to index one more, what would you expect
>> to happen, would the system just reject any documents, should it try to
>> shard to shard 1 but see that is full, and then fail-over to shard2 instead
>> (that's not going to work as sharding needs to be reproducible and the
>> document was intended for shard 1)?
>>
>> Solr's basic premise would be to index what you gave it, as long as it has
>> sufficient space to do that.  If you want to limit your index to 20m docs,
>> that is probably better done at the application layer (but I still don't
>> really see why you would want to do that).
>>
>> On 7 May 2015 at 06:29, Jilani Shaik <jilani24...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Is it possible to restrict number of documents per shard in Solr cloud?
>> >
>> > Let's say we have Solr cloud with 4 nodes, and on each node we have one
>> > leader and one replica. Likewise, in total we have 8 shards, including
>> > replicas. Now I need to index my documents in such a way that each shard
>> > will have only 5 million documents. Total documents in Solr cloud should
>> > be 20 million documents.
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>>
