
Thanks for the response.

So at this point, what I can do is make a "best guess" at my estimated
index size and specify a few shards to start with. I am guessing that if
I assign too many shards, the "join" between different shards may become
the bottleneck? On the other hand, if I assign only one or two shards,
each shard may become too big and the I/O within each shard will be the
bottleneck?
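
To make my guess about the "join" concrete, here is a toy sketch in
Python of what I imagine happens when a query is distributed: each shard
returns its own top k hits and a coordinator merges them into a global
top k. The names search_shard and distributed_search are made up for
illustration, the scores are invented, and this is not SolrCloud code,
only my assumption about the general idea.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical hit format


def search_shard(shard: List[Hit], k: int) -> List[Hit]:
    """Return the k highest-scoring hits from a single shard."""
    return heapq.nlargest(k, shard)


def distributed_search(shards: List[List[Hit]], k: int) -> List[Hit]:
    """Ask every shard for its top k, then merge into a global top k.

    The merge step looks at up to len(shards) * k candidates, so the
    work done by the coordinator grows with the number of shards.
    """
    partial = [search_shard(shard, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))


# Three toy shards with invented scores.
shards = [
    [(0.9, "a"), (0.2, "b")],
    [(0.7, "c"), (0.8, "d")],
    [(0.1, "e")],
]
top = distributed_search(shards, 2)
```

If this picture is roughly right, more shards means less data to scan
per shard but more partial results to collect and merge per query.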

Then, after we have been deployed for a while and have found out where
the bottleneck is, is there a way to adjust the number of shards without
breaking indexing and without requiring any downtime in the production
system? Say I have 4 shards and each of them is 100GB. I find that I/O is
the bottleneck and I want to use 8 shards instead - is there a good
way to redistribute the whole index from the 4 existing shards to 8 shards
without breaking anything (and without downtime)?



On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe
<> wrote:
> SolrCloud doesn't auto-shard at this point. It doesn't split indexes either
> (there is an open issue for this:
> )
> At this point you need to specify the number of shards for a collection in
> advance, with the numShards parameter. When you have more than one shard
> for a collection, SolrCloud automatically distributes the query to one
> replica of each shard and joins the results for you.
> The most reliable documentation about SolrCloud can be found here:
> Tomás
> On Thu, Oct 4, 2012 at 12:02 PM, Jason Huang <> wrote:
>> Hello,
>> I am exploring SolrCloud and have a few questions about SolrCloud's
>> auto-sharding functionality. I couldn't find any good answer from my
>> online search - if anyone knows the answer to these questions or can
>> point me to the right document, that would be great!
>> (1) Does SolrCloud offer auto-sharding functionality? If we
>> continuously feed documents to a single index, eventually the shard
>> will grow to a huge size and the query will be slow. How does
>> SolrCloud handle this situation?
>> (2) If SolrCloud auto-splits a big shard to two small shards, then
>> shard 1 will have part of the index and shard 2 will have some other
>> part of index. Is this correct? If so, when we perform a query, do we
>> need to go through both shards in order to get a good response? Will
>> this be slow (because we need to go through two shards, or more shards
>> later if we need to split the shards again when the size is too big)?
>> thanks!
>> Jason
