Tomás,

Thanks for the response.
So at this point, what I can do is make a "best guess" at my estimated index size and specify a few shards to start with. I am guessing that if I assign too many shards, the "join" of results across shards may become the bottleneck? On the other hand, if I assign only one or two shards, each shard may become too big and the I/O within each shard will be the bottleneck?

Then, after running in production for a while, if we find out where the bottleneck is, is there a way to adjust the number of shards without breaking indexing and without requiring any downtime in the production system?

Say I have 4 shards and each of them is 100GB. I find that I/O is the bottleneck and I want to use 8 shards instead - is there a good way to redistribute the whole index from the 4 existing shards to 8 shards without breaking anything (and without downtime)?

thanks!

Jason

On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> SolrCloud doesn't auto-shard at this point. It doesn't split indexes either
> (there is an open issue for this:
> https://issues.apache.org/jira/browse/SOLR-3755 )
>
> At this point you need to specify the number of shards for a collection in
> advance, with the numShards parameter. When you have more than one shard
> for a collection, SolrCloud automatically distributes the query to one
> replica of each shard and join the results for you.
>
> Most reliable documentation about SolrCloud can be found here:
> http://wiki.apache.org/solr/SolrCloud
>
> Tomás
>
> On Thu, Oct 4, 2012 at 12:02 PM, Jason Huang <jason.hu...@icare.com> wrote:
>
>> Hello,
>>
>> I am exploring SolrCloud and have a few questions about SolrCloud's
>> auto-sharding functionality. I couldn't find any good answer from my
>> online search - if anyone knows the answer to these questions or can
>> point me to the right document, that would be great!
>>
>> (1) Does SolrCloud offer auto-sharding functionality? If we
>> continuously feed documents to a single index, eventually the shard
>> will grow to a huge size and the query will be slow. How does
>> SolrCloud handle this situation?
>>
>> (2) If SolrCloud auto-splits a big shard to two small shards, then
>> shard 1 will have part of the index and shard 2 will have some other
>> part of index. Is this correct? If so, when we perform a query, do we
>> need to go through both shards in order to get a good response? Will
>> this be slow (because we need to go through two shards, or more shards
>> later if we need to split the shards again when the size is too big)?
>>
>> thanks!
>>
>> Jason
>>