Hi,

You could start with one node, with # shards == # CPU cores on it.
Then, all while running a stress/performance test, observe the latency
and other metrics you care about.
Keep increasing the number of shards and keep observing.
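
As a rough illustration of the "observe the latency" step, here is a minimal
Python sketch that times a list of queries and summarizes the results. It is
not a replacement for a real load-testing tool. The function names, the sample
URL (http://localhost:8983/solr/collection1/select) and the query strings are
assumptions for illustration only; adjust them to your own setup.

```python
import math
import statistics
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List


def solr_query(q: str,
               base_url: str = "http://localhost:8983/solr/collection1/select") -> bytes:
    """Send one query over HTTP. The base URL is an assumed local default."""
    params = urllib.parse.urlencode({"q": q, "wt": "json"})
    with urllib.request.urlopen(f"{base_url}?{params}", timeout=30) as resp:
        return resp.read()


def measure_latencies(run_query: Callable[[str], object],
                      queries: Iterable[str]) -> List[float]:
    """Run each query once and return per-query wall-clock latency in ms."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Return the median, 95th percentile (nearest-rank) and maximum."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-based.
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank],
        "max": ordered[-1],
    }
```

You would call summarize(measure_latencies(solr_query, my_queries)) once per
shard count, with the same query list each time, and compare the numbers
between runs. This runs queries one at a time, so it says nothing about
behavior under concurrent load.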

SPM for Solr (see signature) will help with the observing part.
JMeter or SolrMeter (hi Tomás ;)) will help with the stress testing part.

You cannot change the number of shards on the fly; reindexing is needed.
The above also doesn't take into account index/shard size, but that is
a dimension to experiment with, too.
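
To get a feel for why a different shard count means reindexing, here is a toy
Python sketch. It is NOT Solr's actual routing; it assumes a simplified
router that hashes the document ID and takes it modulo the number of shards.
The function names and document IDs are made up for illustration. It counts
how many documents would land on a different shard after the count changes.

```python
import zlib
from typing import Iterable


def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy router (an assumption, not Solr's): CRC32 of the ID modulo shard count."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def fraction_moved(doc_ids: Iterable[str], old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard differs between the two shard counts."""
    ids = list(doc_ids)
    moved = sum(
        1 for d in ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(ids)


if __name__ == "__main__":
    # Hypothetical document IDs, just to have something to hash.
    ids = [f"doc-{i}" for i in range(10000)]
    print(fraction_moved(ids, 4, 8))
```

Under this toy model, going from 4 to 8 shards moves every document whose
hash modulo 8 is 4 or higher, so with a uniform hash about half of the index
would have to be relocated.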

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 4, 2012 at 2:43 PM, Jason Huang <jason.hu...@icare.com> wrote:
> Tomás,
>
> Thanks for the response.
>
> So basically at this point what I could do is to make a "best guess"
> of my estimated index size and specify a few shards to start with. I
> am guessing if I assigned too many shards, then the "join" between
> different shards may be the bottleneck? On the other side, if I assign
> only one or two shards, then each shard may become too big and the I/O
> within each shard will be the bottleneck?
>
> Then after a while of deployment, if we find out where the bottleneck
> is, do we have a way to adjust the number of shards without breaking
> the indexing and without requiring any downtime in the production system?
> Say I have 4 shards and each of them is 100GB. I found that the I/O is
> the bottleneck and I want to use 8 shards instead - is there a good
> way to redistribute the whole index from 4 existing shards to 8 shards
> without breaking anything (and without a downtime)?
>
> thanks!
>
> Jason
>
>
>
> On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe
> <tomasflo...@gmail.com> wrote:
>> SolrCloud doesn't auto-shard at this point. It doesn't split indexes either
>> (there is an open issue for this:
>> https://issues.apache.org/jira/browse/SOLR-3755 )
>>
>> At this point you need to specify the number of shards for a collection in
>> advance, with the numShards parameter. When you have more than one shard
>> for a collection, SolrCloud automatically distributes the query to one
>> replica of each shard and joins the results for you.
>>
>> The most reliable documentation about SolrCloud can be found here:
>> http://wiki.apache.org/solr/SolrCloud
>>
>> Tomás
>>
>> On Thu, Oct 4, 2012 at 12:02 PM, Jason Huang <jason.hu...@icare.com> wrote:
>>
>>> Hello,
>>>
>>> I am exploring SolrCloud and have a few questions about SolrCloud's
>>> auto-sharding functionality. I couldn't find any good answer from my
>>> online search - if anyone knows the answer to these questions or can
>>> point me to the right document, that would be great!
>>>
>>> (1) Does SolrCloud offer auto-sharding functionality? If we
>>> continuously feed documents to a single index, eventually the shard
>>> will grow to a huge size and the query will be slow. How does
>>> SolrCloud handle this situation?
>>>
>>> (2) If SolrCloud auto-splits a big shard to two small shards, then
>>> shard 1 will have part of the index and shard 2 will have some other
>>> part of the index. Is this correct? If so, when we perform a query, do we
>>> need to go through both shards in order to get a good response? Will
>>> this be slow (because we need to go through two shards, or more shards
>>> later if we need to split the shards again when the size is too big)?
>>>
>>> thanks!
>>>
>>> Jason
>>>
