On 5/16/2016 6:29 PM, Li Ding wrote:
> This happened the second time I performed a restart.  But after that,
> this collection is stuck here every time.  If I restart the leader node
> as well, the core can get out of the recovering state.
>
> On Mon, May 16, 2016 at 5:00 PM, Li Ding <li.d...@bloomreach.com> wrote:
>> This is for restarting Solr with 1000 collections.  I created an environment
>> with 1023 collections today.  All collections are empty.  During repeated
>> restart tests, one of the cores is marked as "recovering" and stuck there
>> forever.  The Solr version is 4.6.1 and we have 3 ZK hosts and 8 Solr hosts.
>> Here are the relevant logs:

SolrCloud does not handle that many collections very well, especially
with a lot of them per server.  After I did some experimentation with a
lot more collections than you have, I opened this issue:

https://issues.apache.org/jira/browse/SOLR-7191

The stability and scalability get a little bit better with each new
release, but when you push it too far, it does not work well.

How many Solr instances are in your cloud?  If you want good performance
and stability with a thousand collections, you'll probably need a lot of
servers, so that each server only handles a relatively small number of
cores.  I do not have any precise information about how many cores
(shard replicas) are too many for one server.  You should make that
number as small as you can.
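
If you want a quick picture of how your cores are spread out right now,
a sketch like this (Python; the host list is a placeholder for your own
8 Solr hosts) counts the cores on each node through the CoreAdmin
STATUS API:

# Count cores (shard replicas) on each Solr node via the CoreAdmin
# STATUS API.  SOLR_HOSTS is a placeholder -- fill in your own
# host:port pairs.
import json
from urllib.request import urlopen

SOLR_HOSTS = ["solr1:8983", "solr2:8983"]

for host in SOLR_HOSTS:
    url = "http://%s/solr/admin/cores?action=STATUS&wt=json" % host
    with urlopen(url) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    # The "status" map is keyed by core name, so its size is the
    # number of cores hosted on that node.
    print("%s: %d cores" % (host, len(data.get("status", {}))))

If some nodes come back with far more cores than others, that is where
I would expect recovery problems to show up first.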

Upgrading Solr *might* help with this situation, but really I think
you'll need to either run fewer collections or run more instances.  You
might be able to run multiple Solr instances per server, but if you do
that, be sure that you don't give all your memory to Java.  Enough
memory must be available to the operating system for caching the
important parts of your index.
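
As a rough illustration of that last point (all the numbers below are
made up, not a recommendation for your hardware):

# Rough memory-budget arithmetic for one server running several Solr
# instances.  Every value here is a hypothetical placeholder.
total_ram_gb = 64          # physical RAM on the server
instances = 2              # Solr instances running on this server
heap_per_instance_gb = 8   # -Xmx given to each instance
index_size_gb = 40         # total on-disk index size on this server

os_cache_gb = total_ram_gb - instances * heap_per_instance_gb
print("Left for the OS page cache: %d GB against a %d GB index"
      % (os_cache_gb, index_size_gb))
# If os_cache_gb is only a small fraction of index_size_gb, queries
# will constantly hit the disk instead of cached index data.

The goal is simply that whatever is left after the heaps is a healthy
fraction of your total index size.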

Thanks,
Shawn
