If you have GC logs, check whether long GC pauses are making ZooKeeper
think that nodes are going down. If that is the case, your nodes are
going into recovery, and depending on your settings in <solrCloud> in
solr.xml you may end up in a situation where no node gets promoted to
leader.
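
A rough way to do that check, assuming the JVM was started with
-XX:+PrintGCApplicationStoppedTime so that the GC log contains "Total time
for which application threads were stopped" lines. The function name and
the threshold are made up for illustration; the threshold should be
whatever zkClientTimeout resolves to in your solr.xml (in seconds), since
that is the length of pause that lets the ZooKeeper session expire.

```python
import re

# Line format written by HotSpot when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s):
    """Return (line_number, pause_seconds) for each pause longer than threshold_s.

    threshold_s is an assumption supplied by the caller, e.g. the
    zkClientTimeout from solr.xml converted to seconds.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match:
            pause = float(match.group(1))
            if pause > threshold_s:
                hits.append((number, pause))
    return hits
```

Something like long_pauses(open("solr_gc.log"), 15.0) (file name and the
15 seconds are placeholders) would list the pauses worth looking at. If it
returns nothing, GC is probably not what is making the leaders disappear.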



On 22 December 2015 at 08:46, Bram Van Dam <bram.van...@intix.eu> wrote:

> Hi folks,
>
> Been doing some SolrCloud testing and I've been experiencing some
> problems. I'll try to be relatively brief, but feel free to ask for
> additional information.
>
> I've added about 200 million documents to a SolrCloud. The cloud
> contains 3 collections, and all documents were added to all three
> collections.
>
> While indexing these documents, we noticed 486k (!!) "No registered
> leader was found"-errors. 482k (!!) of which referred to the same shard.
> The other shards are more or less evenly distributed in the log.
>
> This indexing job has been running for about 5 days now, and is pretty
> much IO-bound. CPU usage is ~50%. The load average, on the other hand,
> has been 128 for 5 days straight. Which is high, but fine: the machine
> is responsive.
>
> Memory usage is fine. Most of it is going towards file system caches and
> the like. Each Solr instance has 8GB Xmx, and is currently using about
> 7GB. I haven't noticed any OutOfMemoryErrors in the log files.
>
> Monitoring shows that both Solr instances have been up throughout these
> proceedings.
>
> Now, I'm willing to accept that these Solr instances don't have enough
> memory, or anything else, but I'm not seeing any of this reflected in
> the log files, which I'm finding troubling.
>
> What I do notice in the log file, is the very vague "SolrException:
> Service Unavailable". See below.
>
> Could anyone shed some light on what could be causing these errors?
>
> Thanks a bunch,
>
>  - Bram
>
>
> SolrCloud Setup:
> ----------------
>
> - Version: 5.4.0
> - 3 Collections
> -- firstCollection : 18 shards
> -- secondCollection: 36 shards
> -- thirdCollection : 79 shards
> - Routing: implicit
> - 2 Solr Instances
> -- 8GB Xmx.
>
> Machine:
> --------
> - Hexacore Xeon E5-1650
> - 64GB RAM
> - 50TB Disk (RAID6, 10 disks)
>
> Leader Stack Trace:
> -------------------
>
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No
> registered leader was found after waiting for 4000ms , collection:
> biweekly slice: thirdCollectionShard39
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:118)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>
>
> Service Unavailable Log:
> ------------------------
>
>
> 527280878 ERROR (qtp59559151-194160) [c:collectionTwo
> s:collectionTwoShard12 r:core_node12
> x:collectionTwo_collectionTwoShard12_replica1]
> o.a.s.u.SolrCmdDistributor forwarding update to
> http://[CENSORED]:8983/solr/collectionTwo_collectionTwoShard1_replica1/
> failed - retrying ... retries: 15 add{,id=000195641101}
> params:update.distrib=TOLEADER&distrib.from=http://
> [CENSORED]:6666/solr/collectionTwo_collectionTwoShard12_replica1/
> rsp:503:org.apache.solr.common.SolrException: Service Unavailable
>
>
>
>
