Eric,

We're using Solr 5.5.4 and aren't really eager to upgrade at the moment...

Off the top of your head, what's the probability that the patch here:
https://issues.apache.org/jira/browse/SOLR-10524

... will work in 5.5.4 with minimal difficulty?

For example, were there other classes introduced in 6 that the patch
uses/depends on?

Thanks...

On Fri, Jun 9, 2017 at 12:03 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Hi all,
>
> Here's my situation...
>
> We're running ZooKeeper and Solr in AWS.
>
> When trying to spin up additional Solr boxes from an "auto scaling group", I get
> the failure shown below.
>
> The code used is exactly the same code that successfully spun up the first
> 3 or 4 Solr boxes in each "auto scaling group".
>
> Below is a copy of my email to some of my compatriots within the company
> who also use Solr/ZooKeeper...
>
> I'm looking for any advice on what _might_ be the cause of this failure... Our
> best guess is that we're overloading ZooKeeper in some way.
>
> I know this isn't a ZooKeeper forum, but I'm just hoping someone out there has
> some experience troubleshooting similar issues.
>
> Many thanks in advance...
>
> =====
>
> We have 6 ZooKeeper nodes (3 of them are observers).
>
> They are not under a load balancer.
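>
> (For concreteness, the ensemble is laid out roughly as sketched below. Hostnames
> are placeholders for our environment; the point is just that three of the six
> servers carry the :observer suffix in the server list and peerType=observer in
> their own zoo.cfg, so only the other three vote.)
>
>   # voting members
>   server.1=zk1.internal:2888:3888
>   server.2=zk2.internal:2888:3888
>   server.3=zk3.internal:2888:3888
>   # observers (non-voting)
>   server.4=zk4.internal:2888:3888:observer
>   server.5=zk5.internal:2888:3888:observer
>   server.6=zk6.internal:2888:3888:observer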
>
> How do I check whether the ZooKeeper nodes are under heavy load?
>
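> (One way to spot-check this is ZooKeeper's four-letter "mntr" admin command on
> the client port, which reports zk_avg_request_latency, zk_outstanding_requests,
> zk_num_alive_connections and so on. A rough sketch of polling it from Java; the
> host and port below are placeholders:)
>
>   import java.io.BufferedReader;
>   import java.io.InputStreamReader;
>   import java.net.Socket;
>
>   public class ZkMntr {
>       public static void main(String[] args) throws Exception {
>           String host = args.length > 0 ? args[0] : "zk1.internal"; // placeholder host
>           int port = 2181;                                          // default ZK client port
>           try (Socket s = new Socket(host, port)) {
>               // send the four-letter admin command; ZooKeeper replies and closes
>               s.getOutputStream().write("mntr".getBytes("US-ASCII"));
>               s.getOutputStream().flush();
>               BufferedReader in = new BufferedReader(
>                       new InputStreamReader(s.getInputStream(), "US-ASCII"));
>               String line;
>               while ((line = in.readLine()) != null) {
>                   System.out.println(line); // e.g. "zk_avg_request_latency  0"
>               }
>           }
>       }
>   }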
>
> The problem arises when we try to scale up with more Solr nodes. In our current
> setup we have 160 Solr nodes connected to ZooKeeper, each with 40 cores, so
> around 6,400 cores in total. When we scale up, 40 to 80 Solr nodes will spin up
> at one time.
>
> And we are getting errors like the following, which stop the index distribution
> process:
>
> 2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer - Error creating core [p44_b1_s37]: Could not get shard id for core: p44_b1_s37
>
> org.apache.solr.common.SolrException: Could not get shard id for core: p44_b1_s37
>   at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)
>   at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
>   at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
>   at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
>   at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
>   at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> We suspect this has to do with ZooKeeper not responding fast enough.
>
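> (Side note: if this does turn out to be ZooKeeper responding slowly, the main
> knob I'm aware of on the Solr side is the ZK client/session timeout. If I
> remember right, the stock solr.xml wires it through the zkClientTimeout system
> property, roughly:
>
>   <solrcloud>
>     <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
>     ...
>   </solrcloud>
>
> so it can be raised at startup with -DzkClientTimeout=<millis>. The 30000 above
> is just the default I remember, not a recommendation.)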
