After upgrading our SolrCloud collections from 8.7.0 to 8.8.0, I am
struggling to get the cluster into a consistent state.  We have 8 servers
hosting 3 collections, with shards/replicas spread over all the servers.

All replicas on solr3577 are in the "Recovering" state, and the following
error repeats every five minutes: "RemoteSolrException: Error from server at
http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state",
as you can see here:


ERROR [20210205T090741,988] recoveryExecutor-11-thread-8-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard22_replica_n86 c:foo_bar s:shard22 r:core_node89 org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (12)
ERROR [20210205T090741,995] recoveryExecutor-11-thread-9-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard2_replica_n6 c:foo_bar s:shard2 r:core_node9 org.apache.solr.cloud.RecoveryStrategy - Error while trying to recover. core=foo_bar_shard2_replica_n6:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state.
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:876)
        at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:614)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:316)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state.
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.lambda$httpUriRequest$0(HttpSolrClient.java:310)
        ... 5 more
ERROR [20210205T090741,995] recoveryExecutor-11-thread-9-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard2_replica_n6 c:foo_bar s:shard2 r:core_node9 org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (12)


At the same time, solr3579 is repeatedly logging "NotInClusterStateException:
Timeout waiting for collection state", as seen here:


ERROR [20210205T090741,994] qtp313082880-176670 org.apache.solr.servlet.HttpSolrCall - org.apache.solr.cloud.ZkController$NotInClusterStateException: Timeout waiting for collection state.
        at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:163)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
        at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1612)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1582)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
        at java.base/java.lang.Thread.run(Thread.java:832)


How do I remedy this?  I have already restarted solr3577, and I am not sure
whether I dare restart solr3579 (which is active and is the leader).

Cheers,
Henrik
