Hello guys I am using Solr cloud 7.7 on Kubernetes. During the adding of replica sometimes we see inconsistency after successful addition nodes go to recovery status sometimes it takes 2-3 minute to recover while sometimes it takes more than an hour. We are getting this error. We have 4 shards each shard has around 7GB of data. After seeing the system metrics we see bandwidth exchanges are high between the leader and the new replica node. Do we have any way to rate-limit the bandwidth exchange like we had some configuration for it in master-slave? maxMbpersec something like that?
Error > 2020-12-01 13:40:34.983 ERROR > (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr > x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec > s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 > x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy > Error while trying to > recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured > while waiting response from server at: > http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139 > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) > at > org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215) > at > org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307) > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.net.SocketTimeoutException: Read timed out > at java.base/java.net.SocketInputStream.socketRead0(Native Method) > at > java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115) > at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168) > at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140) > at > org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) > at > org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) > at > org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > at > org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) > at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) > ... 16 more2020-12-01 13:40:34.983 ERROR > (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr > x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec > s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 > x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy > Recovery failed - trying again... (1) > >