Hi Everyone,
matthew sporleder wrote > Are you stuck in iowait during that commit? During commit operation, there is no iowait. Infact most of the time cpu utilization percentage is very low. /*As I mentioned in my previous post that we are getting `SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server` and `DistributedZkUpdateProcessor` ERROR on one of the shards. And this error is always occurring on the shard that is used (in culr command) to issue commit. (See below example for better understanding)*/ Here is shard and corresponding node details: shard1_0=>solr_199 shard1_1=>solr_200 shard2_0=> solr_254 shard2_1=> solr_132 shard3_0=>solr_133 shard3_1=>solr_198 We are using the following command to issue commit: /curl "http://solr_node:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/ For example, in the above command, if we replace solr_node with solr_254, then it's throwing SolrCmdDistributor and DistributedZkUpdateProcessor errors on shard2_0. Similarly, if we replace solr_node with solr_200 its throws errors on shard1_1. *I'm not able to figure out why this is happening. Is there any connection timeout setting that is affecting this? Is there any limit that, at a time only N number of shards can run commit ops simultaneously or is it some network related issue?* For a better understanding of what's happening in SOLR logs. I will demonstrate here one commit operation. I used the below command to issue commit at `2020-12-06 18:37:40` (approx) curl "http://solr_200:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json" /*shard2_0 (node: solr_254) Logs:*/ *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06 18:37:47` since there were no changes to commit. And CPU utilization during the whole period is around 2%.* 2020-12-06 18:37:47.023 INFO (qtp2034610694-31355) [c:my_collection s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1685355093842460672,optimize=false,ope nSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 2020-12-06 18:37:47.023 INFO No uncommitted changes. Skipping IW.commit. 2020-12-06 18:37:47.023 INFO end_commit_flush 2020-12-06 18:37:47.023 INFO (qtp2034610694-31355) [c:my_collection s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11] o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_0_replica_n11] webapp=/solr path=/update params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabi n&version=2&expungeDeletes=false}{commit=} 0 3 /*shard2_1 (node: solr_132) Logs:*/ *Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06 18:50:46` in between there were some external file reloading operations (our solr-5.4.2 system is also taking similar time to reload external files so right now this is not a major concern for us) CPU utilization before commit (i.e `2020-12-06 18:37:47` timestamp) is 2% and between commit ops (i.e from `2020-12-06 18:37:47` to `2020-12-06 18:50:46` timestamp) is 14% and after commit operation is done it agains fall back to 2%* 2020-12-06 18:37:47.024 INFO (qtp2034610694-30058) [c:my_collection s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1685355093844557824,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 2020-12-06 18:50:46.218 INFO (qtp2034610694-30058) [c:my_collection s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21] o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_1_replica_n21] webapp=/solr path=/update params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabin&version=2&expungeDeletes=false}{commit=} 0 779196 /*shard3_0 (node: solr_133) logs*/ Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by `2020-12-06 18:49:24`. CPU utilization pattern is same is shard2_1. /*shard3_1 (node: solr_198) logs.*/ Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by `2020-12-06 18:53:57`. CPU utilization pattern is same is shard2_1. /*shard1_0 (node: solr_199) logs.*/ Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by `2020-12-06 18:54:51`. CPU utilization pattern is same is shard2_1. /*shard1_1 (node: solr_200) logs.*/ /This is the same solr_node which we used in curl command to issue commit. As expected we got SolrCmdDistributor and DistributedZkUpdateProcessor error on it./ /Till `2020-12-06 18:46:50` timestamp there is no `start commit` request received. Also CPU utilization is 2% till this time./ /*Received follwing error at `2020-12-06 18:47:47` timestamp*/ 2020-12-06 18:47:47.013 ERROR (updateExecutor-5-thread-6-processing-n:solr_200:8389_solr x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 r:core_node20) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753) at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344) at org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.TimeoutException at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398) ... 13 more 2020-12-06 18:47:47.018 ERROR (updateExecutor-5-thread-2-processing-n:solr_200:8389_solr x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 r:core_node20) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_199:8389/solr/my_collection_shard1_0_replica_n7/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F ........ *(REST ERROR MESSAGE SAME AS ABOVE)* 2020-12-06 18:47:47.018 ERROR (updateExecutor-5-thread-4-processing-n:solr_200:8389_solr x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 r:core_node20) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_132:8389/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F ........ 2020-12-06 18:47:47.018 ERROR (updateExecutor-5-thread-5-processing-n:solr_200:8389_solr x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1 r:core_node20) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_133:8389/solr/my_collection_shard3_0_replica_n15/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F ........ 2020-12-06 18:47:47.020 INFO (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor SolrCmdDistributor found 4 errors /*Then at `2020-12-06 18:47:47` received `start commit` request. Also by this time CPU utilization is moved up to 14%. */ 2020-12-06 18:47:47.020 INFO (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1685355722985963520,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} *THEN IN BW THERE WERE SOME EXTERNAL FILE RELOADING OPERATION. CPU utilization is still around 14%.* *Then at `2020-12-06 19:06:09` timestamp got following warning.* 2020-12-06 19:06:09.672 WARN (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to http://solr_198:8389/solr => org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) ~[?:?] at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753) ~[?:?] at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369) ~[?:?] at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) ~[?:?] at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344) ~[?:?] at org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_262] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262] at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.2.jar:4.1.2] at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262] Caused by: java.util.concurrent.TimeoutException at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) ~[?:?] at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398) ~[?:?] ... 13 more 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to http://solr_199:8389/solr => org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_199:8389/solr/........ 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to http://solr_132:8389/solr => org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_132:8389/solr/........ 2020-12-06 19:06:09.673 WARN (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to http://solr_133:8389/solr => org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_133:8389/solr/........ /* Then at `2020-12-06 19:06:09` timestamp we got following. Which I'm assuming indicates that commit got over, since after this CPU utilization fall back again to 2%. And We also got curl reponse by this timestamp with status=0. * / 2020-12-06 19:06:09.673 INFO (qtp1267082248-29872) [c:my_collection s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19] o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard1_1_replica_n19] webapp=/solr path=/update params={openSearcher=true&commit=true&wt=json}{commit=} 0 1702717 In short to summarize: Commit was issued at `2020-12-06 18:37:40` (approx) timestamp. Till `2020-12-06 18:46:50` timestamp there was no `start commit` received and CPU was at 2% till it throws `SolrCmdDistributor` error at `2020-12-06 18:47:47` timestamp. Then at around the same timestamp `2020-12-06 18:47:47` received the `start commit` request. And by this time CPU utilization is moved up to 14%. Then at `2020-12-06 19:06:09` timestamp got `DistributedZkUpdateProcessor` error message and then at `2020-12-06 19:06:09.673` timestamp got some commit related message (which I'm assuming its indicates end of commit) and CPU is again fall back to 2%. Due to these timeout errors overall commit time is getting double. Can someone guide me to resolve this issue. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html