Hi Everyone,

matthew sporleder wrote
> Are you stuck in iowait during that commit?

During commit operation, there is no iowait.
Infact most of the time cpu utilization percentage is very low.

/*As I mentioned in my previous post that we are getting `SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server` and `DistributedZkUpdateProcessor` ERROR on
one of the shards. And this error is always occurring on the shard that is
used (in culr command) to issue commit. (See below example for better
understanding)*/

Here is shard and corresponding node details:
shard1_0=>solr_199
shard1_1=>solr_200
shard2_0=> solr_254
shard2_1=> solr_132
shard3_0=>solr_133
shard3_1=>solr_198

We are using the following command to issue commit:
/curl
"http://solr_node:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/

For example, in the above command, if we replace solr_node with solr_254,
then it's throwing SolrCmdDistributor and DistributedZkUpdateProcessor
errors on shard2_0. Similarly, if we replace solr_node with solr_200 its
throws errors on shard1_1.

*I'm not able to figure out why this is happening. Is there any connection
timeout setting that is affecting this? Is there any limit that, at a time
only N number of shards can run commit ops simultaneously or is it some
network related issue?*


For a better understanding of what's happening in SOLR logs. I will
demonstrate here one commit operation.

I used the below command to issue commit at `2020-12-06 18:37:40` (approx)
curl
"http://solr_200:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json";


/*shard2_0 (node: solr_254) Logs:*/


*Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06
18:37:47` since there were no changes to commit. And CPU utilization during
the whole period is around 2%.*


2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
o.a.s.u.DirectUpdateHandler2 start
commit{_version_=1685355093842460672,optimize=false,ope
nSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-12-06 18:37:47.023 INFO No uncommitted changes. Skipping IW.commit.
2020-12-06 18:37:47.023 INFO end_commit_flush
2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_0_replica_n11] 
webapp=/solr path=/update

params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabi
n&version=2&expungeDeletes=false}{commit=} 0 3

/*shard2_1 (node: solr_132) Logs:*/

*Commit is received at `2020-12-06 18:37:47` and got over by `2020-12-06
18:50:46` in between there were some external file reloading operations (our
solr-5.4.2 system is also taking similar time to reload external files so
right now this is not a major concern for us)
CPU utilization before commit (i.e `2020-12-06 18:37:47` timestamp) is 2%
and between commit ops (i.e from `2020-12-06 18:37:47`  to `2020-12-06
18:50:46` timestamp) is 14% and after commit operation is done it agains
fall back to 2%*


2020-12-06 18:37:47.024 INFO  (qtp2034610694-30058) [c:my_collection
s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
o.a.s.u.DirectUpdateHandler2 start
commit{_version_=1685355093844557824,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

2020-12-06 18:50:46.218 INFO  (qtp2034610694-30058) [c:my_collection
s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_1_replica_n21] 
webapp=/solr path=/update
params={update.distrib=TOLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/&commit_end_point=leaders&wt=javabin&version=2&expungeDeletes=false}{commit=}
0 779196


/*shard3_0 (node: solr_133) logs*/

Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
`2020-12-06 18:49:24`.
CPU utilization pattern is same is shard2_1.

/*shard3_1 (node: solr_198) logs.*/

Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
`2020-12-06 18:53:57`.
CPU utilization pattern is same is shard2_1.

/*shard1_0 (node: solr_199) logs.*/

Same as shard2_1, commit received at `2020-12-06 18:37:47` and got over by
`2020-12-06 18:54:51`.
CPU utilization pattern is same is shard2_1.

/*shard1_1 (node: solr_200) logs.*/

/This is the same solr_node which we used in curl command to issue commit.
As expected we got SolrCmdDistributor and DistributedZkUpdateProcessor error
on it./

/Till `2020-12-06 18:46:50` timestamp there is no `start commit`  request
received. Also CPU utilization is 2% till this time./
/*Received follwing error at `2020-12-06 18:47:47` timestamp*/

2020-12-06 18:47:47.013 ERROR
(updateExecutor-5-thread-6-processing-n:solr_200:8389_solr
x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
r:core_node20) [c:my_collection s:shard1_1 r:core_node20
x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
        at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
        at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        at
org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
        ... 13 more

2020-12-06 18:47:47.018 ERROR
(updateExecutor-5-thread-2-processing-n:solr_200:8389_solr
x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
r:core_node20) [c:my_collection s:shard1_1 r:core_node20
x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_199:8389/solr/my_collection_shard1_0_replica_n7/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
........ 
*(REST ERROR MESSAGE SAME AS ABOVE)*



2020-12-06 18:47:47.018 ERROR
(updateExecutor-5-thread-4-processing-n:solr_200:8389_solr
x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
r:core_node20) [c:my_collection s:shard1_1 r:core_node20
x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_132:8389/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
........


2020-12-06 18:47:47.018 ERROR
(updateExecutor-5-thread-5-processing-n:solr_200:8389_solr
x:my_collection_shard1_1_replica_n19 c:my_collection s:shard1_1
r:core_node20) [c:my_collection s:shard1_1 r:core_node20
x:my_collection_shard1_1_replica_n19] o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_133:8389/solr/my_collection_shard3_0_replica_n15/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
........

2020-12-06 18:47:47.020 INFO  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.SolrCmdDistributor SolrCmdDistributor found 4 errors


/*Then at `2020-12-06 18:47:47` received `start commit` request. Also by
this time CPU utilization is moved up to 14%. */


2020-12-06 18:47:47.020 INFO  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.DirectUpdateHandler2 start
commit{_version_=1685355722985963520,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}


*THEN IN BW THERE WERE SOME EXTERNAL FILE RELOADING OPERATION. CPU
utilization is still around 14%.*

*Then at `2020-12-06 19:06:09` timestamp got following warning.*

2020-12-06 19:06:09.672 WARN  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
http://solr_198:8389/solr =>
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_198:8389/solr/my_collection_shard3_1_replica_n23/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_200%3A8389%2Fsolr%2Fmy_collection_shard1_1_replica_n19%2F
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
~[?:?]
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
~[?:?]
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
~[?:?]
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
~[?:?]
        at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
~[?:?]
        at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_262]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_262]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_262]
        at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
~[metrics-core-4.1.2.jar:4.1.2]
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
~[?:?]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_262]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
~[?:1.8.0_262]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: java.util.concurrent.TimeoutException
        at
org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
~[?:?]
        at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
~[?:?]
        ... 13 more


2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
http://solr_199:8389/solr =>
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://solr_199:8389/solr/........

2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
http://solr_132:8389/solr =>
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://solr_132:8389/solr/........


2020-12-06 19:06:09.673 WARN  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.p.DistributedZkUpdateProcessor Error sending update to
http://solr_133:8389/solr =>
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://solr_133:8389/solr/........

/*
Then at `2020-12-06 19:06:09` timestamp we got following. Which I'm assuming
indicates that commit got over, since after this CPU utilization fall back
again to 2%. And We also got curl reponse by this timestamp with status=0. *
/


2020-12-06 19:06:09.673 INFO  (qtp1267082248-29872) [c:my_collection
s:shard1_1 r:core_node20 x:my_collection_shard1_1_replica_n19]
o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard1_1_replica_n19] 
webapp=/solr path=/update
params={openSearcher=true&commit=true&wt=json}{commit=} 0 1702717


In short to summarize:
Commit was issued at `2020-12-06 18:37:40` (approx) timestamp.
Till `2020-12-06 18:46:50` timestamp there was no `start commit` received
and CPU was at  2% till it throws `SolrCmdDistributor` error at `2020-12-06
18:47:47` timestamp. Then at around the same timestamp `2020-12-06 18:47:47`
received the `start commit` request. And by this time CPU utilization is
moved up to 14%. 
Then at `2020-12-06 19:06:09` timestamp got `DistributedZkUpdateProcessor`
error message and then at `2020-12-06 19:06:09.673` timestamp got some
commit related message (which I'm assuming its indicates end of commit) and
CPU is again fall back to 2%.


Due to these timeout errors overall commit time is getting double. Can
someone guide me to resolve this issue. 










--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Reply via email to