We have 2 shards and 2 replicas on our production servers. Somehow replica1 
became leader while a commit was running on shard1.
Logs:

***shard1***
2019-04-08 12:52:09.930 INFO  
(searcherExecutor-30-thread-1-processing-n:shard1:8983_solr x:productData 
s:shard1 c:productData r:core_node1) [c:productData s:shard1 r:core_node1 
x:productData] o.a.s.c.QuerySenderListener QuerySenderListener done.
2019-04-08 12:54:01.397 INFO  (qtp1239731077-1359101) [c:product s:shard1 
r:core_node1 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  
webapp=/solr path=/update params={wt=javabin&version=2}{add=[PRO23241768 
(1630250393598427136)]} 0 111711

***replica1***
2019-04-08 12:52:09.581 INFO  (qtp1239731077-1021605) [c:product s:shard1 
r:core_node3 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  
webapp=/solr path=/update 
params={update.distrib=FROMLEADER&distrib.from=shard1:8983/solr/product/&wt=javabin&version=2}{add=[PRO23241768
 (1630250393598427136)]} 0 0
2019-04-08 12:52:19.717 INFO  
(zkCallback-4-thread-207-processing-n:replica1:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader A live node change: [WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes], has occurred - updating... (live 
nodes size: [4])

PRO23241768 was updated successfully on replica1 at 12:52:09.581, but the same 
update was not logged on shard1 until 12:54:01.397. The request on shard1 took 
111711 ms, which is about 1.86 minutes. In between, at 12:52:19.717, replica1 
tried to become leader and succeeded.
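
For reference, the timing arithmetic above can be reproduced with a small Python sketch. It assumes the trailing 111711 on the shard1 log line is the elapsed request time in milliseconds (that is how I read it) and that the clocks on both nodes are in sync. The variable and function names are mine, chosen only for illustration.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start, end):
    """Return the number of seconds from one log timestamp to another."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Values copied from the log excerpts.
replica_update = "2019-04-08 12:52:09.581"    # add logged on replica1
live_node_change = "2019-04-08 12:52:19.717"  # live node change on replica1
leader_update = "2019-04-08 12:54:01.397"     # add logged on shard1
leader_elapsed_ms = 111711                    # assumed: elapsed ms on shard1

elapsed_minutes = leader_elapsed_ms / 60000
replica_to_leader_log = seconds_between(replica_update, leader_update)
replica_to_node_change = seconds_between(replica_update, live_node_change)

print(f"shard1 request time: {elapsed_minutes:.2f} min")
print(f"replica1 add -> shard1 log line: {replica_to_leader_log:.3f} s")
print(f"replica1 add -> live node change: {replica_to_node_change:.3f} s")
```

So the live node change was logged roughly 10 seconds after replica1 applied the update, well before shard1 finished logging it.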

My production solr.xml:
<int name="zkClientTimeout">${zkClientTimeout:600000}</int>
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
 <int name="socketTimeout">${socketTimeout:600000}</int>
 <int name="connTimeout">${connTimeout:60000}</int>
</shardHandlerFactory>
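
To make the effective values explicit, here is a minimal Python sketch that reads the defaults out of the `${property:default}` placeholders in the fragment above and converts them to minutes. It assumes none of these system properties is overridden when Solr starts, so the defaults are what actually apply. The wrapper `<root>` element and the function name are only there for the sketch and are not part of solr.xml.

```python
import re
import xml.etree.ElementTree as ET

# The timeout settings from the solr.xml fragment above.
SOLR_XML_FRAGMENT = """
<int name="zkClientTimeout">${zkClientTimeout:600000}</int>
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
 <int name="socketTimeout">${socketTimeout:600000}</int>
 <int name="connTimeout">${connTimeout:60000}</int>
</shardHandlerFactory>
"""

# Matches ${propertyName:defaultValue} and captures both parts.
PLACEHOLDER = re.compile(r"^\$\{([^:}]+):([^}]*)\}$")


def default_timeouts_ms(fragment):
    """Map each <int name="..."> setting to its default value in ms."""
    # The fragment has several top-level elements, so wrap it in a
    # hypothetical <root> element to make it parseable.
    root = ET.fromstring(f"<root>{fragment}</root>")
    timeouts = {}
    for elem in root.iter("int"):
        text = elem.text.strip()
        match = PLACEHOLDER.match(text)
        timeouts[elem.get("name")] = int(match.group(2) if match else text)
    return timeouts


for name, ms in default_timeouts_ms(SOLR_XML_FRAGMENT).items():
    print(f"{name}: {ms} ms = {ms / 60000:g} min")
```

With these defaults, zkClientTimeout, distribUpdateSoTimeout and socketTimeout are each 10 minutes, and the two connection timeouts are 1 minute.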

Collections: product and productData.

Versions:
Solr: 6.1.0
ZooKeeper: 3.4.6


Why did the update take about 1.86 minutes on shard1? And if the update was 
only slow, why did replica1 try to become leader? Do I need to change any of 
the timeouts?

Note: The update for PRO23241768 used a soft commit, and the log level was INFO.
