Hello,
         Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
hours of debugging spree!! Can somebody kindly help me  out of this misery.

I have a set has 8 single shard collections with 3 replicas. As soon as I
updated the configs and started the servers one of my collection got stuck
with no leader. I have restarted solr to no avail, I also tried to force a
leader via collections API that dint work either. I also see that, from
time to time multiple solr nodes go down all at the same time, only a
restart resolves the issue.

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR
(recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr
x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1)
[c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:43.767 INFO
(zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])


Suspecting the worst I backed up the index and renamed the collection's
data folder and restarted the servers, this time the collection got a
proper leader. So is my index really corrupted ? Solr UI showed live nodes
just like the logs but without any leader. Even with the leader issue
somewhat alleviated after renaming the data folder and letting silr create
a new data folder my servers did go down a couple of times.

I am not all that well versed with zookeeper...any trick to make zookeeper
pick a leader and be happy ? Did anybody have solr/zookeeper issues with
6.4.0 ?

Thanks

Ravi Kiran Bhaskar

Reply via email to