SolrCloud uses ZooKeeper sequence flags to keep track of the order in which nodes register themselves as leader candidates. The node with the lowest sequence number wins as leader of the shard.
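
To make the rule above concrete, here is a minimal sketch of "lowest sequence number wins". It assumes every election node name ends in -n_ followed by the 10-digit ZooKeeper sequence. The class name ElectionOrder, its methods, and the s1/s2/s3 name prefixes are hypothetical and not part of Solr. It only models the ordering rule as stated here, not how Solr's LeaderElector actually runs an election.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Solr: models "lowest sequence number wins".
final class ElectionOrder {
    private ElectionOrder() {}

    // Extracts the numeric sequence that follows the last "-n_" in a node name.
    // Assumption: names look like "<prefix>-n_0000000004".
    public static int sequenceOf(String nodeName) {
        int idx = nodeName.lastIndexOf("-n_");
        if (idx < 0) {
            throw new IllegalArgumentException("no sequence suffix: " + nodeName);
        }
        return Integer.parseInt(nodeName.substring(idx + 3));
    }

    // Returns the node with the lowest sequence number, or empty if there are none.
    public static Optional<String> expectedLeader(List<String> electionNodes) {
        return electionNodes.stream()
                .min(Comparator.comparingInt(ElectionOrder::sequenceOf));
    }
}
```

Given a list of child names read from a shard's election path, ElectionOrder.expectedLeader(children) returns the candidate that this rule alone would pick.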
What I'm trying to do is keep leader re-assignments to a minimum during a rolling restart. To that end, I change the zk sequence numbers of the SolrCloud nodes while all nodes of the cluster are up and active. I'm using Solr 4.10.0 and I'm aware of SOLR-6491, which has a similar purpose, but I'm trying to do it from "outside", using the existing APIs without editing the Solr source code.

== TYPICAL SCENARIO ==

Suppose we have 3 Solr instances S1, S2, S3. They are started in that order and the zk sequences assigned are as follows:

S1:-n_0000000000 (LEADER)
S2:-n_0000000001
S3:-n_0000000002

In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3 (after S2 shutdown) and finally S1 (after S3 shutdown), 3 changes in total.

== MY ATTEMPT ==

By using SolrZkClient and the ZooKeeper multi API I found a way to get rid of the old zknodes that participate in a shard's leader election and write new ones, to which we can assign sequence numbers of our liking.

S1:-n_0000000000 (no code running here)
S2:-n_0000000004 (code deleting zknode -n_0000000001 and creating -n_0000000004)
S3:-n_0000000003 (code deleting zknode -n_0000000002 and creating -n_0000000003)

In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no change (after S2 shutdown) and finally S1 (after S3 shutdown), that is 2 changes. This number stays constant no matter how many servers are added to SolrCloud, while in the first scenario the # of re-assignments equals the # of Solr servers.

The problem occurs when S1 (LEADER) is shut down. The elections that take place still set S2 as leader, as if the new sequence numbers were ignored. When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed under "/collections", and based on them S3 should have become the leader.

Do you have any idea why the new state is not acknowledged during the elections? Is something cached? Or, to put it bluntly, do I have any chance down this path? If not, what are my options?
Is it possible to apply all patches under SOLR-6491 in isolation and continue from there?

Thank you.

Extra info which might help follows.

1. Some logging related to leader elections after S1 has been shut down:

S2 - org.apache.solr.cloud.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
S2 - org.apache.solr.cloud.ShardLeaderElectionContext We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
S3 - org.apache.solr.cloud.LeaderElector Our node is no longer in line to be leader

2. And some sample code showing how I perform the ZK re-sequencing:

// Read current zk nodes for a specific collection
solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().getChildren("/collections/core/leader_elect/shard1/election", true)

// node deletion
Op.delete(path, -1)

// node creation
Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

// Perform operations
solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
solrServer.getZkStateReader().updateClusterState(true);

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
Sent from the Solr - User mailing list archive at Nabble.com.