Just skimming, but the problem I ran into here was with the listeners. Each _Solr_ instance out there is listening to one of the ephemeral nodes (the "one in front"). So deleting an instance's own node does _not_ change which ephemeral node that instance is listening to.
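To make the watch chain concrete, here is a minimal in-memory sketch. It is plain Java with no ZooKeeper or Solr classes involved; the class and method names are made up for illustration, and it is not how LeaderElector is actually implemented. It only models "each instance watches the node in front of it" and what an outside delete-and-recreate does to those watches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model only (hypothetical names): no ZooKeeper or Solr classes are used.
final class WatchChainSketch {
    // sequence number -> owning instance; stands in for the election znodes
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    // instance -> sequence number of the node it watches (null: nothing in front)
    private final Map<String, Integer> watches = new LinkedHashMap<>();

    // An instance joins the election and watches the node directly in front of its own.
    public void join(String instance, int seq) {
        nodes.put(seq, instance);
        watches.put(instance, nodes.lowerKey(seq));
    }

    // Something outside the instance deletes its node and creates a new one.
    // Only watchers of the deleted node wake up and pick a new node to watch;
    // the owner of the deleted node is not notified and keeps its old watch.
    // Returns the instances whose watch fired.
    public List<String> resequence(int oldSeq, int newSeq) {
        String owner = nodes.remove(oldSeq);
        if (owner == null) {
            throw new IllegalArgumentException("no node with sequence " + oldSeq);
        }
        nodes.put(newSeq, owner);
        List<String> woken = new ArrayList<>();
        for (Map.Entry<String, Integer> w : watches.entrySet()) {
            Integer watched = w.getValue();
            if (watched != null && watched == oldSeq) {
                woken.add(w.getKey());
                // The woken instance re-checks and watches whatever is now in front of it.
                w.setValue(nodes.lowerKey(seqOf(w.getKey())));
            }
        }
        return woken;
    }

    public Integer watchOf(String instance) {
        return watches.get(instance);
    }

    // Looks up the sequence number of the node currently owned by an instance.
    private int seqOf(String instance) {
        for (Map.Entry<Integer, String> n : nodes.entrySet()) {
            if (n.getValue().equals(instance)) {
                return n.getKey();
            }
        }
        throw new IllegalStateException("no election node for " + instance);
    }

    public static void main(String[] args) {
        WatchChainSketch zk = new WatchChainSketch();
        zk.join("S1", 0);
        zk.join("S2", 1);
        zk.join("S3", 2);
        System.out.println("before: " + zk.watches);
        System.out.println("woken:  " + zk.resequence(1, 4)); // move S2 from 1 to 4
        System.out.println("after:  " + zk.watches);
    }
}
```

In this model, moving S2's node from 1 to 4 wakes only S3. Afterwards S2 and S3 both watch S1's node, which is the "two listeners on the same ephemeral node" situation.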
So, for instance, when you delete S2...n-000001 and re-add it, S2 is still looking at S1...n-000000 and will continue looking at S1...n-000000 until S1...n-000000 is deleted. Deleting S2...n-000001 will wake up S3 though, which should now be looking at S1...n-000000. Now you have two Solr listeners looking at the same ephemeral node.

The key is that deleting S2...n-000001 does _not_ wake up S2, just any Solr instance that has a watch on the associated ephemeral node.

To understand how it all works, the code you want is in LeaderElector.checkIfIamLeader. Be aware that the sortSeqs call sorts the nodes by 1> sequence number 2> string comparison, which has the unfortunate characteristic of a secondary sort by session ID. So two nodes with the same sequence number can sort before or after each other depending on which one gets a higher or lower session ID than the other.

This is quite tricky to get right. I once created a patch for 4.10.3 by applying the patches from these issues in this order (some minor tweaks required):
SOLR-6115, SOLR-6512, SOLR-6577, SOLR-6513, SOLR-6517, SOLR-6670, SOLR-6691

Good luck!
Erick

On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis <zist...@runbox.com> wrote:
> SolrCloud uses ZooKeeper sequence flags to keep track of the order in which
> nodes register themselves as leader candidates. The node with the lowest
> sequence number wins as leader of the shard.
>
> What I'm trying to do is to keep the leader re-assignments to a minimum
> during a rolling restart. To that end I change the zk sequence numbers
> on the SolrCloud nodes when all nodes of the cluster are up and active. I'm
> using Solr 4.10.0 and I'm aware of SOLR-6491, which has a similar purpose, but
> I'm trying to do it from "outside", using the existing APIs without editing
> Solr source code.
>
> == TYPICAL SCENARIO ==
> Suppose we have 3 Solr instances S1,S2,S3.
> They are started in the same order and the zk sequences assigned are as
> follows:
> S1:-n_0000000000 (LEADER)
> S2:-n_0000000001
> S3:-n_0000000002
>
> In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3
> (after S2 shutdown) and finally S1 (after S3 shutdown), 3 changes in total.
>
> == MY ATTEMPT ==
> By using SolrZkClient and the ZooKeeper multi API I found a way to get rid
> of the old zknodes that participate in a shard's leader election and write
> new ones where we can assign the sequence number of our liking.
>
> S1:-n_0000000000 (no code running here)
> S2:-n_0000000004 (code deleting zknode -n_0000000001 and creating
> -n_0000000004)
> S3:-n_0000000003 (code deleting zknode -n_0000000002 and creating
> -n_0000000003)
>
> In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no
> change (after S2 shutdown) and finally S1 (after S3 shutdown), that is 2
> changes. This will be constant no matter how many servers are added in
> SolrCloud, while in the first scenario the # of re-assignments equals the #
> of Solr servers.
>
> The problem occurs when S1 (LEADER) is shut down. The elections that take
> place still set S2 as leader. It's as if the new sequence numbers are ignored.
> When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed
> under "/collections", based on which S3 should have become the leader.
> Do you have any idea why the new state is not acknowledged during the
> elections? Is something cached? Or to put it bluntly, do I have any chance
> down this path? If not, what are my options? Is it possible to apply all
> patches under SOLR-6491 in isolation and continue from there?
>
> Thank you.
>
> Extra info which might help follows
> 1. Some logging related to leader elections after S1 has been shut down
> S2 - org.apache.solr.cloud.SyncStrategy Leader's attempt to sync with
> shard failed, moving to the next candidate
> S2 - org.apache.solr.cloud.ShardLeaderElectionContext We failed sync,
> but we have no versions - we can't sync in that
> case - we were active before, so become leader anyway
>
> S3 - org.apache.solr.cloud.LeaderElector Our node is no longer in line
> to be leader
>
> 2. And some sample code on how I perform the ZK re-sequencing
> // Read current zk nodes for a specific collection
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().getChildren("/collections/core/leader_elect/shard1/election", true)
> // node deletion
> Op.delete(path, -1)
> // node creation
> Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
> CreateMode.EPHEMERAL_SEQUENTIAL);
> // Perform operations
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
> solrServer.getZkStateReader().updateClusterState(true);
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
> Sent from the Solr - User mailing list archive at Nabble.com.
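
On the sort order mentioned above, here is a minimal sketch of "sequence number first, then plain string comparison". It assumes election node names of the form <sessionId>-<coreNodeName>-n_<sequence>; the node names, the class, and the comparator are illustrative and are not the actual sortSeqs code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only (hypothetical names): mimics the ordering described above,
// not Solr's real implementation.
final class ElectionSortSketch {
    // Parses the numeric sequence that follows the last "-n_" in a node name.
    static int seqOf(String nodeName) {
        int i = nodeName.lastIndexOf("-n_");
        return Integer.parseInt(nodeName.substring(i + 3));
    }

    // 1> sequence number, 2> string comparison of the whole name.
    // Because the name is assumed to start with the session ID, the tie-break
    // is effectively a sort by session ID.
    static final Comparator<String> BY_SEQ_THEN_NAME = (a, b) -> {
        int bySeq = Integer.compare(seqOf(a), seqOf(b));
        return bySeq != 0 ? bySeq : a.compareTo(b);
    };

    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(BY_SEQ_THEN_NAME);
        return copy;
    }

    public static void main(String[] args) {
        // Two nodes share sequence 1; made-up session IDs 333 and 222 decide their order.
        List<String> names = Arrays.asList(
                "333-core_node2-n_0000000001",
                "111-core_node1-n_0000000000",
                "222-core_node3-n_0000000001");
        for (String name : sorted(names)) {
            System.out.println(name);
        }
    }
}
```

With these made-up names, core_node3 lands ahead of core_node2 only because its session ID happens to compare lower, which is the surprise to watch out for when two nodes end up with the same sequence number.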