Thanks for this. For our system we cannot use AUTO_REBALANCE, since we need the MASTER to be on a particular machine. But I'll try changing the preference list.
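To make sure I understand the preference-list idea, here is a quick sketch of how I read the AUTO-mode placement rule (plain Python, not the Helix API; the election rule below is my simplified assumption, namely that the first live node in the preference list gets MASTER):

```python
# Simplified model of AUTO-mode placement: the first live node in the
# preference list is assigned MASTER, remaining live nodes become SLAVE.
# This is an illustrative sketch, not Helix code.

def assign_states(preference_list, live_nodes):
    """Return {node: state} for live nodes, in preference-list order."""
    states = {}
    master_assigned = False
    for node in preference_list:
        if node not in live_nodes:
            continue
        states[node] = "SLAVE" if master_assigned else "MASTER"
        master_assigned = True
    return states

# With the original list, a restarted node_1 reclaims MASTER:
prefs = ["node_1", "node_2"]
print(assign_states(prefs, {"node_1", "node_2"}))  # node_1 is MASTER
print(assign_states(prefs, {"node_2"}))            # node_1 down -> node_2 is MASTER
print(assign_states(prefs, {"node_1", "node_2"}))  # node_1 back -> MASTER again

# Swapping the list when node_2 takes over keeps node_2 as MASTER:
prefs = ["node_2", "node_1"]
print(assign_states(prefs, {"node_1", "node_2"}))  # node_2 stays MASTER
```

So if the transition handler rewrites the preference list when node_2 is promoted, node_1 should come back as a SLAVE.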
I did more debugging today and I think I know what's causing problems for us. The problem is the sequence of transitions:

1. node_1 = MASTER and node_2 = SLAVE
2. node_1 is killed/dies
3. node_2 transitions to MASTER
4. node_1 is restarted -- here is the problem
5. node_2 transitions to SLAVE
6. node_1 transitions to SLAVE (yes, both node_1 and node_2 are SLAVEs)
7. node_1 transitions to MASTER

The problem is in step 5. Whenever a node comes up it needs to talk to the MASTER to sync up its state. But node_2 transitions to SLAVE before node_1 has become a SLAVE, leaving no active MASTER in the cluster. Is there a way to get steps 5 and 6 reversed?

On Mar 31, 2013, at 1:56 AM, kishore g <[email protected]> wrote:

> No, you don't have to change the state model to achieve this.
>
> Instead of AUTO, AUTO_REBALANCE should work in your case. Simply change the
> ideal state to look like this:
>
> {
>   "id": "Cluster",
>   "simpleFields": {
>     "IDEAL_STATE_MODE": "AUTO_REBALANCE",
>     "NUM_PARTITIONS": "1",
>     "REPLICAS": "2",
>     "STATE_MODEL_DEF_REF": "MasterSlave"
>   },
>   "mapFields": {
>   },
>   "listFields": {
>     "Partition_0": [ ]
>   }
> }
>
> You can also achieve this in AUTO mode: when a node becomes master for a
> partition, change the preference list in the idealstate as part of the
> transition. So in this case change "Partition_0" : [ "node_1", "node_2" ] to
> "Partition_0" : [ "node_2", "node_1" ] when node_2 becomes master.
>
> In the next release, you will be able to add custom rebalancer code which
> will allow you to make this change in the controller easily.
>
> thanks,
> Kishore G
>
>
> On Sat, Mar 30, 2013 at 10:21 PM, Ming Fang <[email protected]> wrote:
> Hi Kishore
>
> Our system requires deterministic placement of the MASTER and SLAVE.
> This is a sample of the idealstate file we're using:
>
> {
>   "id": "Cluster",
>   "simpleFields": {
>     "IDEAL_STATE_MODE": "AUTO",
>     "NUM_PARTITIONS": "1",
>     "REPLICAS": "2",
>     "STATE_MODEL_DEF_REF": "MasterSlave"
>   },
>   "mapFields": {
>   },
>   "listFields": {
>     "Partition_0": [ "node_1", "node_2" ]
>   }
> }
>
> In this example, node_1 is the MASTER.
> If node_1 dies then node_2 will take over.
> But if node_1 then gets restarted, it will try to become MASTER again.
> We normally keep the dead node down to avoid this problem.
> But I was hoping for a more elegant solution.
>
> One solution would be for node_1 to come up and realize that node_2 has
> taken over due to the previous failure.
> In that case node_1 would decide to remain a SLAVE node instead.
> Should this be done by the Controller instead?
> Should I create a new state model other than MASTER/SLAVE?
>
> On Mar 31, 2013, at 12:50 AM, kishore g <[email protected]> wrote:
>
>> Hi Ming,
>>
>> There are a couple of ways you can achieve that. Before providing an answer,
>> how many partitions do you have? Did you generate the idealstate yourself or
>> use Helix to come up with the initial idealstate?
>>
>> The reason the old master tries to become a master again is to distribute the
>> load among the nodes currently alive. Otherwise the old node that comes back
>> will never become a master for any partition and will remain idle until
>> another failure happens in the system.
>>
>> thanks,
>> Kishore G
>>
>>
>> On Sat, Mar 30, 2013 at 8:01 PM, Ming Fang <[email protected]> wrote:
>> We're using MASTER/SLAVE in AUTO mode.
>> When the MASTER is killed, the failover works properly: the SLAVE
>> transitions to become MASTER.
>> However, if the failed MASTER is restarted, it will try to become MASTER
>> again.
>> This is causing a problem in our business logic.
>> Is there a way to prevent the failed instance from becoming MASTER again?
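For reference, the transition ordering described at the top of the thread can be replayed with a small simulation (illustrative Python only, not Helix code; the step comments follow the numbered sequence above, and the no-MASTER window between steps 5 and 7 shows up directly):

```python
# Replay of the reported transition sequence. After each transition we
# snapshot the cluster and can check whether any MASTER is present.
# Illustrative sketch only; not Helix code.

def replay(transitions):
    """Apply (node, new_state_or_None) transitions; return state snapshots."""
    cluster = {}
    snapshots = []
    for node, state in transitions:
        if state is None:
            cluster.pop(node, None)   # node killed/removed
        else:
            cluster[node] = state
        snapshots.append(dict(cluster))
    return snapshots

sequence = [
    ("node_1", "MASTER"),   # step 1: initial placement
    ("node_2", "SLAVE"),    # step 1 (cont.)
    ("node_1", None),       # step 2: node_1 killed
    ("node_2", "MASTER"),   # step 3: failover
    ("node_1", "OFFLINE"),  # step 4: node_1 restarted
    ("node_2", "SLAVE"),    # step 5: node_2 demoted -- no MASTER from here...
    ("node_1", "SLAVE"),    # step 6: ...both nodes are now SLAVE
    ("node_1", "MASTER"),   # step 7: node_1 promoted, MASTER restored
]

for snap in replay(sequence):
    has_master = "MASTER" in snap.values()
    print(snap, "" if has_master else "<- no active MASTER")
```

Reversing steps 5 and 6 in the sequence would let node_1 reach SLAVE (and sync from node_2) while node_2 is still MASTER, which is what the question asks for.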
