I created a ticket for that issue: 
https://issues.apache.org/jira/browse/HELIX-276
Though now I'm not sure whether to qualify it as an improvement or a defect; it 
looks like it's both!

Thanks,

Matthieu



On Oct 22, 2013, at 08:42 , Kanak Biscuitwala <[email protected]> wrote:

> I need to verify this, but I suspect a few things are going on, having just 
> taken a quick look at the code:
> 
> 1) I wrote some code a while back to rearrange the node preference order if 
> the calculated assignment did not sufficiently balance the number of replicas 
> in state s across nodes. I suspect this code is causing the problem.
> 
> 2) The algorithm's initial assignment ignores preferred placement altogether 
> and just places everything uniformly by a hash, because the algorithm treats 
> all replicas as orphans on the first run. Subsequent rebalances improve the 
> situation, since the algorithm never removes preferred replicas from their 
> nodes. I think this should probably be changed so that the preferred replicas 
> are placed first (see the sketch after this list), especially if 
> len(liveNodes) == len(allNodes).
> 
> 3) If nodes are configured and launched at the same time, the preferred 
> placement is not necessarily static, though the hashing scheme is probably 
> flexible enough to allow for this.
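> 
> For point 2, a minimal self-contained sketch of "preferred replicas placed 
> first" (the names and hashing are illustrative, not Helix's actual code):
> 
> import java.util.*;
> 
> public class PreferredFirstPlacement {
>   /** Assign each replica to its preferred node when that node is live; only
>    *  the leftovers ("orphans") fall back to the uniform hash placement that
>    *  the current algorithm applies to everything on the first run. */
>   static Map<String, String> place(List<String> replicas,
>                                    List<String> allNodes,
>                                    Set<String> liveNodes) {
>     Map<String, String> assignment = new HashMap<String, String>();
>     List<String> orphans = new ArrayList<String>();
>     for (String replica : replicas) {
>       // preferred node: a stable hash over the full node list
>       String preferred = allNodes.get(mod(replica.hashCode(), allNodes.size()));
>       if (liveNodes.contains(preferred)) {
>         assignment.put(replica, preferred); // keep the preferred placement
>       } else {
>         orphans.add(replica);               // defer to fallback hashing
>       }
>     }
>     List<String> live = new ArrayList<String>(liveNodes);
>     Collections.sort(live);                 // stable order for hashing
>     for (String replica : orphans) {
>       // fallback: uniform hash over live nodes only
>       assignment.put(replica, live.get(mod(replica.hashCode(), live.size())));
>     }
>     return assignment;
>   }
> 
>   static int mod(int h, int n) { return ((h % n) + n) % n; }
> }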
> 
> I'll investigate in the morning.
> 
> Date: Mon, 21 Oct 2013 23:21:21 -0700
> Subject: Re: Favoring some transitions when rebalancing in full_auto mode
> From: [email protected]
> To: [email protected]
> 
> Kanak, I thought this should be the default behavior. When the list of 
> participants is generated for each partition, it comprises:
> 
> 1) preferred participants, i.e. if all nodes were up, where this partition 
> would reside
> 2) non-preferred participants, i.e. when one of the preferred participants is 
> down, we select a non-preferred participant
> 
> If the list we generate ensures that preferred participants are put ahead of 
> non-preferred ones, the behavior Matthieu is expecting should happen by 
> default, without additional changes (a sketch of this ordering follows).
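> 
> Roughly, the ordering I have in mind (a sketch with made-up names, not 
> Helix's actual list generator):
> 
> import java.util.*;
> 
> public class PreferenceListSketch {
>   /** Preferred participants (those that would host the partition if every
>    *  node were up) go first; live non-preferred participants fill in after. */
>   static List<String> preferenceList(List<String> preferred,
>                                      List<String> liveNodes) {
>     List<String> result = new ArrayList<String>();
>     for (String p : preferred) {
>       if (liveNodes.contains(p)) {
>         result.add(p);          // preferred and up: head of the list
>       }
>     }
>     for (String p : liveNodes) {
>       if (!preferred.contains(p)) {
>         result.add(p);          // non-preferred fillers follow
>       }
>     }
>     return result;
>   }
> }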
> 
> 
> Am I missing something?
> 
> 
> 
> On Fri, Oct 18, 2013 at 11:03 AM, Matthieu Morel <[email protected]> wrote:
> Thanks Kanak for the explanation.
> 
> It will definitely be very useful to have a few more knobs for tuning the 
> rebalancing algorithm. I'll post a ticket soon.
> 
> 
> On Oct 18, 2013, at 19:16 , Kanak Biscuitwala <[email protected]> 
> wrote:
> 
> Currently, the FULL_AUTO algorithm does not take this into account; it 
> optimizes for minimal movement and an even distribution of states. What I see 
> here is a tie in terms of even distribution, and the current presence of the 
> replica would be a good tiebreaker. I can see why this would be useful, 
> though. Please create an issue and we'll pick it up when we're able.
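> 
> Such a tiebreaker could look like this (a self-contained sketch with 
> hypothetical names, not the actual rebalancer code):
> 
> import java.util.*;
> 
> public class TiebreakSketch {
>   /** Pick the least-loaded candidate; on a tie, prefer a node that already
>    *  holds the replica, so a promotion beats a fresh offline-to-replica-to-
>    *  leader sequence on another node. */
>   static String pick(List<String> candidates, Map<String, Integer> load,
>                      Set<String> currentHolders) {
>     String best = null;
>     for (String node : candidates) {
>       if (best == null
>           || load.get(node) < load.get(best)
>           || (load.get(node).equals(load.get(best))
>               && currentHolders.contains(node)
>               && !currentHolders.contains(best))) {
>         best = node;
>       }
>     }
>     return best;
>   }
> }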
> 
> On a somewhat related note, I noticed in your example code that you configure 
> and launch your nodes at the same time. The FULL_AUTO rebalancer performs 
> better when you configure your nodes ahead of time (even if you specify more 
> than you actually ever start). This is, of course, optional.
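> 
> For example, pre-registering instances could look like this (a sketch against 
> the admin API; the ZooKeeper address, cluster name, and ports are made up, 
> and the cluster is assumed to exist already):
> 
> import org.apache.helix.HelixAdmin;
> import org.apache.helix.manager.zk.ZKHelixAdmin;
> import org.apache.helix.model.InstanceConfig;
> 
> public class PreRegisterNodes {
>   public static void main(String[] args) {
>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>     // Register more instances than we plan to start right away, so the
>     // rebalancer can compute a stable preferred placement up front.
>     for (int i = 0; i < 6; i++) {
>       InstanceConfig config = new InstanceConfig("instance_" + i);
>       config.setHostName("localhost");
>       config.setPort(String.valueOf(12000 + i));
>       config.setInstanceEnabled(true);
>       admin.addInstance("MY_CLUSTER", config);
>     }
>   }
> }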
> 
> Thanks for the advice. Currently we expect Helix to recompute states and 
> partitions as nodes join the cluster, though indeed it's probably more 
> efficient to compute some of the schedule ahead of time. I'll see how to 
> apply your suggestion.
> 
> 
> Best regards,
> 
> Matthieu
> 
> 
> Thanks,
> Kanak
> 
> From: Matthieu Morel <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, October 18, 2013 10:03 AM
> To: "[email protected]" <[email protected]>
> Subject: Favoring some transitions when rebalancing in full_auto mode
> 
> Hi,
> 
> In FULL_AUTO mode, Helix computes both the partitioning and the states.
> 
> In a leader-replica model, I observed that when rebalancing due to a failure 
> of the leader node, Helix does not promote an existing replica to leader, but 
> instead assigns a new leader (i.e. one going from offline to replica to 
> leader).
> 
> For quick failover, we need to have the replica promoted to leader instead. 
> Is there a way to do so in FULL_AUTO mode?
> 
> Apparently with SEMI_AUTO that would be possible, but it would imply we 
> control the partitioning, and we'd prefer Helix to control that as well.
> 
> I tried to play with the priorities in the definition of the state model, 
> with no luck so far.
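> 
> What I tried looks roughly like the following sketch (reconstructed against 
> the StateModelDefinition.Builder API; the exact priority values are 
> illustrative):
> 
> import org.apache.helix.model.StateModelDefinition;
> 
> public class LeaderReplicaModel {
>   static StateModelDefinition build() {
>     // Lower number = higher priority, for both states and transitions.
>     StateModelDefinition.Builder builder =
>         new StateModelDefinition.Builder("LeaderReplica");
>     builder.initialState("OFFLINE");
>     builder.addState("LEADER", 1);
>     builder.addState("REPLICA", 2);
>     builder.addState("OFFLINE", 3);
>     builder.addState("DROPPED");
>     builder.upperBound("LEADER", 1);           // one leader per partition
>     builder.dynamicUpperBound("REPLICA", "R"); // scale with replica count
>     // Rank the replica-to-leader promotion above creating a fresh replica.
>     builder.addTransition("REPLICA", "LEADER", 1);
>     builder.addTransition("OFFLINE", "REPLICA", 2);
>     builder.addTransition("LEADER", "REPLICA", 3);
>     builder.addTransition("REPLICA", "OFFLINE", 4);
>     builder.addTransition("OFFLINE", "DROPPED");
>     return builder.build();
>   }
> }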
> 
> (See below for an example of how rebalancing currently takes place.)
> 
> Thanks!
> 
> Matthieu
> 
> 
> Here we have a deployment with 3 nodes, 3 partitions and 2 desired states, 
> Leader and Replica (and offline).
> 
> // initial states
> 
> "mapFields":{
>     "MY_RESOURCE_0":{
>       "instance_1":"REPLICA"
>       ,"instance_2":"LEADER"
>     }
>     ,"MY_RESOURCE_1":{
>       "instance_0":"REPLICA"
>       ,"instance_1":"LEADER"
>     }
>     ,"MY_RESOURCE_2":{
>       "instance_0":"LEADER"
>       ,"instance_2":"REPLICA"  // instance_2 is replica
>     }
>   }
> }
> 
> 
> // instance 0 dies
> 
> "mapFields":{
>     "MY_RESOURCE_0":{
>       "instance_1":"REPLICA"
>       ,"instance_2":"LEADER"
>     }
>     ,"MY_RESOURCE_1":{
>       "instance_1":"LEADER"
>       ,"instance_2":"REPLICA"
>     }
>     ,"MY_RESOURCE_2":{
>       // Helix preferred to assign leadership of resource 2 to instance_1
>       // rather than promoting instance_2 from replica to leader
>       "instance_1":"LEADER"
>       ,"instance_2":"REPLICA" // instance_2 is still replica for resource 2
>     }
>   }
> }
