Re: Getting auto_rebalance right

Matthieu Morel Tue, 15 Oct 2013 08:12:58 -0700

Hi Kishore,

On Oct 15, 2013, at 16:48 , kishore g <[email protected]> wrote:


> Hi Matthieu,
> 
> I think the code avoids placing more than one replica of a partition on the 
> same node. So if you have only 1 node, it will create only LEADERS. We 
> can add a configuration to allow this to happen.

Actually, preventing the leader and replica of a partition from being on the 
same node makes sense: such a placement defeats the purpose of the replica.

> But I do see something weird: when you add a node, only 1 additional replica 
> gets created, which does not make sense. I will take a look at that.

Yes, with 3 nodes and 3 partitions we should expect 3 leaders and 3 replicas. 
An additional requirement, related to the above comment, would be that the 
leader and replica of a partition are never colocated. Should I open a JIRA for that?
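
To make that expectation concrete, here is a minimal sketch in plain Java. It 
does not use the Helix API and is not how the Helix rebalancer is implemented; 
PlacementSketch, place and count are hypothetical names used for illustration 
only. It puts the leader of partition p on node p % n and the replica on the 
next node, so the two are never colocated, and it places no replica at all when 
there is only one node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Helix. */
public class PlacementSketch {

    /**
     * Returns partition -> (node -> state).
     * The leader of partition p goes to node p % n and the replica to node
     * (p + 1) % n, so with 2 or more nodes they never share a node.
     * With a single node only the leader is placed.
     */
    public static Map<String, Map<String, String>> place(int partitions, List<String> nodes) {
        Map<String, Map<String, String>> assignment = new LinkedHashMap<>();
        int n = nodes.size();
        for (int p = 0; p < partitions; p++) {
            Map<String, String> states = new LinkedHashMap<>();
            if (n > 0) {
                states.put(nodes.get(p % n), "LEADER");
            }
            if (n > 1) {
                states.put(nodes.get((p + 1) % n), "REPLICA");
            }
            assignment.put("MY_RESOURCE_" + p, states);
        }
        return assignment;
    }

    /** Counts how many (partition, node) pairs are in the given state. */
    public static long count(Map<String, Map<String, String>> assignment, String state) {
        return assignment.values().stream()
                .flatMap(m -> m.values().stream())
                .filter(state::equals)
                .count();
    }

    public static void main(String[] args) {
        // Grow the cluster from 1 to 3 nodes, as in my example logs.
        List<String> nodes = new ArrayList<>();
        for (int port = 10000; port <= 10002; port++) {
            nodes.add("Node:myhost:" + port);
            Map<String, Map<String, String>> a = place(3, nodes);
            System.out.println(nodes.size() + " node(s): " + count(a, "LEADER")
                    + " leaders, " + count(a, "REPLICA") + " replicas");
        }
    }
}
```

With 1 node this gives 3 leaders and 0 replicas; with 2 or 3 nodes it gives 3 
leaders and 3 replicas, which is the outcome I would expect from the rebalancer.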

Let me know if you need more feedback.

Thanks!

Matthieu



> 
> thanks,
> Kishore G
> 
> 
> On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <[email protected]> wrote:
> Thanks for your prompt answers!
> 
> I used the latest version from the master branch and applied the code changes 
> suggested by Jason.
> 
> The good news is that:
>       - the update was trivial - at least for the small code example I 
> provided. 
>       - I always get 3 leader states for the 3 partitions
> 
> The bad news is that:
>       - I either don't get enough replicas (I want 1 replica for each 
> partition, and initially I only have replicas for 2 partitions) 
>       - or I simply get no replicas at all (after removing 1 node from the 
> cluster, I have 3 leaders, 0 replicas)
> 
> I updated my simple example https://github.com/matthieumorel/helix-balancing 
> so you can reproduce that behavior.
> 
> // with only 1 node, I have 3 leaders, 0 replica :
> 
> Starting instance Node:myhost:10000
> Assigning MY_RESOURCE_1 to Node:myhost:10000
> Assigning MY_RESOURCE_0 to Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10000
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
> 
> 
> // adding 1 node adds a replica:
> 
> Starting instance Node:myhost:10001
> Assigning MY_RESOURCE_1 to Node:myhost:10001
> OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
> 
> 
> // adding another node adds a new replica:
> 
> Starting instance Node:myhost:10002
> Assigning MY_RESOURCE_0 to Node:myhost:10002
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
> 
> 
> // removing a node rebalances things but we end up with 3 leaders, 0 replica
> 
> Stopping instance Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10002
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
> 
> 
> I would like to get 1 leader and 1 replica for each partition, regardless of 
> the number of nodes. Is that possible?
> 
> Thanks!
> 
> Matthieu
> 
> 
> 
> On Oct 15, 2013, at 02:30 , Kanak Biscuitwala <[email protected]> wrote:
> 
>> Hi Matthieu,
>> 
>> I have just pushed a patch to the master branch (i.e. trunk) that should fix 
>> the issue. Please let me know if the problem persists.
>> 
>> Thanks,
>> Kanak
>> 
>> ________________________________
>>> From: [email protected] 
>>> To: [email protected] 
>>> Subject: Re: Getting auto_rebalance right 
>>> Date: Mon, 14 Oct 2013 21:32:41 +0000 
>>> 
>>> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it in 
>>> trunk. If you are building from trunk, in ClusterConfigInit#init() change 
>>> 
>>> admin.addResource(DEFAULT_CLUSTER_NAME, 
>>> RESOURCE, 
>>> PARTITIONS, 
>>> "LEADER_REPLICA", 
>>> IdealStateModeProperty.AUTO_REBALANCE.toString()); 
>>> to 
>>> 
>>> admin.addResource(DEFAULT_CLUSTER_NAME, 
>>> RESOURCE, 
>>> PARTITIONS, 
>>> "LEADER_REPLICA", 
>>> RebalanceMode.FULL_AUTO.toString()); 
>>> 
>>> 
>>> It should work. We are planning to make a 0.6.2 release with a few fixes 
>>> including this one. 
>>> 
>>> 
>>> Thanks, 
>>> 
>>> Jason 
>>> 
>>> 
>>> From: Matthieu Morel <[email protected]> 
>>> Reply-To: <[email protected]> 
>>> Date: Monday, October 14, 2013 12:09 PM 
>>> To: <[email protected]> 
>>> Subject: Getting auto_rebalance right 
>>> 
>>> Hi, 
>>> 
>>> I'm trying to use the auto-rebalance mode in Helix. 
>>> 
>>> The use case is the following (standard leader-standby scenario, a bit 
>>> like the rsync example in the Helix codebase): 
>>> - the dataspace is partitioned 
>>> - for a given partition, we have 
>>>   - a leader that is responsible for writing and serving data, logging 
>>>     operations into a journal 
>>>   - a replica that fetches updates from a journal and applies them 
>>>     locally but does not serve data 
>>> Upon failure, the replica becomes leader, applies pending updates and 
>>> can write and serve data. Ideally we also get a new replica assigned. 
>>> 
>>> We'd like to use the auto_rebalance mode in Helix so that partitions 
>>> are automatically assigned and re-assigned, and so that leaders are 
>>> automatically elected. 
>>> 
>>> 
>>> Unfortunately, I can't really get the balancing right. I might be doing 
>>> something wrong, so I uploaded an example here 
>>> : https://github.com/matthieumorel/helix-balancing 
>>> 
>>> 
>>> In this application I would like to get exactly 1 leader and 1 replica 
>>> for each of the partitions 
>>> 
>>> In this example we don't reach that result, and when removing a node, 
>>> we even get to a situation where there is no leader for a given 
>>> partition. 
>>> 
>>> 
>>> Do I have the wrong expectations? Is there something wrong with the code, 
>>> or is it something with Helix? 
>>> 
>>> 
>>> Thanks! 
>>> 
>>> Matthieu                                      
> 
>
