Hi Matthieu,

I think the code avoids placing more than one replica of a partition on the same node. So if you have only 1 node, it will create only LEADERs and no replicas. We could add a configuration option to allow co-located replicas. But I do see something odd: when you add a node, only 1 additional replica gets created, which does not make sense. I will take a look at that.

thanks,
Kishore G
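For reference, here is a minimal sketch of where the replica count enters the picture, using the public HelixAdmin API. The ZooKeeper address, cluster name, and resource name below are placeholders, not taken from Matthieu's repo:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState.RebalanceMode;

// Placeholders: adjust the ZK address and names for your setup.
HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

// 3 partitions, LEADER_REPLICA state model, full-auto rebalancing.
admin.addResource("MY_CLUSTER", "MY_RESOURCE", 3,
    "LEADER_REPLICA", RebalanceMode.FULL_AUTO.toString());

// 2 replicas per partition in total: 1 LEADER + 1 REPLICA. Since Helix
// will not put two replicas of one partition on the same node, a
// single-node cluster can only ever host the LEADER of each partition.
admin.rebalance("MY_CLUSTER", "MY_RESOURCE", 2);

With 2 or more live nodes, full-auto rebalancing should then be able to satisfy both the LEADER and the REPLICA placement for every partition.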
On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <[email protected]> wrote:

> Thanks for your prompt answers!
>
> I used the latest version from the master branch and applied the code
> changes suggested by Jason.
>
> The good news is that:
> - the update was trivial, at least for the small code example I provided
> - I always get 3 leader states for the 3 partitions
>
> The bad news is that:
> - I either don't get enough replicas (I want 1 replica for each partition,
>   and initially I only have replicas for 2 partitions)
> - or I simply get no replicas at all (after removing 1 node from the
>   cluster, I have 3 leaders, 0 replicas)
>
> I updated my simple example
> https://github.com/matthieumorel/helix-balancing so you can reproduce
> that behavior.
>
> // with only 1 node, I have 3 leaders, 0 replicas:
>
> Starting instance Node:myhost:10000
> Assigning MY_RESOURCE_1 to Node:myhost:10000
> Assigning MY_RESOURCE_0 to Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10000
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
>
> // adding 1 node adds a replica:
>
> Starting instance Node:myhost:10001
> Assigning MY_RESOURCE_1 to Node:myhost:10001
> OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
>
> // adding another node adds a new replica:
>
> Starting instance Node:myhost:10002
> Assigning MY_RESOURCE_0 to Node:myhost:10002
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
>
> // removing a node rebalances things, but we end up with 3 leaders,
> // 0 replicas:
>
> Stopping instance Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10002
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
>
> I would like to get 1 leader and 1 replica for each partition, regardless
> of the number of nodes. Is that possible?
>
> Thanks!
>
> Matthieu
>
>
> On Oct 15, 2013, at 02:30, Kanak Biscuitwala <[email protected]> wrote:
>
> Hi Matthieu,
>
> I have just pushed a patch to the master branch (i.e. trunk) that should
> fix the issue. Please let me know if the problem persists.
>
> Thanks,
> Kanak
>
> ________________________________
>
> From: [email protected]
> To: [email protected]
> Subject: Re: Getting auto_rebalance right
> Date: Mon, 14 Oct 2013 21:32:41 +0000
>
> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it
> in trunk. If you are building from trunk, change ClusterConfigInit#init()
> from
>
> admin.addResource(DEFAULT_CLUSTER_NAME,
>                   RESOURCE,
>                   PARTITIONS,
>                   "LEADER_REPLICA",
>                   IdealStateModeProperty.AUTO_REBALANCE.toString());
>
> to
>
> admin.addResource(DEFAULT_CLUSTER_NAME,
>                   RESOURCE,
>                   PARTITIONS,
>                   "LEADER_REPLICA",
>                   RebalanceMode.FULL_AUTO.toString());
>
> It should work. We are planning to make a 0.6.2 release with a few fixes,
> including this one.
> Thanks,
> Jason
>
> From: Matthieu Morel <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, October 14, 2013 12:09 PM
> To: "[email protected]" <[email protected]>
> Subject: Getting auto_rebalance right
>
> Hi,
>
> I'm trying to use the auto-rebalance mode in Helix.
>
> The use case is the following (a standard leader-standby scenario, a bit
> like the rsync example in the Helix codebase):
> - the dataspace is partitioned
> - for a given partition, we have:
>   - a leader that is responsible for writing and serving data, logging
>     operations into a journal
>   - a replica that fetches updates from the journal and applies them
>     locally, but does not serve data
>
> Upon failure, the replica becomes leader, applies pending updates, and
> can then write and serve data. Ideally, we also get a new replica
> assigned.
>
> We'd like to use the auto_rebalance mode in Helix so that partitions are
> automatically assigned and re-assigned, and so that leaders are
> automatically elected.
>
> Unfortunately, I can't really get the balancing right. I might be doing
> something wrong, so I uploaded an example here:
> https://github.com/matthieumorel/helix-balancing
>
> In this application I would like to get exactly 1 leader and 1 replica
> for each of the partitions.
>
> In this example we don't reach that result, and when removing a node, we
> even get to a situation where there is no leader for a given partition.
>
> Do I have wrong expectations? Is there something wrong with the code, or
> is it something with Helix?
>
> Thanks!
>
> Matthieu
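As an aside on the state model itself: the "1 leader + 1 replica per partition" semantics described in the thread are encoded in the state model definition. Below is a minimal sketch of what a LEADER_REPLICA definition could look like with Helix's StateModelDefinition.Builder; the actual definition in Matthieu's repo may differ:

import org.apache.helix.model.StateModelDefinition;

// A sketch, assuming at most 1 LEADER and 1 REPLICA per partition.
StateModelDefinition.Builder builder =
    new StateModelDefinition.Builder("LEADER_REPLICA");
builder.addState("LEADER", 1);   // lower number = higher priority
builder.addState("REPLICA", 2);
builder.addState("OFFLINE", 3);
builder.addState("DROPPED");
builder.initialState("OFFLINE");
builder.addTransition("OFFLINE", "REPLICA");
builder.addTransition("REPLICA", "LEADER");
builder.addTransition("LEADER", "REPLICA");
builder.addTransition("REPLICA", "OFFLINE");
builder.addTransition("OFFLINE", "DROPPED");
builder.upperBound("LEADER", 1);            // at most one leader per partition
builder.dynamicUpperBound("REPLICA", "R");  // "R" = replica count on the resource
StateModelDefinition leaderReplica = builder.build();

The "R" dynamic bound ties the number of REPLICAs to the replica count configured on the resource, so the same definition works whatever replica count you rebalance with.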

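For completeness, here is a sketch of the participant-side callbacks that would emit transition logs like the ones in the thread (OFFLINE -> REPLICA, REPLICA -> LEADER). The class name, comments, and log format are illustrative, not taken from Matthieu's repo:

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "OFFLINE", states = { "LEADER", "REPLICA", "OFFLINE" })
public class LeaderReplicaStateModel extends StateModel {

  @Transition(from = "OFFLINE", to = "REPLICA")
  public void onBecomeReplicaFromOffline(Message message, NotificationContext context) {
    // start tailing the journal for this partition
    System.out.println("OFFLINE -> REPLICA (" + message.getTgtName()
        + ", " + message.getPartitionName() + ")");
  }

  @Transition(from = "REPLICA", to = "LEADER")
  public void onBecomeLeaderFromReplica(Message message, NotificationContext context) {
    // apply pending journal updates, then start serving reads and writes
    System.out.println("REPLICA -> LEADER (" + message.getTgtName()
        + ", " + message.getPartitionName() + ")");
  }

  @Transition(from = "LEADER", to = "REPLICA")
  public void onBecomeReplicaFromLeader(Message message, NotificationContext context) {
    // stop serving; go back to following the journal
  }

  @Transition(from = "REPLICA", to = "OFFLINE")
  public void onBecomeOfflineFromReplica(Message message, NotificationContext context) {
    // release resources held for this partition
  }
}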