Hi Kishore,

On Oct 15, 2013, at 16:48, kishore g <[email protected]> wrote:

> Hi Matthieu,
>
> I think the code avoids placing more than one replica of a partition on the
> same node. So if you have only 1 node, it will create only LEADERS. We
> can add a configuration to allow this to happen.

Actually, preventing the leader and the replica of a partition from being on
the same node makes sense: such a placement defeats the purpose of the replica.

> But I do see something weird when you add a node: only 1 additional replica
> gets created. That does not make sense. I will take a look at that.

Yes, with 3 nodes and 3 partitions we should expect 3 leaders and 3 replicas.

An additional requirement, related to the above comment, would be that leaders
and replicas are never colocated. Should I open a JIRA for that?

Let me know if you need more feedback.

Thanks!

Matthieu

>
> thanks,
> Kishore G
>
>
> On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <[email protected]> wrote:
> Thanks for your prompt answers!
>
> I used the latest version from the master branch and applied the code changes
> suggested by Jason.
>
> The good news is that:
> - the update was trivial - at least for the small code example I
>   provided.
> - I always get 3 leader states for the 3 partitions
>
> The bad news is that:
> - I either don't get enough replicas (I want 1 replica for each
>   partition, and initially I only have replicas for 2 partitions)
> - or I simply get no replica at all (after removing 1 node from the
>   cluster, I have 3 leaders, 0 replicas)
>
> I updated my simple example https://github.com/matthieumorel/helix-balancing
> so you can reproduce that behavior.
>
> // with only 1 node, I have 3 leaders, 0 replica :
>
> Starting instance Node:myhost:10000
> Assigning MY_RESOURCE_1 to Node:myhost:10000
> Assigning MY_RESOURCE_0 to Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10000
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
>
>
> // adding 1 node adds a replica:
>
> Starting instance Node:myhost:10001
> Assigning MY_RESOURCE_1 to Node:myhost:10001
> OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
>
>
> // adding another node adds a new replica:
>
> Starting instance Node:myhost:10002
> Assigning MY_RESOURCE_0 to Node:myhost:10002
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
>
>
> // removing a node rebalances things but we end up with 3 leaders, 0 replica
>
> Stopping instance Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10002
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
>
>
> I would like to get 1 leader and 1 replica for each partition, regardless of
> the number of nodes. Is that possible?
>
> Thanks!
>
> Matthieu
>
>
> On Oct 15, 2013, at 02:30 , Kanak Biscuitwala <[email protected]> wrote:
>
>> Hi Matthieu,
>>
>> I have just pushed a patch to the master branch (i.e. trunk) that should fix
>> the issue. Please let me know if the problem persists.
>>
>> Thanks,
>> Kanak
>>
>> ________________________________
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: Getting auto_rebalance right
>>> Date: Mon, 14 Oct 2013 21:32:41 +0000
>>>
>>> Hi Matthieu, this is a known bug in 0.6.1 release.
>>> We have fixed it in trunk. If you are building from trunk, change
>>> ClusterConfigInit#init()
>>>
>>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>>     RESOURCE,
>>>     PARTITIONS,
>>>     "LEADER_REPLICA",
>>>     IdealStateModeProperty.AUTO_REBALANCE.toString());
>>>
>>> to
>>>
>>> admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
>>>     "LEADER_REPLICA",
>>>     RebalanceMode.FULL_AUTO.toString());
>>>
>>> It should work. We are planning to make a 0.6.2 release with a few fixes
>>> including this one.
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>>
>>> From: Matthieu Morel <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Monday, October 14, 2013 12:09 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Getting auto_rebalance right
>>>
>>> Hi,
>>>
>>> I'm trying to use the auto-rebalance mode in Helix.
>>>
>>> The use case is the following (standard leader-standby scenario, a bit
>>> like the rsync example in the helix codebase):
>>> - the dataspace is partitioned
>>> - for a given partition, we have
>>>   - a leader that is responsible for writing and serving data, logging
>>>     operations into a journal
>>>   - a replica that fetches updates from a journal and applies them
>>>     locally but it does not serve data
>>> Upon failure, the replica becomes leader, applies pending updates and
>>> can write and serve data. Ideally we also get a new replica assigned.
>>>
>>> We'd like to use the auto_rebalance mode in Helix so that partitions
>>> are automatically assigned and re-assigned, and so that leaders are
>>> automatically elected.
>>>
>>>
>>> Unfortunately, I can't really get the balancing right.
>>> I might be doing something wrong, so I uploaded an example here:
>>> https://github.com/matthieumorel/helix-balancing
>>>
>>>
>>> In this application I would like to get exactly 1 leader and 1 replica
>>> for each of the partitions.
>>>
>>> In this example we don't reach that result, and when removing a node,
>>> we even get to a situation where there is no leader for a given
>>> partition.
>>>
>>>
>>> Do I have wrong expectations? Is there something wrong with the code,
>>> is it something with helix?
>>>
>>>
>>> Thanks!
>>>
>>> Matthieu
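
To make the placement expected in this thread concrete (1 leader and 1 replica per partition, never on the same node), here is a minimal, self-contained sketch. It is not Helix code and does not reflect how the Helix rebalancer works: PlacementSketch, Placement, assign and satisfiesRequirement are hypothetical names, and the round-robin rule is only an assumption picked to illustrate the requirement. With a single node the replica is deliberately left unassigned, since colocating it with the leader would defeat its purpose.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is NOT the Helix rebalancer. */
public class PlacementSketch {

    /** Leader and replica node of one partition; replica is null if no second node exists. */
    public static final class Placement {
        public final String leader;
        public final String replica;

        Placement(String leader, String replica) {
            this.leader = leader;
            this.replica = replica;
        }
    }

    /**
     * Assumed round-robin rule: partition i gets its leader on node (i mod n)
     * and its replica on the following node, so the two never share a node.
     */
    public static Map<String, Placement> assign(List<String> partitions, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        int n = nodes.size();
        Map<String, Placement> result = new LinkedHashMap<>();
        for (int i = 0; i < partitions.size(); i++) {
            String leader = nodes.get(i % n);
            // With a single node there is nowhere else to put the replica.
            String replica = n > 1 ? nodes.get((i + 1) % n) : null;
            result.put(partitions.get(i), new Placement(leader, replica));
        }
        return result;
    }

    /** True when every partition has a leader and a replica on two different nodes. */
    public static boolean satisfiesRequirement(Map<String, Placement> placements) {
        for (Placement p : placements.values()) {
            if (p.leader == null || p.replica == null || p.leader.equals(p.replica)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("MY_RESOURCE_0", "MY_RESOURCE_1", "MY_RESOURCE_2");
        List<String> nodes = List.of("myhost:10000", "myhost:10001", "myhost:10002");
        assign(partitions, nodes).forEach((partition, p) ->
                System.out.println(partition + " leader=" + p.leader + " replica=" + p.replica));
    }
}
```

A check like satisfiesRequirement could be run against the assignments printed in the logs above to flag the "3 leaders, 0 replica" situation after a node is removed.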
