Thanks for your prompt answers!
I used the latest version from the master branch and applied the code changes
suggested by Jason.
The good news is that:
- the update was trivial - at least for the small code example I
provided.
- I always get 3 leader states for the 3 partitions
The bad news is that:
- I either don't get enough replicas (I want 1 replica for each
partition, but initially I only have replicas for 2 of the 3 partitions)
- or I simply get no replicas at all (after removing 1 node from the
cluster, I have 3 leaders and 0 replicas)
I updated my simple example https://github.com/matthieumorel/helix-balancing so
you can reproduce that behavior.
// with only 1 node, I have 3 leaders, 0 replicas:
Starting instance Node:myhost:10000
Assigning MY_RESOURCE_1 to Node:myhost:10000
Assigning MY_RESOURCE_0 to Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10000
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
// adding 1 node adds a replica:
Starting instance Node:myhost:10001
Assigning MY_RESOURCE_1 to Node:myhost:10001
OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
// adding another node adds a new replica:
Starting instance Node:myhost:10002
Assigning MY_RESOURCE_0 to Node:myhost:10002
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
// removing a node rebalances things, but we end up with 3 leaders, 0 replicas
Stopping instance Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10002
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
I would like to get 1 leader and 1 replica for each partition, regardless of
the number of nodes. Is that possible?
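For reference, here is roughly how I set up the resource after applying Jason's
suggestion (a sketch based on my example; zkAddress is a placeholder, and I'm
assuming a replica count of 2 is what should yield 1 LEADER + 1 REPLICA per
partition):

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState.RebalanceMode;

HelixAdmin admin = new ZKHelixAdmin(zkAddress);
admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
                  "LEADER_REPLICA", RebalanceMode.FULL_AUTO.toString());
// request 2 replicas per partition in total; with the LEADER_REPLICA
// state model I expect them to be mapped to 1 LEADER and 1 REPLICA
admin.rebalance(DEFAULT_CLUSTER_NAME, RESOURCE, 2);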
Thanks!
Matthieu
On Oct 15, 2013, at 02:30, Kanak Biscuitwala <[email protected]> wrote:
> Hi Matthieu,
>
> I have just pushed a patch to the master branch (i.e. trunk) that should fix
> the issue. Please let me know if the problem persists.
>
> Thanks,
> Kanak
>
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Getting auto_rebalance right
>> Date: Mon, 14 Oct 2013 21:32:41 +0000
>>
>> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it
>> in trunk. If you are building from trunk, change ClusterConfigInit#init()
>> from
>>
>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>                   RESOURCE,
>>                   PARTITIONS,
>>                   "LEADER_REPLICA",
>>                   IdealStateModeProperty.AUTO_REBALANCE.toString());
>>
>> to
>>
>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>                   RESOURCE,
>>                   PARTITIONS,
>>                   "LEADER_REPLICA",
>>                   RebalanceMode.FULL_AUTO.toString());
>>
>>
>> It should work. We are planning to make a 0.6.2 release with a few
>> fixes including this one.
>>
>>
>> Thanks,
>>
>> Jason
>>
>>
>> From: Matthieu Morel <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, October 14, 2013 12:09 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Getting auto_rebalance right
>>
>> Hi,
>>
>> I'm trying to use the auto-rebalance mode in Helix.
>>
>> The use case is the following (standard leader-standby scenario, a bit
>> like the rsync example in the helix codebase):
>> - the dataspace is partitioned
>> - for a given partition, we have:
>>   - a leader that is responsible for writing and serving data, logging
>>     operations into a journal
>>   - a replica that fetches updates from the journal and applies them
>>     locally, but does not serve data
>> Upon failure, the replica becomes leader, applies pending updates and
>> can write and serve data. Ideally we also get a new replica assigned.
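>>
>> To make this concrete, the participant-side state model in my example
>> looks roughly like this (a sketch: the class and method names are mine,
>> and the journal logic is elided):
>>
>> import org.apache.helix.NotificationContext;
>> import org.apache.helix.model.Message;
>> import org.apache.helix.participant.statemachine.StateModel;
>> import org.apache.helix.participant.statemachine.StateModelInfo;
>> import org.apache.helix.participant.statemachine.Transition;
>>
>> @StateModelInfo(initialState = "OFFLINE", states = { "LEADER", "REPLICA", "OFFLINE" })
>> public class LeaderReplicaStateModel extends StateModel {
>>
>>   @Transition(from = "OFFLINE", to = "REPLICA")
>>   public void onBecomeReplicaFromOffline(Message message, NotificationContext context) {
>>     // start tailing the journal and applying updates locally
>>   }
>>
>>   @Transition(from = "REPLICA", to = "LEADER")
>>   public void onBecomeLeaderFromReplica(Message message, NotificationContext context) {
>>     // apply pending journal updates, then start writing and serving data
>>   }
>>
>>   @Transition(from = "LEADER", to = "REPLICA")
>>   public void onBecomeReplicaFromLeader(Message message, NotificationContext context) {
>>     // stop serving data and go back to following the journal
>>   }
>>
>>   @Transition(from = "REPLICA", to = "OFFLINE")
>>   public void onBecomeOfflineFromReplica(Message message, NotificationContext context) {
>>     // stop applying updates
>>   }
>> }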
>>
>> We'd like to use the auto_rebalance mode in Helix so that partitions
>> are automatically assigned and re-assigned, and so that leaders are
>> automatically elected.
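>>
>> For completeness, this is roughly how each node joins the cluster as a
>> participant (again a sketch; instanceName, zkAddress and the factory
>> class are placeholders from my example):
>>
>> import org.apache.helix.HelixManager;
>> import org.apache.helix.HelixManagerFactory;
>> import org.apache.helix.InstanceType;
>>
>> HelixManager manager = HelixManagerFactory.getZKHelixManager(
>>     DEFAULT_CLUSTER_NAME, instanceName, InstanceType.PARTICIPANT, zkAddress);
>> manager.getStateMachineEngine().registerStateModelFactory(
>>     "LEADER_REPLICA", new LeaderReplicaStateModelFactory());
>> manager.connect();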
>>
>>
>> Unfortunately, I can't really get the balancing right. I might be doing
>> something wrong, so I uploaded an example here:
>> https://github.com/matthieumorel/helix-balancing
>>
>>
>> In this application I would like to get exactly 1 leader and 1 replica
>> for each of the partitions.
>>
>> In this example we don't reach that result, and when removing a node
>> we even end up in a situation where there is no leader for a given
>> partition.
>>
>>
>> Do I have wrong expectations? Is there something wrong with the code,
>> or is it something with Helix?
>>
>>
>> Thanks!
>>
>> Matthieu