Hi Matthieu,

Please change line 39 in ClusterConfigInit to:

    admin.rebalance(DEFAULT_CLUSTER_NAME, RESOURCE, 2);

Basically, the leader counts as a replica, so if you want a replica in addition to the leader, you need to specify 2 for the replica count.
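Combined with the addResource change suggested earlier in this thread (quoted further down), the relevant setup would look roughly like the sketch below. It is reconstructed from the fragments quoted in this thread; the constant values and the ZooKeeper address are assumptions, not verified against the repository.

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState.RebalanceMode;

    public class ClusterConfigInitSketch {
        // Constants referenced in this thread; the values here are assumptions.
        static final String DEFAULT_CLUSTER_NAME = "DEFAULT_CLUSTER_NAME";
        static final String RESOURCE = "MY_RESOURCE";
        static final int PARTITIONS = 3;

        public static void main(String[] args) {
            HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // ZK address assumed
            admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
                "LEADER_REPLICA", RebalanceMode.FULL_AUTO.toString());
            // The replica count includes the leader: 2 means one LEADER plus
            // one REPLICA per partition.
            admin.rebalance(DEFAULT_CLUSTER_NAME, RESOURCE, 2);
        }
    }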
There is a bug where, when there are 3 nodes, I see that partition 0 has 2 replicas in REPLICA state even though one of them should be dropped. I'll keep investigating that.

Kanak

From: Matthieu Morel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, October 15, 2013 8:06 AM
To: "[email protected]" <[email protected]>
Subject: Re: Getting auto_rebalance right

Hi Kishore,

On Oct 15, 2013, at 16:48, kishore g <[email protected]> wrote:

> Hi Matthieu,
>
> I think the code avoids placing more than one replica of a partition on
> the same node. So if you have only 1 node, it will create only LEADERs.
> We can add a configuration to allow this to happen.

Actually, preventing the leader and a replica of a partition from being on the same node makes sense: such a placement defeats the purpose of the replica.

> But I do see something weird: when you add a node, only 1 additional
> replica gets created. That does not make sense. I will take a look at
> that.

Yes, with 3 nodes and 3 partitions we should expect 3 leaders and 3 replicas. An additional requirement, related to the above comment, would be that leaders and replicas are never colocated. Should I open a JIRA for that?

Let me know if you need more feedback. Thanks!

Matthieu

> thanks,
> Kishore G

On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <[email protected]> wrote:

Thanks for your prompt answers!

I used the latest version from the master branch and applied the code changes suggested by Jason.

The good news is that:
- the update was trivial, at least for the small code example I provided
- I always get 3 LEADER states for the 3 partitions

The bad news is that:
- I either don't get enough replicas (I want 1 replica for each partition, and initially I only have replicas for 2 partitions)
- or I simply get no replicas at all (after removing 1 node from the cluster, I have 3 leaders and 0 replicas)

I updated my simple example at https://github.com/matthieumorel/helix-balancing so you can reproduce that behavior.
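For reference, starting and stopping instances in such an example boils down to connecting and disconnecting participant HelixManagers. A minimal sketch, with hypothetical instance and class names and an assumed local ZooKeeper; the actual code is in the repository linked above:

    import org.apache.helix.HelixManager;
    import org.apache.helix.HelixManagerFactory;
    import org.apache.helix.InstanceType;

    public class NodeRunnerSketch {
        public static void main(String[] args) throws Exception {
            // Cluster name, instance name, and ZooKeeper address are assumptions.
            HelixManager manager = HelixManagerFactory.getZKHelixManager(
                "DEFAULT_CLUSTER_NAME", "myhost_10000",
                InstanceType.PARTICIPANT, "localhost:2181");

            // LeaderReplicaStateModelFactory is a placeholder for a factory
            // producing the LEADER_REPLICA state model (a matching sketch
            // appears later in this thread).
            manager.getStateMachineEngine().registerStateModelFactory(
                "LEADER_REPLICA", new LeaderReplicaStateModelFactory());

            manager.connect(); // joining the cluster triggers a rebalance

            // ... and disconnecting removes the instance, triggering another
            // rebalance, as in the "Stopping instance" lines of the log below:
            // manager.disconnect();
        }
    }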
// with only 1 node, I have 3 leaders, 0 replicas:
Starting instance Node:myhost:10000
Assigning MY_RESOURCE_1 to Node:myhost:10000
Assigning MY_RESOURCE_0 to Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10000
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)

// adding 1 node adds a replica:
Starting instance Node:myhost:10001
Assigning MY_RESOURCE_1 to Node:myhost:10001
OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)

// adding another node adds a new replica:
Starting instance Node:myhost:10002
Assigning MY_RESOURCE_0 to Node:myhost:10002
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)

// removing a node rebalances things, but we end up with 3 leaders, 0 replicas:
Stopping instance Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10002
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)

I would like to get 1 leader and 1 replica for each partition, regardless of the number of nodes. Is that possible?

Thanks!

Matthieu

On Oct 15, 2013, at 02:30, Kanak Biscuitwala <[email protected]> wrote:

Hi Matthieu,

I have just pushed a patch to the master branch (i.e. trunk) that should fix the issue. Please let me know if the problem persists.

Thanks,
Kanak

________________________________
From: [email protected]
To: [email protected]
Subject: Re: Getting auto_rebalance right
Date: Mon, 14 Oct 2013 21:32:41 +0000

Hi Matthieu,

This is a known bug in the 0.6.1 release. We have fixed it in trunk. If you are building from trunk, change ClusterConfigInit#init() from:

    admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
        "LEADER_REPLICA", IdealStateModeProperty.AUTO_REBALANCE.toString());

to:

    admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
        "LEADER_REPLICA", RebalanceMode.FULL_AUTO.toString());

It should work. We are planning to make a 0.6.2 release with a few fixes, including this one.

Thanks,
Jason

From: Matthieu Morel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, October 14, 2013 12:09 PM
To: "[email protected]" <[email protected]>
Subject: Getting auto_rebalance right

Hi,

I'm trying to use the auto-rebalance mode in Helix. The use case is the following (a standard leader-standby scenario, a bit like the rsync example in the Helix codebase):

- the dataspace is partitioned
- for a given partition, we have:
  - a leader that is responsible for writing and serving data, logging operations into a journal
  - a replica that fetches updates from the journal and applies them locally, but does not serve data

Upon failure, the replica becomes leader, applies pending updates, and can then write and serve data. Ideally we also get a new replica assigned.
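In Helix terms, these roles map onto transition handlers on a participant state model. The following is a minimal sketch against the 0.6.x API discussed in this thread; the class names and method bodies are hypothetical, and the actual handlers live in the example repository linked below.

    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.Message;
    import org.apache.helix.participant.statemachine.StateModel;
    import org.apache.helix.participant.statemachine.StateModelFactory;
    import org.apache.helix.participant.statemachine.StateModelInfo;
    import org.apache.helix.participant.statemachine.Transition;

    @StateModelInfo(initialState = "OFFLINE", states = { "LEADER", "REPLICA", "OFFLINE" })
    public class LeaderReplicaStateModel extends StateModel {

        @Transition(from = "OFFLINE", to = "REPLICA")
        public void onBecomeReplicaFromOffline(Message message, NotificationContext context) {
            // start tailing the journal and applying updates locally
        }

        @Transition(from = "REPLICA", to = "LEADER")
        public void onBecomeLeaderFromReplica(Message message, NotificationContext context) {
            // apply pending journal updates, then start writing and serving data
        }

        @Transition(from = "LEADER", to = "REPLICA")
        public void onBecomeReplicaFromLeader(Message message, NotificationContext context) {
            // stop serving writes, resume following the journal
        }

        @Transition(from = "REPLICA", to = "OFFLINE")
        public void onBecomeOfflineFromReplica(Message message, NotificationContext context) {
            // release any partition-local resources
        }
    }

    // Factory registered with the state machine engine (0.6.x-era signature).
    class LeaderReplicaStateModelFactory extends StateModelFactory<LeaderReplicaStateModel> {
        @Override
        public LeaderReplicaStateModel createNewStateModel(String partitionName) {
            return new LeaderReplicaStateModel();
        }
    }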
We'd like to use the auto_rebalance mode in Helix so that partitions are automatically assigned and re-assigned, and so that leaders are automatically elected. Unfortunately, I can't really get the balancing right. I might be doing something wrong, so I uploaded an example here: https://github.com/matthieumorel/helix-balancing

In this application I would like to get exactly 1 leader and 1 replica for each of the partitions. In this example we don't reach that result, and when removing a node, we even get to a situation where there is no leader for a given partition.

Do I have wrong expectations? Is there something wrong with my code, or is it something in Helix?

Thanks!

Matthieu
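One way to check whether a placement meets the goal of exactly 1 LEADER and 1 REPLICA per partition is to read the resource's external view. A minimal sketch, assuming the same hypothetical constants as in the setup sketch above and a local ZooKeeper:

    import java.util.Map;
    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.ExternalView;

    public class PlacementCheck {
        public static void main(String[] args) {
            HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // ZK address assumed
            ExternalView view =
                admin.getResourceExternalView("DEFAULT_CLUSTER_NAME", "MY_RESOURCE");
            for (String partition : view.getPartitionSet()) {
                // Map of instance name -> current state (e.g. LEADER, REPLICA)
                Map<String, String> stateMap = view.getStateMap(partition);
                System.out.println(partition + " -> " + stateMap);
            }
        }
    }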
