RE: helix rebalancing for multiple resources

Kanak Biscuitwala Wed, 01 Jan 2014 20:21:42 -0800

Hi Vu,

Your understanding is basically correct. The controller will rebalance each 
resource in sequence, at most one controller pipeline execution is going on at 
any one time, and there is no parallelism within the controller pipeline (other 
than batch reading and writing the cluster at the beginning and end).
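To make the sequencing point concrete for the A/B/C/D case, here is a minimal Java sketch. It is not Helix code: the class and method names (`SequentialRebalanceSketch`, `rebalanceAll`) are hypothetical. It only models the idea that, because resources are rebalanced one after another, the computation for a later resource can read the placements already chosen for earlier ones without any locking. How a real custom rebalancer would read other resources' assignments depends on the Helix API and is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only, not Helix code. Models a controller that
// rebalances resources one at a time, so each step sees the placements made
// by the steps before it.
public class SequentialRebalanceSketch {

    /**
     * Assigns every partition of every resource to a distinct node, processing
     * resources in insertion order. Returns partition -> node, or throws if
     * there are more tasks than nodes.
     */
    static Map<String, String> rebalanceAll(List<String> nodes,
                                            Map<String, Integer> partitionsPerResource) {
        Map<String, String> assignment = new LinkedHashMap<>();
        // Nodes already holding a task. Shared across resources; safe because
        // the loop is sequential, so there are no concurrent writers.
        Set<String> occupied = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> resource : partitionsPerResource.entrySet()) {
            for (int p = 0; p < resource.getValue(); p++) {
                String partition = resource.getKey() + "_" + p;
                String target = null;
                for (String node : nodes) {
                    if (!occupied.contains(node)) {
                        target = node;
                        break;
                    }
                }
                if (target == null) {
                    // Refuse to put two tasks on one node.
                    throw new IllegalStateException("No free node for " + partition);
                }
                occupied.add(target);
                assignment.put(partition, target);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3", "node4", "node5");
        Map<String, Integer> resources = new LinkedHashMap<>();
        resources.put("A", 2); // partitioned task A
        resources.put("B", 1); // non-partitioned tasks B, C, D
        resources.put("C", 1);
        resources.put("D", 1);
        // Prints {A_0=node1, A_1=node2, B_0=node3, C_0=node4, D_0=node5}
        System.out.println(rebalanceAll(nodes, resources));
    }
}
```

With five nodes, A's two partitions land on node1 and node2 and B, C, D take node3 to node5; with fewer nodes than tasks the sketch throws rather than doubling up on a node.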


Here are some things that may be of use to know:

1. You can plug in your own code to help decide how to rebalance your cluster
in one of two ways:
   - Using the HelixCustomCodeRunner on the participant side, so that you can
     update the IdealState whenever the cluster changes:
     https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
   - Implementing a Rebalancer with the USER_DEFINED rebalance mode:
     https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
In either case, Helix will still fire transitions according to constraints and
react to nodes joining and leaving.
2. Helix supports adding tags to nodes (via InstanceConfig) and specifying a
tag in each resource's IdealState. A tagged resource will then only be assigned
to nodes that carry the corresponding tag.
3. You can specify the maximum number of partitions per node for a resource in
that resource's IdealState (this should be 1 in your case).
4. You can combine any of the above three if that makes sense (e.g., change
node tags whenever a cluster change happens, thus constraining how Helix
assigns everything).
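A minimal Java sketch of combining items 2 and 3: a resource's tag narrows the set of eligible nodes, and a per-node cap (1 for task A) limits how many partitions each node receives. The names (`TagConstrainedPlacementSketch`, `place`) are hypothetical; this models the constraints rather than Helix's actual assignment algorithm, and leaving a partition unassigned when no node qualifies is this sketch's own choice, not documented Helix behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Helix's implementation: a resource tag restricts
// eligible nodes, and a per-node partition cap limits placement.
public class TagConstrainedPlacementSketch {

    static Map<String, String> place(String resourceTag,
                                     List<String> partitions,
                                     Map<String, Set<String>> nodeTags,
                                     int maxPartitionsPerNode) {
        Map<String, Integer> load = new LinkedHashMap<>();      // node -> partitions placed
        Map<String, String> assignment = new LinkedHashMap<>(); // partition -> node
        for (String partition : partitions) {
            String chosen = null;
            for (Map.Entry<String, Set<String>> node : nodeTags.entrySet()) {
                boolean tagged = node.getValue().contains(resourceTag);
                int current = load.getOrDefault(node.getKey(), 0);
                if (tagged && current < maxPartitionsPerNode) {
                    chosen = node.getKey();
                    break;
                }
            }
            if (chosen == null) {
                // No tagged node has spare capacity: leave the partition
                // unassigned rather than violate a constraint.
                continue;
            }
            load.merge(chosen, 1, Integer::sum);
            assignment.put(partition, chosen);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> nodeTags = new LinkedHashMap<>();
        nodeTags.put("node1", Set.of("taskA"));
        nodeTags.put("node2", Set.of("taskA"));
        nodeTags.put("node3", Set.of("taskB"));
        // Prints {A_0=node1, A_1=node2}; A_2 has no eligible node left.
        System.out.println(place("taskA", List.of("A_0", "A_1", "A_2"), nodeTags, 1));
    }
}
```

Retagging a node (item 4) amounts to changing its entry in `nodeTags` before the next placement runs.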
Is that helpful?
Kanak
________________________________
> Date: Wed, 1 Jan 2014 19:44:58 -0800 
> Subject: helix rebalancing for multiple resources 
> From: [email protected] 
> To: [email protected] 
> 
> Hi, 
> We're looking into creating something like a distributed task 
> processing cluster. We already have existing code for the processing 
> task on a single host. So that results in stronger restrictions on 
> what we're doing: 
> - partitioned task A: single partition needs to be assigned to a single 
> node and a node may have only a single partitioned task 
> - another set of non-partitioned tasks (e.g. B, C, D) also needs to be 
> assigned nodes, but it would be most efficient if those tasks are 
> assigned to separate nodes so any single node has at most 1 task 
> (either partitioned A, B, C, D, etc.) 
> 
> This seems to require a global view of all tasks. However, from the 
> examples and the Rebalancer code, it appears that the resource 
> mappings/assignments are independent of one another. Is that correct? 
> If so, is Apache Helix the right framework for us, given the 
> requirements above? 
> 
> I saw that it might be possible to find the current resource assignment 
> for other resources during the rebalancing calculation methods, but I 
> was then concerned about concurrency issues if the rebalances for task 
> A and for task B were computed at the same time. 
> 
> Thanks for any and all feedback. 
> 
> Vu Nguyen 
>
