Resending since I realized you might not be registered on the user list yet. By 
the way, for your specific use case, I would personally lean towards the 
CustomCodeRunner along with the CUSTOMIZED IdealState rebalance mode. Then when 
nodes enter and exit, you can change the IdealState yourself and Helix will 
fire the transitions. This will most easily give you the policy-driven global 
view you're looking for.
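
To make that concrete, here is a rough sketch of the kind of policy your custom code could compute before it writes anything to the IdealState. It is plain Java with no Helix dependency. The class name OneTaskPerNodePolicy, the assign method, and the unit names ("A_0", "B", and so on) are made up for illustration and are not part of Helix. It assumes every partition of A and every non-partitioned task counts as one unit, and that a node may hold at most one unit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Helix class): global one-unit-per-node placement. */
final class OneTaskPerNodePolicy {

    /**
     * Maps each task unit to a distinct live node.
     *
     * @param taskUnits all units across all resources, e.g. "A_0", "A_1", "B"
     * @param liveNodes nodes currently in the cluster
     * @param previous  the last assignment (unit to node), used to avoid moves
     * @return unit to node; units that do not fit are left out of the map
     */
    static Map<String, String> assign(List<String> taskUnits,
                                      List<String> liveNodes,
                                      Map<String, String> previous) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> freeNodes = new ArrayList<>(liveNodes);
        Collections.sort(freeNodes); // deterministic choice among free nodes

        // Pass 1: keep every placement whose node is still alive and unclaimed.
        for (String unit : taskUnits) {
            String node = previous.get(unit);
            if (node != null && freeNodes.remove(node)) {
                result.put(unit, node);
            }
        }

        // Pass 2: give each remaining unit the next free node, if any is left.
        for (String unit : taskUnits) {
            if (!result.containsKey(unit) && !freeNodes.isEmpty()) {
                result.put(unit, freeNodes.remove(0));
            }
        }
        return result;
    }
}
```

Because one call sees every unit of every resource, the one-per-node rule is enforced globally rather than per resource. Your code would then translate the result into the per-partition instance and state entries that each resource's IdealState holds in CUSTOMIZED mode.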
---

Hi Vu,

Your understanding is basically correct. The controller will rebalance 
each resource in sequence, at most one controller pipeline execution is going 
on at any one time, and there is no parallelism within the controller pipeline 
(other than batch reading and writing the cluster at the beginning and end).

Here are some things that may be of use to know:

1. You can plug in your own code to help decide how to rebalance your cluster 
in one of two ways:
   - Using the CustomCodeRunner on the participant side so that you can update 
the IdealState whenever the cluster changes: 
https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
   - Implementing a Rebalancer with USER_DEFINED rebalance mode: 
https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
In either case, Helix will still fire transitions according to constraints and 
react to node entry/exit.
2. Helix supports adding tags to nodes (via InstanceConfig), and specifying 
tags in each resource IdealState. Then, a tagged resource will only be assigned 
to nodes with the corresponding tag present.
3. You can specify the maximum number of partitions per node for a resource in 
that resource's IdealState (this should be 1 in your case).
4. You can combine any of the above 3 if that makes sense (e.g. change node 
tags whenever a cluster change happens, thus constraining how Helix will assign 
everything)
Is that helpful?
Kanak

Date: Wed, 1 Jan 2014 20:31:56 -0800
Subject: helix rebalancing for multiple resources
From: [email protected]
To: [email protected]

Hi,

We're looking into creating something like a distributed task processing 
cluster.  We already have existing code for the processing task on a single 
host.  So that results in stronger restrictions on what we're doing:
- partitioned task A: single partition needs to be assigned to a single node 
and a node may have only a single partitioned task
- another set of non-partitioned tasks (e.g. B, C, D) also needs to be assigned 
nodes, but it would be most efficient if those tasks are assigned to separate 
nodes so any single node has at most 1 task (either partitioned A, B, C, D, 
etc.)

This seems to require a global view of all tasks.  However, from the examples 
and the Rebalancer code, it appears that the resource mappings/assignments are 
independent of one another.  Is that correct?  If so, is Apache Helix the 
right framework for us, given the requirements above?

I saw that it might be possible to find the current resource assignment for 
other resources during the rebalancing calculation methods, but I was then 
concerned about concurrency issues, for example if the rebalance for task A and 
the rebalance for B were computed at the same time.

Thanks for any and all feedback.

Vu Nguyen
