Hi,

We're looking into creating something like a distributed task processing cluster. We already have existing code for the processing task on a single host, which imposes stronger restrictions on placement:

- partitioned task A: each partition must be assigned to a single node, and a node may host only a single partition
- a set of non-partitioned tasks (e.g. B, C, D) also needs to be assigned to nodes, and it would be most efficient if those tasks are assigned to separate nodes, so that any single node has at most one task (either a partition of A, or B, C, D, etc.)
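For concreteness, here is a small sketch of the constraint we would need the framework to enforce. This is plain Python, not Helix code, and all names are made up for illustration:

```python
# Illustrative only (not Helix API): every node holds at most one task,
# where a "task" is either one partition of A or one of the
# non-partitioned tasks B, C, D, ...

def assign(nodes, num_a_partitions, other_tasks):
    """Map each partition of A and each other task to its own node.

    Returns a dict node -> task, or None if there are not enough nodes
    to satisfy the one-task-per-node constraint.
    """
    tasks = [f"A_p{p}" for p in range(num_a_partitions)] + list(other_tasks)
    if len(tasks) > len(nodes):
        return None  # cannot give every task its own node
    # zip stops at the shorter sequence, so spare nodes stay empty
    return dict(zip(nodes, tasks))

assignment = assign(["n1", "n2", "n3", "n4", "n5"], 2, ["B", "C", "D"])
# Each node carries at most one task, and A's two partitions
# land on distinct nodes.
```

The point is that computing such an assignment needs visibility into all resources at once, which leads to the question below.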
This seems to require a global view of all tasks. However, from the examples and the Rebalancer code, it appears that the resource mappings/assignments are independent of one another. Is that correct? If so, is Apache Helix the right framework for us, given the requirements above? I saw that it might be possible to look up the current assignments of other resources during the rebalance calculation methods, but I was concerned about concurrency issues -- e.g. if the rebalances for task A and task B were computed at the same time.

Thanks for any and all feedback.

Vu Nguyen
