I think I have a bit of it written already. It doesn't use Curator and I think you could simplify it substantially if you were to use it. Would that help?
On Thu, Jan 12, 2012 at 12:52 PM, Jordan Zimmerman <jzimmer...@netflix.com> wrote:

> Ted - are you interested in writing this on top of Curator? If not, I'll give it a whack.
>
> -JZ
>
> On 1/5/12 12:50 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>
> >Jordan, I don't think that leader election does what Josh wants.
> >
> >I don't think that consistent hashing is particularly good for that either, because the loss of one node causes the sequential state for lots of entities to move, even among nodes that did not fail.
> >
> >What I would recommend is a variant of micro-sharding. The key space is divided into many micro-shards. Then nodes that are alive claim the micro-shards using ephemerals and proceed as Josh described. On loss of a node, the shards that node was handling should be claimed by the remaining nodes. When a new node appears or new work appears, it is helpful to direct nodes to effect a hand-off of traffic.
> >
> >In my experience, the best way to implement shard balancing is with an external master instance, much in the style of HBase or Katta. This external master can be exceedingly simple and only needs to wake up on various events like the loss of a node or a change in the set of live shards. It can also wake up at intervals if desired, to backstop the normal notifications or to allow small changes for certain kinds of balancing. Typically, this only requires a few hundred lines of code.
> >
> >This external master can, of course, be run on multiple nodes, and which master is in current control can be adjudicated with yet another leader election.
> >
> >You can view this as a package of many leader elections, or as discretized consistent hashing. The distinctions are a bit subtle but are very important. These include:
> >
> >- there is a clean division of control between the master, which determines who serves what, and the nodes that do the serving
> >
> >- there is no herd effect because the master drives the assignments
> >
> >- node loss causes the minimum amount of change of assignments, since no assignments to surviving nodes are disturbed. This is a major win.
> >
> >- balancing is pretty good because there are many shards compared to the number of nodes.
> >
> >- the balancing strategy is highly pluggable.
> >
> >This pattern would make a nice addition to Curator, actually. It comes up repeatedly in different contexts.
> >
> >On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman <jzimmer...@netflix.com> wrote:
> >
> >> OK - so these are two options for doing the same thing. You use a Leader Election algorithm to make sure that only one node in the cluster is operating on a work unit. Curator has an implementation (it's really just a distributed lock with a slightly different API).
> >>
> >> -JZ
> >>
> >> On 1/5/12 12:04 AM, "Josh Stone" <pacesysj...@gmail.com> wrote:
> >>
> >> >Thanks for the response. Comments below:
> >> >
> >> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman <jzimmer...@netflix.com> wrote:
> >> >
> >> >> Hi Josh,
> >> >>
> >> >> >Second use case: Distributed locking
> >> >> This is one of the most common uses of ZooKeeper. There are many implementations - one is included with the ZK distro. Also, there is Curator: https://github.com/Netflix/curator
> >> >>
> >> >> >First use case: Distributing work to a cluster of nodes
> >> >> This sounds feasible. If you give more details, I and others on this list can help more.
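Ted's pattern above rests on live nodes claiming micro-shards with ephemeral znodes, so a crashed node's claims disappear with its session and the shards become free for others to pick up. Below is a minimal sketch of just that claiming step against the raw ZooKeeper API; the /shards path, node id, and shard count are illustrative, and the master-driven assignment and rebalancing Ted recommends is deliberately left out.

    // Sketch: claim unowned micro-shards with ephemeral znodes.
    // Assumes the parent path /shards already exists.
    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ShardClaimer {
        private final ZooKeeper zk;
        private final String nodeId;

        public ShardClaimer(ZooKeeper zk, String nodeId) {
            this.zk = zk;
            this.nodeId = nodeId;
        }

        // Try to claim every unowned shard. The ephemeral owner znode is
        // deleted automatically if this node's session dies, so the
        // survivors (or a master) can re-claim the shard.
        public void claimUnownedShards(int numShards) throws Exception {
            for (int shard = 0; shard < numShards; shard++) {
                String path = "/shards/shard-" + shard;
                try {
                    zk.create(path, nodeId.getBytes(StandardCharsets.UTF_8),
                              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                    // Claimed: start serving this shard.
                } catch (KeeperException.NodeExistsException e) {
                    // Another node already owns this shard; skip it.
                }
            }
        }
    }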
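Jordan's remark that Curator's leader election is really just a distributed lock with a slightly different API is easiest to see from the lock recipe itself. Here is a hedged sketch of guarding a work unit with Curator's InterProcessMutex; the package names follow the later Apache releases (the 2012 Netflix builds used com.netflix.curator), and the connect string and lock path are made up.

    import java.util.concurrent.TimeUnit;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class WorkUnitLock {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            InterProcessMutex lock = new InterProcessMutex(client, "/locks/work-unit-42");
            if (lock.acquire(10, TimeUnit.SECONDS)) {
                try {
                    // Only one node in the cluster runs this block at a time.
                    processWorkUnit();
                } finally {
                    lock.release();
                }
            }
            client.close();
        }

        private static void processWorkUnit() {
            // Application work goes here.
        }
    }

Because the lock is held via an ephemeral znode, it is released automatically if the holder's session dies, which also speaks to Josh's orphaned-lock concern further down.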
> >> >Sure. I basically want to handle race conditions where two commands that operate on the same data are received by my cluster of nodes concurrently. One approach is to lock on the data that is affected by the command (distributed lock). Another approach is to make sure that all of the commands that operate on any set of data are routed to the same node, where they can be processed serially using local synchronization. Consistent hashing is an algorithm that can be used to select a node to handle a message (where the inputs are the key to hash and the number of nodes in the cluster).
> >> >
> >> >There are various implementations for this floating around. I'm just interested to know how this is working for anyone else.
> >> >
> >> >Josh
> >> >
> >> >>
> >> >> -JZ
> >> >>
> >> >> ________________________________________
> >> >> From: Josh Stone [pacesysj...@gmail.com]
> >> >> Sent: Wednesday, January 04, 2012 8:09 PM
> >> >> To: user@zookeeper.apache.org
> >> >> Subject: Use cases for ZooKeeper
> >> >>
> >> >> I have a few use cases that I'm wondering if ZooKeeper would be suitable for, and would appreciate some feedback.
> >> >>
> >> >> First use case: Distributing work to a cluster of nodes using consistent hashing to ensure that messages of some type are consistently handled by the same node. I haven't been able to find any info about ZooKeeper + consistent hashing. Is anyone using it for this? A concern here would be how to redistribute work as nodes come and go from the cluster.
> >> >>
> >> >> Second use case: Distributed locking. I noticed that there's a recipe for this on the ZooKeeper wiki. Is anyone doing this? Any issues? One concern would be how to handle orphaned locks if a node that obtained a lock goes down.
> >> >>
> >> >> Third use case: Fault tolerance. If we utilized ZooKeeper to distribute messages to workers, can it be made to handle a node going down by re-distributing the work to another node (perhaps messages that are not ack'ed within a timeout are resent)?
> >> >>
> >> >> Cheers,
> >> >> Josh
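For the consistent-hashing routing Josh describes (hash the command key and map it to one of the live nodes), a minimal ring sketch follows; the hash function, virtual-node count, and class names are illustrative, live membership would in practice come from watching ephemeral znodes, and Ted's caveat about reassignment on membership change still applies.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Collection;
    import java.util.SortedMap;
    import java.util.TreeMap;

    public class HashRing {
        private final TreeMap<Long, String> ring = new TreeMap<Long, String>();
        private static final int VIRTUAL_NODES = 100; // smooths the distribution

        public HashRing(Collection<String> liveNodes) throws Exception {
            // Place several virtual points per node around the ring.
            for (String node : liveNodes) {
                for (int i = 0; i < VIRTUAL_NODES; i++) {
                    ring.put(hash(node + "#" + i), node);
                }
            }
        }

        // Route a key to the first node clockwise from the key's hash.
        public String nodeFor(String key) throws Exception {
            SortedMap<Long, String> tail = ring.tailMap(hash(key));
            return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
        }

        private static long hash(String s) throws Exception {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                 | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        }
    }

Usage would be something like new HashRing(liveNodes).nodeFor(commandKey) for each message, rebuilding the ring whenever the set of live nodes changes.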