These are pretty easy to solve with ZK. Ephemeral nodes, exclusive create, atomic updates, and znode versions let you implement most of the semantics you need.
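As a rough sketch of how exclusive create and version-checked updates could cover "a trigger can be overwritten and cancelled without the job firing" from UC2: the code below is an in-memory stand-in, not the ZooKeeper client. `VersionedStore`, `Triggers`, the `/triggers/job-1` path and the PENDING/FIRING/CANCELLED states are all made up for illustration. Against a real ensemble the same two operations would presumably map onto `ZooKeeper.create(...)`, which fails with `KeeperException.NodeExistsException` if the node already exists, and `ZooKeeper.setData(path, data, version)`, which fails with `KeeperException.BadVersionException` when the version is stale.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a znode tree. Hypothetical; NOT the ZooKeeper API. */
class VersionedStore {
    /** Data plus a version counter that increases on every successful update. */
    static final class Node {
        final String data;
        final int version;
        Node(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final Map<String, Node> nodes = new HashMap<String, Node>();

    /** Exclusive create: succeeds only if nothing exists at the path yet. */
    synchronized boolean create(String path, String data) {
        if (nodes.containsKey(path)) {
            return false;
        }
        nodes.put(path, new Node(data, 0));
        return true;
    }

    /** Returns the current node, or null if the path does not exist. */
    synchronized Node get(String path) {
        return nodes.get(path);
    }

    /** Conditional update: applies only if expectedVersion is still current. */
    synchronized boolean setData(String path, String data, int expectedVersion) {
        Node current = nodes.get(path);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        nodes.put(path, new Node(data, current.version + 1));
        return true;
    }
}

/** Hypothetical trigger state machine built on the store above. */
class Triggers {
    /** Moves a trigger from one state to another; false if someone else got there first. */
    static boolean tryTransition(VersionedStore store, String path, String from, String to) {
        VersionedStore.Node seen = store.get(path);
        if (seen == null || !seen.data.equals(from)) {
            return false;
        }
        return store.setData(path, to, seen.version);
    }
}
```

Whichever of the cancelling client and the firing node gets its version-checked update in first wins; the other sees a failed update and backs off, so a cancelled trigger never fires. Ephemerality is not modelled here.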
I don't know of any recipes available for this, but they would be worthy additions to ZK.

On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine <t...@spidertracks.co.nz> wrote:

> Solving UC1 and UC2 via zookeeper or some other framework if one is
> recommended. We don't run Hadoop, just ZK and Cassandra as we don't have a
> need for map/reduce. I'm searching for any existing framework that can
> perform standard time based scheduling in a distributed environment. As I
> said earlier, Quartz is the closest model to what we're looking for, but it
> can't be used in a distributed parallel environment. Any suggestions for a
> system that could accomplish this would be helpful.
>
> Thanks,
> Todd
>
> On 24 August 2010 11:27, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>
> > Hi Todd,
> > Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or
> > is this a broader question for scheduling on cassandra nodes? For the latter
> > this probably isnt the right mailing list.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/23/10 4:02 PM, "Todd Nine" <t...@spidertracks.co.nz> wrote:
> >
> > Hi all,
> > We're using Zookeeper for Leader Election and system monitoring. We're
> > also using it for synchronizing our cluster wide jobs with barriers. We're
> > running into an issue where we now have a single job, but each node can fire
> > the job independently of others with different criteria in the job. In the
> > event of a system failure, another node in our application cluster will need
> > to fire this Job. I've used quartz previously (we're running Java 6), but
> > it simply isn't designed for the use case we have. I found this article on
> > cloudera.
> >
> > http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
> >
> >
> > I've looked at both plugins, but they require hadoop. We're not currently
> > running hadoop, we only have Cassandra. Here are the 2 basic use cases we
> > need to support.
> >
> > UC1: Synchronized Jobs
> > 1. A job is fired across all nodes
> > 2. The nodes wait until the barrier is entered by all participants
> > 3. The nodes process the data and leave
> > 4. On all nodes leaving the barrier, the Leader node marks the job as
> > complete.
> >
> >
> > UC2: Multiple Jobs per Node
> > 1. A Job is scheduled for a future time on a specific node (usually the same
> > node that's creating the trigger)
> > 2. A Trigger can be overwritten and cancelled without the job firing
> > 3. In the event of a node failure, the Leader will take all pending jobs
> > from the failed node, and partition them across the remaining nodes.
> >
> >
> > Any input would be greatly appreciated.
> >
> > Thanks,
> > Todd
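
For step 3 of UC2 in the quoted message, the repartitioning the leader performs might look something like the sketch below. This is only an illustration: `JobPartitioner` and the job and node names are hypothetical, and round-robin is my assumption, since the thread does not say how the jobs should be split. How the leader learns that a node has failed (for example by watching that node's ephemeral znode disappear) is outside this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: spreads a failed node's pending jobs over the survivors. */
class JobPartitioner {
    /**
     * Assigns each orphaned job to a live node in round-robin order.
     * Every live node gets an entry in the result, possibly with an empty list.
     */
    static Map<String, List<String>> partition(List<String> orphanedJobs,
                                               List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("no live nodes to take over jobs");
        }
        // LinkedHashMap keeps the nodes in the order they were passed in.
        Map<String, List<String>> assignment = new LinkedHashMap<String, List<String>>();
        for (String node : liveNodes) {
            assignment.put(node, new ArrayList<String>());
        }
        for (int i = 0; i < orphanedJobs.size(); i++) {
            String node = liveNodes.get(i % liveNodes.size());
            assignment.get(node).add(orphanedJobs.get(i));
        }
        return assignment;
    }
}
```

The leader would then write each list under the corresponding node's path, ideally with version-checked updates so that a second leader elected mid-recovery cannot hand out the same job twice.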