I'm looking at solving UC1 and UC2 via ZooKeeper, or via some other framework if one is recommended. We don't run Hadoop, just ZooKeeper and Cassandra, as we have no need for map/reduce. I'm searching for any existing framework that can perform standard time-based scheduling in a distributed environment. As I said earlier, Quartz is the closest model to what we're looking for, but it can't be used in a distributed parallel environment. Any suggestions for a system that could accomplish this would be helpful.
Thanks,
Todd

On 24 August 2010 11:27, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Todd,
> Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or is this a broader question for scheduling on cassandra nodes? For the latter this probably isnt the right mailing list.
>
> Thanks
> mahadev
>
>
> On 8/23/10 4:02 PM, "Todd Nine" <t...@spidertracks.co.nz> wrote:
>
> Hi all,
> We're using Zookeeper for Leader Election and system monitoring. We're also using it for synchronizing our cluster wide jobs with barriers. We're running into an issue where we now have a single job, but each node can fire the job independently of others with different criteria in the job. In the event of a system failure, another node in our application cluster will need to fire this Job. I've used quartz previously (we're running Java 6), but it simply isn't designed for the use case we have. I found this article on cloudera.
>
> http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
>
> I've looked at both plugins, but they require hadoop. We're not currently running hadoop, we only have Cassandra. Here are the 2 basic use cases we need to support.
>
> UC1: Synchronized Jobs
> 1. A job is fired across all nodes
> 2. The nodes wait until the barrier is entered by all participants
> 3. The nodes process the data and leave
> 4. On all nodes leaving the barrier, the Leader node marks the job as complete.
>
> UC2: Multiple Jobs per Node
> 1. A Job is scheduled for a future time on a specific node (usually the same node that's creating the trigger)
> 2. A Trigger can be overwritten and cancelled without the job firing
> 3. In the event of a node failure, the Leader will take all pending jobs from the failed node, and partition them across the remaining nodes.
>
> Any input would be greatly appreciated.
>
> Thanks,
> Todd
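
To make step 3 of UC2 concrete, here is a minimal sketch of how the Leader might spread a failed node's pending jobs across the surviving nodes. It is only an illustration and does not come from any existing framework: the class name `FailoverPartitioner`, the `partition` method, and the use of plain string ids for jobs and nodes are all assumptions, and round-robin is just one possible partitioning policy. The sketch deliberately leaves out ZooKeeper itself; in a real deployment the pending jobs and the node membership would presumably be read from znodes, with node failure detected through ephemeral nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of ZooKeeper, Quartz, or any other library.
class FailoverPartitioner {

    /**
     * Assigns each pending job of a failed node to one of the surviving
     * nodes in round-robin order and returns the resulting assignment,
     * keyed by surviving node id.
     */
    static Map<String, List<String>> partition(List<String> pendingJobs,
                                               List<String> survivors) {
        if (survivors.isEmpty()) {
            throw new IllegalArgumentException("no surviving nodes to take over jobs");
        }
        // LinkedHashMap keeps the survivors in the order they were given.
        Map<String, List<String>> assignment = new LinkedHashMap<String, List<String>>();
        for (String node : survivors) {
            assignment.put(node, new ArrayList<String>());
        }
        // Job i goes to survivor (i mod number of survivors).
        for (int i = 0; i < pendingJobs.size(); i++) {
            String node = survivors.get(i % survivors.size());
            assignment.get(node).add(pendingJobs.get(i));
        }
        return assignment;
    }
}
```

For example, calling `FailoverPartitioner.partition` with jobs `j1`, `j2`, `j3` and survivors `nodeB`, `nodeC` assigns `j1` and `j3` to `nodeB` and `j2` to `nodeC`. Each surviving node would then re-create its local triggers for the jobs it was handed.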