Re: getting created child on NodeChildrenChanged event

2010-08-31 Thread Todd Nine
Hi Dave,
  Thanks for the response.  I understand your point about missed events
during a watch reset period.  I may be off, but here is the functionality I
was thinking of.  I'm not sure if the ZK internal versioning process could
possibly support something like this.

1. A watch is placed on children
2. The event is fired to the client.  The client receives the Stat
object as part of the event for the current state of the node when the
event was created.  We'll call this Stat A with version 1
3. The client performs processing.  Meanwhile the node has several
children changed. Versions are incremented to version 2 and version 3
4. Client resets the watch
5. A node is added
6. The event is fired to the client.  Client receives Stat B with
version 4
7. Client calls deltaChildren(Stat A, Stat B)
8. ZooKeeper returns the nodes added between the stats, and also the nodes
deleted between the stats.

This would handle the missed event problem since the client would have
the 2 states it needs to compare.  It also allows clients dealing with
large data sets to only deal with the delta over time (like a git
replay).  Our number of queues could get quite large, and I'm concerned
that keeping my previous event's children in a set to perform the delta
may become quite memory and processor intensive.  Would a feature like
this be possible without overcomplicating the ZooKeeper core?
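
For reference, a minimal sketch of the client-side delta described above,
assuming the previous and current child names are kept in sets.
ChildrenDelta is a hypothetical helper, not a ZooKeeper API, and the child
names in main are made up for illustration:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Client-side delta between two child listings (hypothetical helper). */
class ChildrenDelta {
    final Set<String> added;
    final Set<String> removed;

    private ChildrenDelta(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    /** Compares the listing kept from the last event with the current one. */
    static ChildrenDelta between(Set<String> previous, Set<String> current) {
        // Children present now but not before.
        Set<String> added = new TreeSet<String>(current);
        added.removeAll(previous);
        // Children present before but not now.
        Set<String> removed = new TreeSet<String>(previous);
        removed.removeAll(current);
        return new ChildrenDelta(added, removed);
    }

    public static void main(String[] args) {
        Set<String> before = new TreeSet<String>(Arrays.asList("unit-1", "unit-2"));
        Set<String> after = new TreeSet<String>(Arrays.asList("unit-2", "unit-3"));
        ChildrenDelta delta = between(before, after);
        System.out.println("added=" + delta.added + " removed=" + delta.removed);
    }
}
```

The cost is the point of my question: the whole previous listing has to stay
in memory, and every event compares the full sets.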
 

Thanks,
Todd

On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:

> Hi Todd -
> The general explanation for why Zookeeper doesn't pass the event information
> w/ the event notification is that an event notification is only triggered
> once, and thus may indicate multiple events. For example, if you do a
> GetChildren and set a watch, then multiple children are added at about the
> same time, the first one triggers a notification, but the second (or later)
> ones do not. When you do another GetChildren() request to get the list and
> reset the watch, you'll see all the changed nodes, however if you had just
> been told about the first change in the notification you would have missed
> the others.
> To do what you are wanting, you would really need "persistent" watches that
> send notifications every time a change occurs and don't need to be reset so
> you can't miss events. That isn't the design that was chosen for Zookeeper
> and I don't think it's likely to be implemented.
> 
> -Dave Wright
> 
> On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine  wrote:
> 
> > Hi all,
> >  I'm writing a distributed queue monitoring class for our leader node in
> > the cluster.  We're queueing messages per input hardware device, this queue
> > is then assigned to a node with the least load in our cluster.  To do this,
> > I maintain 2 Persistent Znode with the following format.
> >
> > data queue
> >
> > /dataqueue/devices//
> >
> > processing follower
> >
> > /dataqueue/nodes//
> >
> > The queue monitor watches for changes on the path of /dataqueue/devices.
> >  When the first packet from a unit is received, the queue writer will create
> > the queue with the unit id.  This triggers the watch event on the monitoring
> > class, which in turn creates the znode for the path with the least loaded
> > node.  This path is watched for child node creation and the node creates a
> > queue consumer to consume messages from the new queue.
> >
> >
> > Our list of queues can become quite large, and I would prefer not to
> > maintain a list of queues I have assigned then perform a delta when the
> > event fires to determine which queues are new and caused the watch event. I
> > can't really use sequenced nodes and keep track of my last read position,
> > because I don't want to iterate over the list of queues to determine which
> > sequenced node belongs to the current unit id (it would require full
> > iteration, which really doesn't save me any reads).  Is it possible to
> > create a watch to return the path and Stat of the child node that caused the
> > event to fire?
> >
> > Thanks,
> > Todd
> >


getting created child on NodeChildrenChanged event

2010-08-31 Thread Todd Nine
Hi all,
  I'm writing a distributed queue monitoring class for our leader node in
the cluster.  We're queueing messages per input hardware device; each queue
is then assigned to the node with the least load in our cluster.  To do this,
I maintain 2 persistent znodes with the following format.

data queue

/dataqueue/devices//

processing follower

/dataqueue/nodes//

The queue monitor watches for changes on the path of /dataqueue/devices.
 When the first packet from a unit is received, the queue writer will create
the queue with the unit id.  This triggers the watch event on the monitoring
class, which in turn creates the znode for the path with the least loaded
node.  This path is watched for child node creation and the node creates a
queue consumer to consume messages from the new queue.


Our list of queues can become quite large, and I would prefer not to
maintain a list of queues I have assigned then perform a delta when the
event fires to determine which queues are new and caused the watch event. I
can't really use sequenced nodes and keep track of my last read position,
because I don't want to iterate over the list of queues to determine which
sequenced node belongs to the current unit id (it would require full
iteration, which really doesn't save me any reads).  Is it possible to
create a watch to return the path and Stat of the child node that caused the
event to fire?

Thanks,
Todd


Strange issue with chroot

2010-08-30 Thread Todd Nine
Hi all,
  I'm running into a very strange issue with using chroot.  I've defined the
following URL in my Java client.

zookeeper.connections=localhost:2181/com/spidertracks/aviator

When I run my unit tests against Zookeeper, all works as expected when I
create the path "/cluster/election/node_xx" for leader election.  However,
when I create the parent path of "/cluster/election" in my integration
tests, I receive this error.

INFO [runningservices-SendThread(localhost:2181)] - Session establishment
complete on server localhost/fe80:0:0:0:0:0:0:1%1:2181, sessionid =
0x12ac326df77, negotiated timeout = 5
 INFO [ProcessThread:-1] - Got user-level KeeperException when processing
sessionid:0x12ac326df77 type:create cxid:0x5 zxid:0xfffe
txntype:unknown reqpath:n/a Error Path:/com/spidertracks/aviator
Error:KeeperErrorCode = NoNode for /com/spidertracks/aviator


Is the path specified in chroot not created automatically when I make an api
call such as zooKeeper.create("/cluster/election", false)?
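
If it is not, one workaround (an assumption on my part, not something
confirmed in this thread) would be to connect once without the chroot suffix
and create each ancestor of the chroot path in order.  The hypothetical helper
below only computes that list of paths; the actual create calls are left out:

```java
import java.util.ArrayList;
import java.util.List;

/** Computes the znodes that must exist for a chroot path (hypothetical helper). */
class ChrootPaths {
    /** Returns every ancestor of the chroot path, shortest first, ending with the path itself. */
    static List<String> ancestors(String chroot) {
        if (chroot == null || !chroot.startsWith("/")) {
            throw new IllegalArgumentException("chroot must be an absolute path: " + chroot);
        }
        List<String> paths = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (String part : chroot.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            current.append('/').append(part);
            paths.add(current.toString());
        }
        return paths;
    }

    public static void main(String[] args) {
        // The chroot from the connection string above.
        for (String path : ancestors("/com/spidertracks/aviator")) {
            System.out.println(path);
        }
    }
}
```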

Thanks,
Todd


Re: Receiving create events for self with synchronous create

2010-08-26 Thread Todd Nine
Sure thing.  The FollowerWatcher class is instantiated by the
IClusterManager implementation.  It then calls FollowerWatcher.init(),
which is intended to do the following.

1. Create our follower node so that other nodes know we exist at path
"/com/spidertracks/aviator/cluster/follower/10.0.1.1"  where the last
node is an ephemeral node with the internal IP address of the node.
These are lines 67 through 72.
2. Signal to the clusterManager that the cluster has changed (line 79).
Ultimately the clusterManager will perform a barrier for partitioning
data (a separate watcher).
3. Register a watcher to receive all future events on the follower path
"/com/spidertracks/aviator/cluster/follower/" line 81.


Then we have the following characteristics in the watcher

1. If a node has been added or deleted from the children of
"/com/spidertracks/aviator/cluster/follower" then continue.  Otherwise,
ignore the event.  Lines 33 through 44
2. If this was an event we should process, our cluster has changed, so
signal to the clusterManager that a node has either been added or
removed.  Line 51.


I'm trying to encapsulate the detection of additions and deletions of
child nodes within this Watcher.  All other events that occur due to a
node being added or deleted should be handled externally by the
clustermanager.

Thanks,
Todd


On Thu, 2010-08-26 at 19:26 -0700, Mahadev Konar wrote:

> Hi Todd,
>   I am not able to make out the sequence of steps in the code that you
> point to.  Can you be more clear on what you are trying to do in terms of
> the zookeeper api?
> 
> Thanks
> mahadev
> On 8/26/10 5:58 PM, "Todd Nine"  wrote:
> 
> 
> Hi all,
>   I'm running into a strange issue I could use a hand with.  I've
> implemented leader election, and this is working well.  I'm now
> implementing a follower queue with ephemeral nodes. I have an interface
> IClusterManager which simply has the api "clusterChanged".  I don't care
> if nodes are added or deleted, I always want to fire this event.  I have
> the following basic algorithm.
> 
> 
> init
> 
> Create a path with "/follower/"+mynode name
> 
> fire the clusterChangedEvent
> 
> Watch set the event watcher on the path "/follower".
> 
> 
> watch:
> 
> reset the watch on "/follower"
> 
> if event is not a NodeDeleted or NodeCreated, ignore
> 
> fire the clustermanager event
> 
> 
> this seems pretty straightforward.  Here is what I'm expecting
> 
> 
> 1. Create my node path
> 2. fire the clusterChanged event
> 3. Set watch on "/follower"
> 4. Receive watch events for changes from any other nodes.
> 
> What's actually happening
> 
> 1. Create my node path
> 2. fire the clusterChanged event
> 3. Set Watch on "/follower"
> 4. Receive watch event for node created in step 1
> 5. Receive future watch events for changes from any other nodes.
> 
> 
> Here is my code.  Since I set the watch after I create the node, I'm not
> expecting to receive the event for it.  Am I doing something incorrectly
> in creating my watch?  Here is my code.
> 
> http://pastebin.com/zDXgLagd
> 
> Thanks,
> Todd
> 
> 
> 
> 
> 


Receiving create events for self with synchronous create

2010-08-26 Thread Todd Nine
Hi all,
  I'm running into a strange issue I could use a hand with.   I've
implemented leader election, and this is working well.  I'm now
implementing a follower queue with ephemeral nodes. I have an interface
IClusterManager which simply has the api "clusterChanged".  I don't care
if nodes are added or deleted, I always want to fire this event.  I have
the following basic algorithm.


init

Create a path with "/follower/"+mynode name

fire the clusterChangedEvent

Set the event watcher on the path "/follower".


watch:

reset the watch on "/follower"

if event is not a NodeDeleted or NodeCreated, ignore

fire the clustermanager event


this seems pretty straightforward.  Here is what I'm expecting


1. Create my node path
2. fire the clusterChanged event
3. Set watch on "/follower"
4. Receive watch events for changes from any other nodes.

What's actually happening

1. Create my node path
2. fire the clusterChanged event
3. Set Watch on "/follower"
4. Receive watch event for node created in step 1
5. Receive future watch events for changes from any other nodes.


Since I set the watch after I create the node, I'm not
expecting to receive the event for it.  Am I doing something incorrectly
in creating my watch?  Here is my code.

http://pastebin.com/zDXgLagd

Thanks,
Todd






Re: What roles do "even" nodes play in the ensemble

2010-08-25 Thread Todd Nine
Awesome, thanks guys.  Your patience and input are greatly appreciated.





On Wed, 2010-08-25 at 21:30 -0700, Henry Robinson wrote:

> Todd - 
> 
> 
> 
> No, this is not the case. There are no 'backup' or 'failover' nodes in
> ZooKeeper. All servers that can vote are working as part of the
> cluster until they fail. You need a majority of your voting servers
> alive. 
> 
> 
> If you have three servers, a majority is of size two. The number of
> nodes that can fail before a majority is no longer alive is one. 
> If you have four servers, a majority is of size three. The number of
> nodes that can fail before a majority is no longer alive is one. 
> If you have five servers, a majority is of size three. The number of
> nodes that can fail before a majority is no longer alive is two. 
> 
> 
> This is why four servers is worse than three for availability. In both
> cases, two servers have to fail before the cluster is no longer
> available. However if failures are independently distributed, this is
> more likely to happen in a cluster of four nodes than a cluster of
> three (think of it as 'more things available to go wrong'). 
> 
> 
> If you have four servers and one dies, the 'majority' that still needs
> to be alive is still three - it doesn't drop down to two. The majority
> is of all voting servers, alive or dead. 
> 
> 
> Hope this helps - 
> 
> 
> Henry
> 
> 
> On 25 August 2010 21:01, Todd Nine  wrote:
> 
> Thanks Dave.  I've been using Cassandra, so I'm trying to get my head
> around the configuration/operational differences with ZK.  You state
> that using 4 would actually decrease my reliability.  Can you explain
> that further?  I was under the impression that a 4th node would act as a
> non voting read only node until one of the other 3 fails.  I thought
> that this extra node would give me some breathing room by allowing any
> node to fail and still have 3 voting nodes.  Is this not the case?
> 
> Thanks,
> 
> Todd
> 
> 
> 
> 
> 
> 
> On Wed, 2010-08-25 at 21:13 -0600, Ted Dunning wrote:
> 
> > Just use 3 nodes.  Life will be better.
> >
> >
> >
> > You can configure the fourth node in the event of one of the first
> > three failing and bring it on line.  Then you can re-configure and
> > restart each of the others one at a time.  This gives you flexibility
> > because you have 4 nodes, but doesn't decrease your reliability the
> > way that using a four node cluster would.  If you need to do
> > maintenance on one node, just configure that node out as if it had
> > failed.
> >
> >
> > On Wed, Aug 25, 2010 at 4:26 PM, Dave Wright wrote:
> >
> > You can certainly serve more reads with a 4th node, but I'm not sure
> > what you mean by "it won't have a voting role". It still participates
> > in voting for leaders as do all non-observers regardless of whether it
> > is an even or odd number. With zookeeper there is no voting on each
> > transaction, only leader changes.
> >
> > -Dave Wright
> >
> >
> >
> > On Wed, Aug 25, 2010 at 6:22 PM, Todd Nine wrote:
> > > Do I get any read performance increase (similar to an observer) since
> > > the node will not have a voting role?
> > >
> > >
> >
> >
> >
> >
> 
> 
> 
> 
> 
> -- 
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
> 
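
The majority arithmetic Henry walks through above can be sketched as follows.
This is illustrative only; QuorumMath is a hypothetical class and nothing here
is part of ZooKeeper itself:

```java
/** Majority arithmetic for an ensemble of voting servers (illustrative only). */
class QuorumMath {
    /** Smallest number of voters that forms a majority: floor(n / 2) + 1. */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("need at least one voter");
        }
        return voters / 2 + 1;
    }

    /** Number of voters that can fail while a majority is still alive. */
    static int tolerableFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " servers: majority " + majority(n)
                    + ", can lose " + tolerableFailures(n));
        }
    }
}
```

Going from three voters to four raises the majority from two to three while
the number of tolerable failures stays at one, which is why the fourth voter
adds no availability.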


Re: What roles do "even" nodes play in the ensemble

2010-08-25 Thread Todd Nine
Thanks Dave.  I've been using Cassandra, so I'm trying to get my head
around the configuration/operational differences with ZK.  You state
that using 4 would actually decrease my reliability.  Can you explain
that further?  I was under the impression that a 4th node would act as a
non voting read only node until one of the other 3 fails.  I thought
that this extra node would give me some breathing room by allowing any
node to fail and still have 3 voting nodes.  Is this not the case?

Thanks,

Todd




On Wed, 2010-08-25 at 21:13 -0600, Ted Dunning wrote:

> Just use 3 nodes.  Life will be better.
> 
> 
> 
> You can configure the fourth node in the event of one of the first
> three failing and bring it on line.  Then you can re-configure and
> restart each of the others one at a time.  This gives you flexibility
> because you have 4 nodes, but doesn't decrease your reliability the
> way that using a four node cluster would.  If you need to do
> maintenance on one node, just configure that node out as if it had
> failed.
> 
> 
> On Wed, Aug 25, 2010 at 4:26 PM, Dave Wright wrote:
> 
> You can certainly serve more reads with a 4th node, but I'm not sure
> what you mean by "it won't have a voting role". It still participates
> in voting for leaders as do all non-observers regardless of whether it
> is an even or odd number. With zookeeper there is no voting on each
> transaction, only leader changes.
> 
> -Dave Wright
> 
> 
> 
> On Wed, Aug 25, 2010 at 6:22 PM, Todd Nine wrote:
> > Do I get any read performance increase (similar to an observer) since
> > the node will not have a voting role?
> >
> >
> 
> 
> 
> 


Re: What roles do "even" nodes play in the ensemble

2010-08-25 Thread Todd Nine
Do I get any read performance increase (similar to an observer) since
the node will not have a voting role?




On Wed, 2010-08-25 at 15:18 -0700, Henry Robinson wrote:

> Dave is correct - if you have N nodes you need  (N/2) + 1 votes (i.e. a
> majority) in the standard case to get a vote to pass.
> 
> Adding a fourth voting node to a three node cluster will cause the size of a
> majority to jump from 2 to 3. The number of nodes that need to fail before
> you can no longer get a majority is 2 in both cases - so you don't get any
> reliability for adding a new voting node to an odd-numbered cluster.
> 
> The new node will always act as a voter unless you explicitly configure it
> as an observer.
> 
> Henry
> 
> On 25 August 2010 15:11, Dave Wright  wrote:
> 
> > I'm not an expert on voting, so there may be a better answer, but from my
> > understanding all 4 nodes participate in the voting and you need a majority
> > of 3 to elect a leader.
> >
> > -Dave
> >
> > On Wed, Aug 25, 2010 at 6:09 PM, Todd Nine 
> > wrote:
> >
> > >  Thanks for that Dave.  If I do not configure it as an observer just a
> > > normal member, what will the last even node to join do?
> > >
> > >
> > > 1. Will it participate as a voter on startup?  (I'm assuming not, just
> > > read only)
> > >
> > > 2. If one of the voter nodes 1 through 3 dies, does it become a voter?
> > >
> > >
> > >todd
> > > SENIOR SOFTWARE ENGINEER
> > >
> > > todd nine| spidertracks ltd |  117a the square
> > > po box 5203 | palmerston north 4441 | new zealand
> > > P: +64 6 353 3395 | M: +64 210 255 8576
> > > E: t...@spidertracks.co.nz W: www.spidertracks.com
> > >
> > >
> > >
> > >
> > >
> > >   On Wed, 2010-08-25 at 17:57 -0400, Dave Wright wrote:
> > >
> > > >
> > > > 1. When the 4th ZK node joins the cluster, does it take on the observer
> > > > role since a quorum cannot be reached with the new node?  Can I still
> > > > connect my clients to it and create/remove nodes and receive events?
> > >
> > > No, it joins as a normal member unless you've configured it as an
> > > observer. Note that with 4 nodes you now need 3 running to get a
> > > majority, which is why even numbers aren't recommended.
> > >
> > > >
> > > >
> > > > 2. In the event 1 of the 3 voting nodes fails, will this 4th node become
> > > > a voting member of the ensemble?
> > >
> > > If configured as an observer it remains an observer.
> > >
> > > >
> > > > 3. When a new node comes online, it may have a different ip than the
> > > > previous node.  Do I need to update all node configurations and perform
> > > > a rolling restart, or will simply connecting the new node to the
> > > > existing ensemble make all nodes aware it is running?
> > >
> > > Unfortunately ZK doesn't have any kind of dynamic configuration like
> > > that currently. You need to update all the config files and restart
> > > the ensemble.
> > >
> > > -Dave Wright
> > >
> > >
> >
> 
> 
> 


Re: What roles do "even" nodes play in the ensemble

2010-08-25 Thread Todd Nine
Thanks for that Dave.  If I configure it as a normal member rather than
an observer, what will the last even node to join do?


1. Will it participate as a voter on startup?  (I'm assuming not, just
read only)

2. If one of the voter nodes 1 through 3 dies, does it become a voter?


todd 
SENIOR SOFTWARE ENGINEER

todd nine| spidertracks ltd |  117a the square 
po box 5203 | palmerston north 4441 | new zealand 
P: +64 6 353 3395 | M: +64 210 255 8576  
E: t...@spidertracks.co.nz W: www.spidertracks.com 





On Wed, 2010-08-25 at 17:57 -0400, Dave Wright wrote:

> >
> > 1. When the 4th ZK node joins the cluster, does it take on the observer
> > role since a quorum cannot be reached with the new node?  Can I still
> > connect my clients to it and create/remove nodes and receive events?
> 
> No, it joins as a normal member unless you've configured it as an
> observer. Note that with 4 nodes you now need 3 running to get a
> majority, which is why even numbers aren't recommended.
> 
> >
> >
> > 2. In the event 1 of the 3 voting nodes fails, will this 4th node become
> > a voting member of the ensemble?
> 
> If configured as an observer it remains an observer.
> 
> >
> > 3. When a new node comes online, it may have a different ip than the
> > previous node.  Do I need to update all node configurations and perform
> > a rolling restart, or will simply connecting the new node to the
> > existing ensemble make all nodes aware it is running?
> 
> Unfortunately ZK doesn't have any kind of dynamic configuration like
> that currently. You need to update all the config files and restart
> the ensemble.
> 
> -Dave Wright


What roles do "even" nodes play in the ensemble

2010-08-25 Thread Todd Nine
Hey guys,
  Forgive me if this is documented somewhere, but I can't find an
answer.  Our application is not enormous, so we will be using 4
application nodes that will also initially run Zookeeper.  As our load
increases, Zookeeper will be moved to nodes that only run ZK and no
other processes.  We will initially have only 4 nodes in our cluster, so
I have a few questions around the semantics of an even number of nodes.

1. When the 4th ZK node joins the cluster, does it take on the observer
role since a quorum cannot be reached with the new node?  Can I still
connect my clients to it and create/remove nodes and receive events?


2. In the event 1 of the 3 voting nodes fails, will this 4th node become
a voting member of the ensemble?

3. When a new node comes online, it may have a different IP than the
previous node.  Do I need to update all node configurations and perform
a rolling restart, or will simply connecting the new node to the
existing ensemble make all nodes aware it is running?

Thanks,

Todd








Re: Non Hadoop scheduling frameworks

2010-08-25 Thread Todd Nine
Thanks for the feedback.  I'm probably going to modify quartz to work
with Zookeeper to start and launch jobs.  Architecturally, I don't think
persisting Jobs or trigger history in ZK is a very good idea; it turns
ZK into a persistent data store, which it is not designed for.  I
was thinking I could change the core APIs in the following way.

Implement leader/follower election as a standalone module.  Is this
already done somewhere?   I know there's a recipe but if the code is
done that's less for me to do.


Implement an abstract JobStore implementation (ZooKeeperJobStore) with
the following properties


Default Case

1. All calls that deal with returning triggers will use the
follower/leader semantics.  All nodes (including the leader) will be
followers.  They will only be returned jobs they should run for the call
acquireNextTrigger.
2. All calls to writing triggers will write triggers to the datastore
and to a trigger queue in ZK
3. The leader will pick up triggers from the queue, and distribute them
to the next available node via the ZK trigger queues per node.  Each
operation will attempt to be wisely partitioned.  In the first
implementation, it will simply schedule the job on a node that has the
least executions near the time specified for the trigger.  In the next
release, I could use average job duration semantics to try to avoid
scheduling overlapping jobs, especially in long running jobs.
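
A minimal sketch of the first-implementation placement rule in step 3,
assuming the leader already knows each node's scheduled fire times.
LeastLoadedPicker, the map layout and the window parameter are all
hypothetical; none of this is Quartz or ZooKeeper API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Picks the node with the fewest executions near a trigger time (hypothetical helper). */
class LeastLoadedPicker {
    /**
     * Returns the node with the fewest fire times within windowMillis of
     * triggerTime.  Ties go to the first node in the map's iteration order.
     */
    static String pick(Map<String, List<Long>> fireTimesByNode,
                       long triggerTime, long windowMillis) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, List<Long>> entry : fireTimesByNode.entrySet()) {
            int count = 0;
            for (long fireTime : entry.getValue()) {
                if (Math.abs(fireTime - triggerTime) <= windowMillis) {
                    count++;
                }
            }
            if (count < bestCount) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no nodes available");
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> load = new LinkedHashMap<String, List<Long>>();
        load.put("nodeA", Arrays.asList(1000L, 1500L));
        load.put("nodeB", Arrays.asList(1200L));
        load.put("nodeC", Arrays.asList(900L, 1300L, 8000L));
        System.out.println(pick(load, 1100L, 500L));
    }
}
```

The later refinement based on average job duration would only change how the
count is computed; the selection loop would stay the same.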

Failover

1. The leader will scan all current followers when a follower leaves, or
after a new leader is designated.
2. For any node with jobs that is not currently a follower, its
triggers will be re-written to the trigger queue from above
3. The redistribution semantics will fire from above




Does this sound reasonable?  After performing more research I think job
semantics such as partitioning and parallel processing are outside the
scope of how the scheduler should work.  Those semantics are more
internal to the job itself, and I think they should remain outside of
the scope of this project.


todd 
SENIOR SOFTWARE ENGINEER

todd nine| spidertracks ltd |  





On Tue, 2010-08-24 at 04:20 +, Ted Dunning wrote:

> These are pretty easy to solve with ZK.  Ephemerality, exclusive create,
> atomic update and file versions allow you to implement most of the semantics
> you need.
> 
> I don't know of any recipes available for this, but they would be worthy
> additions to ZK.
> 
> On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine  wrote:
> 
> > Solving UC1 and UC2 via zookeeper or some other framework if one is
> > recommended.  We don't run Hadoop, just ZK and Cassandra as we don't have a
> > need for map/reduce.  I'm searching for any existing framework that can
> > perform standard time based scheduling in a distributed environment.  As I
> > said earlier, Quartz is the closest model to what we're looking for, but it
> > can't be used in a distributed parallel environment.  Any suggestions for a
> > system that could accomplish this would be helpful.
> >
> > Thanks,
> > Todd
> >
> > On 24 August 2010 11:27, Mahadev Konar  wrote:
> >
> > > Hi Todd,
> > >  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or
> > > is this a broader question for scheduling on cassandra nodes? For the latter
> > > this probably isn't the right mailing list.
> > >
> > > Thanks
> > > mahadev
> > >
> > >
> > > On 8/23/10 4:02 PM, "Todd Nine"  wrote:
> > >
> > > Hi all,
> > >  We're using Zookeeper for Leader Election and system monitoring.  We're
> > > also using it for synchronizing our cluster wide jobs with barriers.  We're
> > > running into an issue where we now have a single job, but each node can fire
> > > the job independently of others with different criteria in the job.  In the
> > > event of a system failure, another node in our application cluster will need
> > > to fire this Job.  I've used quartz previously (we're running Java 6), but
> > > it simply isn't designed for the use case we have.  I found this article on
> > > cloudera.
> > >
> > > http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
> > >
> > >
> > > I've looked at both plugins, but they require hadoop.  We're not currently
> > > running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
> > > need to support.
> > >
> > > UC1: Synchronized Jobs
> > > 1. A job is fired across all nodes
> > > 2. The nodes wait

Re: Non Hadoop scheduling frameworks

2010-08-23 Thread Todd Nine
Solving UC1 and UC2 via zookeeper or some other framework if one is
recommended.  We don't run Hadoop, just ZK and Cassandra as we don't have a
need for map/reduce.  I'm searching for any existing framework that can
perform standard time based scheduling in a distributed environment.  As I
said earlier, Quartz is the closest model to what we're looking for, but it
can't be used in a distributed parallel environment.  Any suggestions for a
system that could accomplish this would be helpful.

Thanks,
Todd

On 24 August 2010 11:27, Mahadev Konar  wrote:

> Hi Todd,
>  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or
> is this a broader question for scheduling on cassandra nodes? For the latter
> this probably isnt the right mailing list.
>
> Thanks
> mahadev
>
>
> On 8/23/10 4:02 PM, "Todd Nine"  wrote:
>
> Hi all,
>  We're using Zookeeper for Leader Election and system monitoring.  We're
> also using it for synchronizing our cluster wide jobs with barriers.  We're
> running into an issue where we now have a single job, but each node can fire
> the job independently of others with different criteria in the job.  In the
> event of a system failure, another node in our application cluster will need
> to fire this Job.  I've used quartz previously (we're running Java 6), but
> it simply isn't designed for the use case we have.  I found this article on
> cloudera.
>
> http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
>
>
> I've looked at both plugins, but they require hadoop.  We're not currently
> running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
> need to support.
>
> UC1: Synchronized Jobs
> 1. A job is fired across all nodes
> 2. The nodes wait until the barrier is entered by all participants
> 3. The nodes process the data and leave
> 4. On all nodes leaving the barrier, the Leader node marks the job as
> complete.
>
>
> UC2: Multiple Jobs per Node
> 1. A Job is scheduled for a future time on a specific node (usually the same
> node that's creating the trigger)
> 2. A Trigger can be overwritten and cancelled without the job firing
> 3. In the event of a node failure, the Leader will take all pending jobs
> from the failed node, and partition them across the remaining nodes.
>
>
> Any input would be greatly appreciated.
>
> Thanks,
> Todd
>
>


Non Hadoop scheduling frameworks

2010-08-23 Thread Todd Nine
Hi all,
  We're using Zookeeper for Leader Election and system monitoring.  We're
also using it for synchronizing our cluster wide jobs with  barriers.  We're
running into an issue where we now have a single job, but each node can fire
the job independently of others with different criteria in the job.  In the
event of a system failure, another node in our application cluster will need
to fire this Job.  I've used quartz previously (we're running Java 6), but
it simply isn't designed for the use case we have.  I found this article on
cloudera.

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/


I've looked at both plugins, but they require hadoop.  We're not currently
running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.


UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.


Any input would be greatly appreciated.

Thanks,
Todd