Wow, thank you for that awesome explanation Ted. Makes me feel really happy
that I decided to build redis_failover on top of ZooKeeper! :)

https://github.com/ryanlecompte/redis_failover

Best,
Ryan

On Thu, Apr 19, 2012 at 11:53 AM, Ted Dunning <[email protected]> wrote:

> The client can't think it has succeeded with a deletion if it is connected
> to the minority side of a partitioned cluster.  To think that, the commit
> would have to be ack'ed by a majority, which by definition can't happen:
> either the master is in the minority and can't get a majority, or the
> master is no longer reachable from the server the client is connected to.
> If the master is in the minority, then when the commit fails, the minority
> will start a leader election, which will fail due to inability to commit.
> At some point, the majority side will tire of waiting to hear from the
> master and will also start an election, which will succeed.
>
> All clients connected to the minority side will be told to reconnect and
> will either fail if they can't talk to a node on the master side or will
> succeed in connecting with a node in the new quorum.
>
> When and if the partition heals, the nodes in the minority will
> resynchronize and then start handling connections and requests.  Pending
> requests from those nodes will be discarded because the epoch number will
> have been incremented by the new leader election.
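(The majority rule Ted describes can be pictured with a tiny sketch - this is
illustrative Python, not ZooKeeper internals:)

```python
# Illustrative sketch: a write (or delete) is only acknowledged once a
# majority of the ensemble acks it, so a client stranded on the minority
# side of a partition can never see a "successful" delete.

def can_commit(ensemble_size, reachable_servers):
    """Return True if a quorum of servers can ack the commit."""
    majority = ensemble_size // 2 + 1
    return reachable_servers >= majority

# A 5-node ensemble split 3/2: only the 3-node side can commit.
assert can_commit(5, 3) is True
assert can_commit(5, 2) is False
```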
>
> On Thu, Apr 19, 2012 at 11:42 AM, Ryan LeCompte <[email protected]>
> wrote:
>
> > Great questions. I'd also like to add:
> >
> > - What happens when there is a network partition, and one client
> > successfully deletes a znode for which other clients have set up watches?
> > Are the clients guaranteed to receive that node-deleted watch event if the
> > client on the other side of the partition believes the delete succeeded?
> >
> > Thanks,
> > Ryan
> >
> > On Thu, Apr 19, 2012 at 11:29 AM, Martin Kou <[email protected]>
> > wrote:
> >
> > > Hi folks,
> > >
> > > I've got a few questions about how Zookeeper servers behave in
> > fail-recover
> > > scenarios:
> > >
> > > Assuming I have a 5-Zookeeper cluster, and one of the servers went dark
> > and
> > > came back, like 1 hour later.
> > >
> > >   1. Is it correct to assume that clients won't be able to connect to
> the
> > >   recovering server while it's still synchronizing with the leader, and
> > > thus
> > >   any new client connections would automatically fall back to the
> other 4
> > >   servers during synchronization?
> > >   2. The documentation says a newly recovered server would have
> > (initLimit
> > >   * tickTime) seconds to synchronize with the leader when it's
> restarted.
> > > Is
> > >   it correct to assume the time needed for synchronization is bounded
> by
> > > the
> > >   amount of data managed by Zookeeper? Let's say in the worst case,
> > someone
> > >   set a very large snapCount to the cluster, there were a lot of
> > >   transactions, but there aren't a lot of znodes - and thus there
> aren't
> > a
> > >   lot of data in each Zookeeper server but a very long transaction log.
> > > Would
> > >   that bound still hold?
> > >   3. I noticed from the documentation that a Zookeeper server falling >
> > >   (syncLimit * tickTime) seconds from the leader will be dropped from
> > > quorum.
> > >   I guess that's for detecting network partitions, right? If the
> > > partitioned
> > >   server does report back to the leader later, how would it behave?
> (e.g.
> > >   would it deny new client connections while it's synchronizing?)
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Martin Kou
> > >
> >
>
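(For Martin's questions 2 and 3, the two time budgets work out like this -
a quick sketch using the sample values shipped in ZooKeeper's
zoo_sample.cfg; your cluster's config may differ:)

```python
# Sketch of the two sync-time budgets, using the sample values from
# ZooKeeper's zoo_sample.cfg (tickTime=2000, initLimit=10, syncLimit=5).

tick_time_ms = 2000   # length of one tick, in milliseconds
init_limit = 10       # ticks a follower may take to initially sync
sync_limit = 5        # ticks a follower may lag before being dropped

init_budget_s = init_limit * tick_time_ms / 1000   # startup sync budget
sync_budget_s = sync_limit * tick_time_ms / 1000   # steady-state lag budget

print(init_budget_s)  # 20.0 seconds to finish the initial sync
print(sync_budget_s)  # 10.0 seconds of lag before the leader drops it
```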
