Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-04-30 Thread Patrick Hunt


On 04/30/2010 10:16 AM, Aaron Crow wrote:

Hi Patrick, thanks for your time and detailed questions.



No worries. When we hear about an issue we're very interested to 
follow up and resolve it, regardless of the source. We take the project 
goals of high reliability/availability _very_ seriously; we know that our 
users are deploying to mission-critical environments.



We're running on Java build 1.6.0_14-b08, on Ubuntu 4.2.4-1ubuntu3. Below is
output from a recent stat, and a question about node count. For your other
questions, I should save you time with a batch reply: I wasn't tracking
nearly enough things (like logs), so it might not be fruitful to try to
investigate this failure in detail.



NP. If you had those available I would be interested to take a look, but 
I think the node count (below) is probably the issue.



I've since started rolling out better settings, including the memory and GC
settings recommended in the wiki, and logging to ROLLINGFILE. Also logging
and archiving verbose gc. We've been fine tuning our client app with similar
settings, but my bad that we didn't roll out ZK itself with these settings
as well. (Oh and I also set my browser's homepage to
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing ;-)


Great. That page is really good for challenging assumptions.


So if we have any more serious issues, I think I should be able to provide
better details at that time.



Great!


I do have a question about node count, though. Below is output from stat on
one of our servers. Node count says *1182653*. Really!? From our
application standpoint, we only create and use hundreds of nodes at
most. Certainly nothing in the neighborhood of 1 million. Does this
demonstrate a serious problem?



It certainly looks like this might be the issue. One million znodes is 
a lot, but I've tested with more (upwards of 5 million znodes with 
25 million concurrent watches is the most I remember testing) 
successfully. It's really just a matter of tuning the GC and starting 
the JVM with sufficient heap. It could be that the counter is wrong, but 
I doubt it (I've never seen that before). Is the count approximately the 
same on all servers in the ensemble?


Given that you are only expecting hundreds of znodes, I suspect that 
you have a znode leak somewhere.


Suggestion: fire up the Java client shell (bin/zkCli.sh, or just run 
ZooKeeperMain directly), connect to one of your servers, and use the ls 
command to look around. You might be surprised at what you find (or you 
might find that the count is wrong, etc.). It's a good sanity check.
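Beyond eyeballing with ls, the same walk can be scripted. A minimal sketch of the count (the Tree interface here is a made-up stand-in for ZooKeeper's getChildren call; swap in a real client to run it against a live ensemble):

```java
import java.util.*;

// Counts znodes by recursively walking the tree, the same check the
// zkCli.sh "ls" command lets you do by hand. Tree is a stand-in for
// ZooKeeper.getChildren(path, false).
public class ZnodeCount {
    interface Tree { List<String> children(String path); }

    /** Counts the path itself plus everything below it. */
    static int count(Tree t, String path) {
        int n = 1;
        for (String child : t.children(path)) {
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            n += count(t, childPath);
        }
        return n;
    }

    public static void main(String[] args) {
        // Fake tree: / -> {a, b}, /a -> {x}, so 4 znodes in total.
        Map<String, List<String>> fake = new HashMap<>();
        fake.put("/", Arrays.asList("a", "b"));
        fake.put("/a", Arrays.asList("x"));
        Tree t = p -> fake.getOrDefault(p, Collections.emptyList());
        System.out.println(count(t, "/")); // prints 4
    }
}
```

A count wildly above what the application expects, concentrated under one parent, usually points straight at the leak.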


Patrick



Thanks,
Aaron

rails_dep...@task01:~/zookeeper-3.3.0/bin$ echo stat | nc 127.0.0.1 2181

Zookeeper version: 3.3.0-925362, built on 03/19/2010 18:38 GMT

Clients:

  /10.0.10.116:34648[1](queued=0,recved=86200,sent=86200)

  /10.0.10.116:45609[1](queued=0,recved=27695,sent=27695)

  /10.0.10.120:48432[1](queued=0,recved=26030,sent=26030)

  /10.0.10.117:53336[1](queued=0,recved=43100,sent=43100)

  /10.0.10.120:35087[1](queued=0,recved=4268,sent=4268)

  /10.0.10.116:49526[1](queued=0,recved=4273,sent=4273)

  /10.0.10.118:45614[1](queued=0,recved=43624,sent=43624)

  /10.0.10.116:45600[1](queued=0,recved=27704,sent=27704)

  /10.0.10.120:48440[1](queued=0,recved=7161,sent=7161)

  /10.0.10.120:48437[1](queued=0,recved=7180,sent=7180)

  /10.0.10.118:59310[1](queued=0,recved=63205,sent=63205)

  /10.0.10.116:51072[1](queued=0,recved=14516,sent=14516)

  /10.0.10.116:51071[1](queued=0,recved=14516,sent=14516)

  /10.0.10.119:34097[1](queued=0,recved=42984,sent=42984)

  /10.0.10.119:41868[1](queued=0,recved=18558,sent=18558)

  /10.0.10.120:48441[1](queued=0,recved=21712,sent=21712)

  /10.0.10.116:49528[1](queued=0,recved=12967,sent=12967)

  /127.0.0.1:37234[1](queued=0,recved=1,sent=0)



Latency min/avg/max: 0/0/105


Received: 497790

Sent: 497788

Outstanding: 0

Zxid: 0xa0003aa4b

Mode: follower

*Node count: 1182653*




On Wed, Apr 28, 2010 at 10:56 PM, Patrick Hunt ph...@apache.org  wrote:


Btw, are you monitoring the ZK server jvms? Please take a look at


http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands

It would be interesting if you could run commands such as stat against
your currently running cluster. In particular I'd be interested to know what
you see for latency and node count in the stat response. Run this command
against each of your servers in the ensemble. For example, high max latency
might indicate that something (usually swap/GC) is causing the server to
respond slowly in some cases.

Patrick
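As a sketch, the interesting fields can be pulled out of the stat text programmatically (the field names follow the 3.3.x output quoted elsewhere in this thread; this parser is illustrative, not part of the ZooKeeper distribution):

```java
// Extracts "Node count" and max latency from the text returned by the
// "stat" four-letter command (format as printed by ZooKeeper 3.3.x).
public class StatParse {
    /** Returns the Node count value, or -1 if the line is missing. */
    static long nodeCount(String stat) {
        for (String line : stat.split("\n"))
            if (line.startsWith("Node count:"))
                return Long.parseLong(line.substring("Node count:".length()).trim());
        return -1;
    }

    /** Returns the max latency from "Latency min/avg/max: a/b/c", or -1. */
    static long maxLatency(String stat) {
        for (String line : stat.split("\n"))
            if (line.startsWith("Latency min/avg/max:")) {
                String[] parts =
                    line.substring("Latency min/avg/max:".length()).trim().split("/");
                return Long.parseLong(parts[2]);
            }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "Latency min/avg/max: 0/0/105\nMode: follower\nNode count: 1182653\n";
        System.out.println(nodeCount(sample));  // prints 1182653
        System.out.println(maxLatency(sample)); // prints 105
    }
}
```

Run against every server in the ensemble, diverging node counts or an outlying max latency single out the unhealthy member quickly.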


On 04/28/2010 10:47 PM, Patrick Hunt wrote:


Hi Aaron, some questions/comments below:

On 04/28/2010 06:29 PM, Aaron Crow wrote:


We were running version 3.2.2 for about a month and it was working well for
us. Then late this past Saturday night, our cluster went pathological. One
of the 3 ZK servers spewed many WARNs (see below), and the other 2 servers
were 

Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,

In this case, the Leader will be disconnected from the ZK cluster and will give
up its leadership. Since it's disconnected, the ZK cluster will realize that the
Leader is dead!

When the ZK cluster realizes that the Leader is dead (because the zk
cluster hasn't heard from the Leader for a certain time, configurable via the
session timeout parameter), the slaves will be notified of this via watchers
in the zookeeper cluster. The slaves will realize that the Leader is gone,
re-elect a new Leader, and start working with the new Leader.

Does that answer your question?

You might want to look through the ZK documentation to understand its use
cases and how it solves these kinds of issues.

Thanks
mahadev


On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote:

 Thank you all for your answers. It clarifies a lot of my confusion about the
 service guarantees of ZK. I am still struggling with one failure case (I am
 not trying to be a pain in the neck, but I need to have a full
 understanding of what ZK can offer before I make a decision on whether to
 use it in my cluster.)
 
 Assume the following topology:
 
   Leader      ZK cluster
      \          /
       \        /
       Slave(s)
 
 If there is an asymmetric network failure such that the connections between the
 Leader and the Slave(s) are broken while all other connections are still alive,
 would my system hang after some point? Because no new leader election will be
 initiated by the slaves, and the leader can't get the work to the slave(s).
 
 Thanks,
 
 Lei
 
 On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 If one of your user clients can no longer reach one member of the ZK
 cluster, then it will try to reach another.  If it succeeds, then it will
 continue without any problems as long as the ZK cluster itself is OK.
 
 This applies for all the ZK recipes.  You will have to be a little bit
 careful to handle connection loss, but that should get easier soon (and
 isn't all that difficult anyway).
 
 On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote:
 
 I am not talking about the leader election within the zookeeper cluster. I
 guess I didn't make the discussion context clear. In my case, I run a cluster
 that uses zookeeper for leader election. Yes, nodes in my cluster are
 the clients of zookeeper. Those nodes depend on zookeeper to elect a new
 leader and figure out what the current leader is. So if the zookeeper (think
 of it as a stand-alone entity) becomes unavailable in the way I've described
 earlier, how can I handle such a situation so my cluster can still function
 while a majority of nodes still connect to each other (but not to the
 zookeeper)?
 
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Lei Gao
Hi Mahadev,

Why would the leader be disconnected from ZK? ZK is fine communicating with
the leader in this case. We are talking about asymmetric network failure.
Yes, the Leader could consider all the slaves to be down if it tracks the
status of all slaves itself. But I guess if ZK is used for membership
management, neither the leader nor the slaves will be considered
disconnected, because they can all connect to ZK.

Thanks,

Lei  





Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,
 Sorry, I misinterpreted your question! The scenario you describe could be
handled in such a way -

You could have a status node in ZooKeeper which every slave subscribes
to and updates! If one of the slave nodes sees that there have been too many
connections refused to the Leader by the slaves, that slave could go ahead and
delete the Leader znode, forcing the Leader to give up its leadership. I
am not describing a detailed way to do it, but it's not very hard to come up
with a design for this.



Do you intend to have the Leader and Slaves in differently protected network
zones (different ACLs, I mean)? In that case it is a legitimate concern;
otherwise I do think an asymmetric network partition would be very unlikely
to happen.

Do you usually see network partitions in such scenarios?

Thanks
mahadev
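The decision step of this status-node design can be reduced to pure logic, sketched below (the majority threshold and all names are assumptions for illustration; a real implementation would read the reports from, and delete, actual znodes):

```java
import java.util.*;

// Sketch of the status-node idea: each slave publishes whether it can
// reach the Leader; any slave that observes a majority of "unreachable"
// reports may delete the Leader's znode to force a re-election.
public class StatusVote {
    /** True when a strict majority of slaves report the leader unreachable. */
    static boolean shouldDeleteLeaderZnode(Map<String, Boolean> canReachLeader) {
        long unreachable = canReachLeader.values().stream()
                .filter(ok -> !ok).count();
        return unreachable * 2 > canReachLeader.size();
    }

    public static void main(String[] args) {
        Map<String, Boolean> reports = new HashMap<>();
        reports.put("slave1", false);
        reports.put("slave2", false);
        reports.put("slave3", true);
        System.out.println(shouldDeleteLeaderZnode(reports)); // prints true
    }
}
```

The strict-majority threshold keeps one slave with a flaky link from unseating a healthy Leader.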





Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Ted Dunning
Lei,

I think that Mahadev was correct that there is some confusion here.

Leader election is normally a term used for an operation that is entirely
internal to ZK.  It is very robust and you probably don't need to worry
about it.

You can then use ZK in your application to pick a lead machine for other
operations.  In that case, essentially every failure scenario is handled by
the standard recipe.  In your example where the master and slave are cut
off, but both still have access to ZK, all that will happen is that the
master cannot communicate with the slave.  Both will still be clear about
who is in which role.

The case where the master is cut off from both ZK and the slave is also
handled well as is the case where the master is cut off from ZK, but not
from the slave.  In both cases, the master will get a connection loss event
and stop trying to act like a master and the slave will be notified that the
master has dropped out of its role.
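The standard recipe boils down to ephemeral sequential znodes: the lowest sequence number leads, and everyone else watches the node just below their own. A sketch of just that client-side decision logic (znode names here are made up; a real client would get them from getChildren on the election parent):

```java
import java.util.*;

// Client-side decision logic of the leader-election recipe: candidates
// create ephemeral sequential znodes; lowest name wins, others watch
// their immediate predecessor to avoid a herd on leader death.
public class ElectionRecipe {
    /** True iff my znode sorts first among the election children. */
    static boolean isLeader(List<String> children, String mine) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        return sorted.get(0).equals(mine);
    }

    /** The znode to watch: my immediate predecessor, or null if I lead. */
    static String watchTarget(List<String> children, String mine) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int i = sorted.indexOf(mine);
        return i <= 0 ? null : sorted.get(i - 1);
    }

    public static void main(String[] args) {
        List<String> kids = Arrays.asList("n-0000000003", "n-0000000001", "n-0000000002");
        System.out.println(isLeader(kids, "n-0000000001"));    // prints true
        System.out.println(watchTarget(kids, "n-0000000003")); // prints n-0000000002
    }
}
```

Because the znodes are ephemeral, a master that loses its session (the connection-loss event Ted describes) disappears from the list automatically, and its successor's watch fires.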





Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Lei Gao
Hi Mahadev,

First of all, I'd like to thank you for being patient with me - my questions
seem unclear to many of you who have tried to help me.

I guess clients have to be smart enough to trigger a new leader election by
trying to delete the znode. But in this case, ZK should not allow any single
client, or multiple clients (as long as they are fewer than a quorum), to delete
the znode corresponding to the master, right? A new consensus among clients (NOT
among the nodes in the zk cluster) has to exist for the znode to be deleted,
right? Does zk have this capability, or do the clients have to come to this
consensus outside of zk before trying to delete the znode in zk?

Thanks,

Lei




Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Patrick Hunt
I believe Lei's concern is that the leader and all slaves can talk to 
ZK, but the slaves cannot talk to the leader. As a result no work can be 
done. However nothing will happen on the ZK side since everyone is 
heartbeating properly.


Mahadev, I think you came up with a pretty good solution. However, since 
the leader can see the votes from all the slaves, it might just want to 
give up the lead itself and pause for a while (to give someone else 
the chance to be the leader). This would let the leader better handle the 
case where a single slave cannot talk to the leader but the rest of the 
slaves can communicate fine.


Patrick
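This leader-side variant reduces to a single threshold check, sketched here (purely illustrative; the majority threshold and the vote-gathering mechanism are assumptions, not a prescribed implementation):

```java
// Leader-side step-down decision: the Leader watches the slaves'
// reachability votes and yields only when a majority cannot reach it,
// so a lone slave with a broken link does not force a re-election.
public class LeaderStepDown {
    /** True when a strict majority of slaves voted "unreachable". */
    static boolean stepDown(int slavesTotal, int slavesReportingUnreachable) {
        return slavesReportingUnreachable * 2 > slavesTotal;
    }

    public static void main(String[] args) {
        System.out.println(stepDown(5, 1)); // prints false: single bad link, stay leader
        System.out.println(stepDown(5, 4)); // prints true: most slaves cut off, yield
    }
}
```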

On 04/30/2010 04:31 PM, Mahadev Konar wrote:

Maybe I jumped the gun here but Ted's response to your query is more
appropriate -



You can then use ZK in your application to pick a lead machine for other
operations.  In that case, essentially every failure scenario is handled by
the standard recipe.  In your example where the master and slave are cut
off, but both still have access to ZK, all that will happen is that the
master cannot communicate with the slave.  Both will still be clear about
who is in which role.

The case where the master is cut off from both ZK and the slave is also
handled well as is the case where the master is cut off from ZK, but not
from the slave.  In both cases, the master will get a connection loss event
and stop trying to act like a master and the slave will be notified that the
master has dropped out of its role.


Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,
  ZooKeeper provides a set of primitives which allows you to do all kinds of
things! You might want to take a look at the API and some examples of
zookeeper recipes to see how it works; that will probably clear things
up for you.

Here are the links:

http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html

Thanks
mahadev

