Re: Unable to restart ZK

2012-03-08 Thread Shelley, Ryan
Well, that was obvious. I should have looked there first. Back up and running now. Thanks! -Ryan On 3/8/12 4:23 PM, "Ted Dunning" wrote: >Ahh... I see. > >Look in the zoo.cfg file that your ZK is using for a dataDir setting. >That >will tell you where ZK has its data including snapshots and log

Re: Rolling upgrades

2012-03-08 Thread Ted Dunning
Any even number that is greater than half of the configured number of nodes is fine. The only *really* bad even number of servers is 0. On Thu, Mar 8, 2012 at 4:24 PM, Ted Dunning wrote: > It won't be any different than a temporary state when one of 3 or 5 nodes > is down. > > > On Thu, Mar 8,

Re: Rolling upgrades

2012-03-08 Thread Ted Dunning
It won't be any different than a temporary state when one of 3 or 5 nodes is down. On Thu, Mar 8, 2012 at 4:10 PM, Jordan Zimmerman wrote: > Also... > > I thought that ZK ensembles need to be odd in number. How would ZK handle > a temporary state where there is an even number? > > -JZ > > On 3/8/12

Re: Unable to restart ZK

2012-03-08 Thread Ted Dunning
Ahh... I see. Look in the zoo.cfg file that your ZK is using for a dataDir setting. That will tell you where ZK has its data including snapshots and logs. On Thu, Mar 8, 2012 at 4:10 PM, Shelley, Ryan wrote: > It attempts to start, gets an exception and then quits: > > http://screencast.com/t/T
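For anyone finding this thread later, the lookup Ted describes can be sketched in a few lines of Python. This is only an illustration: the helper name `read_data_dir` and the sample config text (including the `/tmp/zookeeper` path) are assumptions made for the example, not values taken from Ryan's setup.

```python
import io


def read_data_dir(cfg_text):
    """Return the dataDir value from zoo.cfg-style text, or None if absent."""
    for raw in io.StringIO(cfg_text):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dataDir":
            return value.strip()
    return None


# Hypothetical config contents; a real zoo.cfg will differ.
sample = """
tickTime=2000
# illustrative path only
dataDir=/tmp/zookeeper
clientPort=2181
"""

print(read_data_dir(sample))
```

In practice you would read the text from whichever zoo.cfg your ZK instance was started with and then inspect the directory it names.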

RE: Rolling upgrades

2012-03-08 Thread Alexander Shraer
I don't think they must be an odd number, but it does make sense to have an odd number because with majority quorums the fault tolerance you get with 3 servers is the same as the fault tolerance you get with 4 - in both cases you can only tolerate 1 failure. Alex > -Original Message- >
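The arithmetic behind Alex's point can be checked with a small sketch. It assumes plain majority quorums (more than half of the configured servers); the function names are made up for the illustration.

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still available."""
    return n - quorum_size(n)


for n in range(1, 8):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Under this assumption, 3 and 4 servers both tolerate a single failure, and 5 and 6 both tolerate two, which is why an even-sized ensemble buys no extra fault tolerance over the next smaller odd size.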

Re: Unable to restart ZK

2012-03-08 Thread Shelley, Ryan
It attempts to start, gets an exception and then quits: http://screencast.com/t/TWG529FGV0R Finagle added the paths to ZK, now when I try to restart, it looks like ZK is trying to replay some transaction, failing, and quitting. If I knew where the data file was, I could just delete the data file

Re: Rolling upgrades

2012-03-08 Thread Jordan Zimmerman
Also... I thought that ZK ensembles need to be odd in number. How would ZK handle a temporary state where there is an even number? -JZ On 3/8/12 3:39 PM, "Alexander Shraer" wrote: >I don't think there is a problem if you do it as you say, or even if you >just change the config files of all serve

Re: Unable to restart ZK

2012-03-08 Thread Ted Dunning
Can you resolve the pronouns here? It looks like ZK is running, but that Finagle will not. Is that what you meant to say? The log messages make it look like somebody assumes that a directory /twitter/servers does exist, but is finding that this directory doesn't exist. Isn't that an application

Unable to restart ZK

2012-03-08 Thread Shelley, Ryan
I was playing with Finagle which uses ZK, however, after shutting down my ZK instance, it won't start back up again: 2012-03-08 15:44:39,427 [myid:] - INFO [main:ZooKeeperServer@733] - tickTime set to 2000 2012-03-08 15:44:39,427 [myid:] - INFO [main:ZooKeeperServer@742] - minSessionTimeout s

RE: Rolling upgrades

2012-03-08 Thread Alexander Shraer
I don't think there is a problem if you do it as you say, or even if you just change the config files of all servers at once and restart them, because a majority of the new config necessarily intersects with a majority of the old one, so a server who has the latest state will be elected leader.
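The intersection argument can be checked by brute force for the case discussed in this thread (three servers growing to four). This is a toy model that assumes plain majority quorums and only compares sets of server names; it says nothing about the election protocol itself, and the helper names are invented for the sketch.

```python
from itertools import combinations


def majorities(servers):
    """All subsets of `servers` that contain a strict majority of them."""
    members = sorted(servers)
    need = len(members) // 2 + 1
    return [set(c)
            for k in range(need, len(members) + 1)
            for c in combinations(members, k)]


def always_intersect(old, new):
    """True if every majority of `old` shares a server with every majority of `new`."""
    return all(a & b for a in majorities(old) for b in majorities(new))


old = {"ZK1", "ZK2", "ZK3"}
new = old | {"ZK4"}
print(always_intersect(old, new))
```

In the same toy model the property does not hold for arbitrary jumps: going from 3 servers straight to 7 allows a majority of the new config made up entirely of new servers, so the check returns False for that case.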

Rolling upgrades

2012-03-08 Thread Jordan Zimmerman
I've been reading the archives regarding rolling upgrades. Here's the scenario, given a stable ensemble: ZK1 <-> ZK2 <-> ZK3 In the above, the zoo.cfg for each server looks like this (pseudo): server.1=ZK1 server.2=ZK2 server.3=ZK3 I want to add a new server, ZK4. If I understand this correctly

Re: Possibility / consequences of having multiple elected leaders

2012-03-08 Thread Ted Dunning
Exactly. On Thu, Mar 8, 2012 at 3:09 PM, Alexander Shraer wrote: > Thanks Ted, I can see your point. We use TCP connections and we do the > epoch check at the beginning of the protocol, so > > a message from an old leader cannot just resurface. > > Alex > > *From

RE: Possibility / consequences of having multiple elected leaders

2012-03-08 Thread Alexander Shraer
Thanks Ted, I can see your point. We use TCP connections and we do the epoch check at the beginning of the protocol, so a message from an old leader cannot just resurface. Alex From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Thursday, March 08, 2012 12:32 AM To: Alexander Shraer Cc: user@

Re: Possibility / consequences of having multiple elected leaders

2012-03-08 Thread Ted Dunning
The whole point of the zab protocol is to ensure that only one elected leader can exist at one time. Since a quorum has to commit to supporting any leader there can't be two leaders. Furthermore each change of leadership increments the epoch and that increment had to be committed on a majority