Everything is here http://people.apache.org/~jdcryans/zk_election_bug.tar.gz
The server we are trying to start is sv4borg222 (myid is 2) and we
started it around 10:03:21
Thx!
J-D
On Mon, Jan 25, 2010 at 10:49 AM, Patrick Hunt ph...@apache.org wrote:
1) Capture the logs from all 5 servers
2)
According to the log for 222 it can't open a connection to the election
port (3888) for any of the other servers. This seems very unusual. Can
you verify that there's connectivity on that port between 222 and all the
other servers?
Also, can you re-run the netstat with -a option? We can see the
According to the log for 222 it can't open a connection to the election port
(3888) for any of the other servers. This seems very unusual. Can you verify
that there's connectivity on that port between 222 and all the other servers?
jdcry...@sv4borg222:~$ telnet sv4borg224 3888
Trying
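The telnet check above can also be scripted so that it runs against every peer in one go. What follows is only a minimal sketch, not something posted in the thread: the helper names (can_connect, check_peers) and the 3-second timeout are assumptions, and the only values taken from the thread are the election port (3888) and the host sv4borg224.

```python
import socket

ELECTION_PORT = 3888  # election port mentioned in the thread


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: any OSError (refused, unreachable, timed out,
    name resolution failure) is treated as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peers(peers, port=ELECTION_PORT, timeout=3.0):
    """Map each peer hostname to whether its election port accepted a connection."""
    return {host: can_connect(host, port, timeout) for host in peers}
```

Run from sv4borg222, check_peers(["sv4borg224"]) would return a dict mapping the host to True or False. The rest of the ensemble would have to be added to the list by hand, since the other hostnames are not given here. A False result only says the TCP connect failed; it does not distinguish a firewall from a server that is not listening.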
JD, there's something _very_ unusual in your setup. Are you running
official released ZooKeeper code or something else?
Either there is a misconfiguration on the other servers (the configs for
the other servers are exactly the same as 222, right?), or perhaps some
patches to ZK codebase that
Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting
from the same 12 servers. Easily modifiable and perhaps we should look into
changing that.
The logs just seem to indicate that the servers that claim to have no
server running are continually attempting
-Original Message-
From: Nick Bailey nicholas.bai...@rackspace.com
Sent: Tuesday, January 12, 2010 6:03pm
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting
from the same 12 servers. Easily
Hi Nick, Your assessment sounds correct, the issue seems to be caused
by the bug described in ZOOKEEPER-427. Can't you upgrade to a newer
release? Killing the leader should do it, but the bug will still be
there, so I recommend upgrading.
Thanks,
-Flavio
On Jan 12, 2010, at 10:52 PM, Nick
@hadoop.apache.org, nicholas.bai...@rackspace.com
Subject: Re: Killing a zookeeper server
12 servers? That's a lot. If you don't mind my asking, why so many?
Typically we recommend 5 - that way you can have one down for
maintenance and still have a failure that doesn't bring down the
cluster.
The electing
Subject: Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting
from the same 12 servers. Easily modifiable and perhaps we should look into
changing that.
The logs just seem to indicate that the servers that claim to have no
server running
amount of data really and network latency appears fine.
Thanks for the help,
Nick
-Original Message-
From: Nick Bailey nicholas.bai...@rackspace.com
Sent: Tuesday, January 12, 2010 6:03pm
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just
nicholas.bai...@rackspace.com
Sent: Tuesday, January 12, 2010 6:03pm
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting
from the same 12 servers. Easily modifiable and perhaps we should look
a large
amount of data really and network latency appears fine.
Thanks for the help,
Nick
-Original Message-
From: Nick Bailey nicholas.bai...@rackspace.com
Sent: Tuesday, January 12, 2010 6:03pm
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just
We are running zookeeper 3.1.0
Recently we noticed the cpu usage on our machines becoming increasingly high
and we believe the cause is
https://issues.apache.org/jira/browse/ZOOKEEPER-427
However our solution when we noticed the problem was to kill the zookeeper
process and restart it.
12 servers? That's a lot. If you don't mind my asking, why so many?
Typically we recommend 5 - that way you can have one down for
maintenance and still have a failure that doesn't bring down the cluster.
The electing a leader is probably the restarted machine attempting to
re-join the ensemble
I have a related question: what's the behavior of a cluster of 3 when
one is down? I've tried it and a leader is elected, but are there any
other caveats for this situation?
.. Adam
On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt ph...@apache.org wrote:
12 servers? That's a lot. If you don't mind
@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting from
the same 12 servers. Easily modifiable and perhaps we should look into
changing that.
The logs just seem to indicate that the servers that claim to have no server
Doh - that makes total sense. For whatever reason I thought with 2
servers you couldn't get a majority :P
On Tue, Jan 12, 2010 at 3:17 PM, Henry Robinson he...@cloudera.com wrote:
Hi Adam -
As long as a quorum of servers is running, ZK will be live. With majority
quorums, 2/3 is enough to
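The majority-quorum arithmetic behind this answer, and behind the earlier recommendation of 5 servers, can be sketched as follows. This is an illustration only, assuming plain majority quorums as described above; the function names are made up for the example and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


def is_live(ensemble_size, servers_up):
    """True if the running servers still form a majority quorum."""
    return servers_up >= quorum_size(ensemble_size)


# Ensemble sizes discussed in the thread: 3, 5 and 12.
for n in (3, 5, 12):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 3 servers the quorum is 2, so one server can be down. With 5 the quorum is 3, which leaves room for one server in maintenance plus one unexpected failure. With 12 the quorum is 7, so the ensemble tolerates 5 failures, the same number an 11-server ensemble would tolerate.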
-Original Message-
From: Nick Bailey nicholas.bai...@rackspace.com
Sent: Tuesday, January 12, 2010 6:03pm
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
12 was just to keep uniformity on our servers. Our clients are connecting from
the same 12 servers