You should never see connection loss unless there is a network partition or some other problem that disrupts communication between the client and server (client swapping? server swapping? either one having GC pause issues? etc.). Are you monitoring your hosts, network, JVMs, and so on? Is there "over-virtualization" of the cluster hosts?

Take a look at your client/server logs and see if you can determine what the issue is. You might also try some network-level tools like ping/ssh to verify connectivity between server and client. See this page for issues people have had in the past:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
For example, "Hardware misconfiguration - NIC" caused one system to basically work, but with huge numbers of connection losses, especially whenever there was load (and I've seen this particular issue twice now).
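
As a complement to ping/ssh, a quick TCP-level check from the client host can show whether each server in the connect string accepts connections at all. The following is only a minimal sketch: the class `EnsembleReachability` and its methods are hypothetical helpers, not part of the ZooKeeper API, and it assumes a plain comma-separated host:port list with no chroot suffix, falling back to the default client port 2181 when no port is given. A successful connect only shows that something is listening on the port; it does not prove that the server is healthy or part of a quorum.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the ZooKeeper API.
final class EnsembleReachability {

    /**
     * Tries a plain TCP connect to every host:port entry of a comma-separated
     * connect string and records whether the connect succeeded.
     * Assumption: no chroot suffix such as "/app" is present.
     */
    static Map<String, Boolean> check(String connectString, int timeoutMillis) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String entry : connectString.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate trailing commas and stray whitespace
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
            // 2181 is the default ZooKeeper client port.
            int port = colon < 0 ? 2181 : Integer.parseInt(hostPort.substring(colon + 1));
            result.put(hostPort, canConnect(host, port, timeoutMillis));
        }
        return result;
    }

    /** Returns true if a TCP connection could be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable host
        }
    }

    public static void main(String[] args) {
        String connectString = args.length > 0 ? args[0] : "localhost:2181";
        check(connectString, 2000).forEach((server, ok) ->
            System.out.println(server + " -> " + (ok ? "reachable" : "NOT reachable")));
    }
}
```

Running this from the client machine under load, with the same connect string the application uses, may help tell a server that is simply down from a flaky network path.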

See

Patrick

Michael Bauland wrote:
Hi Ted,

thanks for your reply.

This page about ZooKeeper error handling
<http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling> may help.

I actually read this page before. You may have misunderstood my
question. I know how to recover from the connectionloss exception. I was
just curious why it occurred so often in my described scenario. I would
have assumed that in that scenario it shouldn't occur at all, but it was
almost half of the requests that returned with a connectionloss.

Cheers,

Michael


On Mon, Feb 1, 2010 at 4:30 AM, Michael Bauland <michael.baul...@knipp.de> wrote:

Hello,

I've got a question regarding the connectionloss exception thrown by Java.
I've got an ensemble running with three zk servers. If one of the three
servers is not running, the whole ensemble should still work (and it
does, so that's fine). But in this situation I experience quite often a
connectionloss exception and I'm wondering if I'm doing something wrong
or if that's to be expected.

My code is rather simple:
I create a new connection to my ensemble using

ZooKeeper zk = new ZooKeeper (connectString, timeOut, new MyWatcher ());

where connectString contains all three servers. Then I use the ZooKeeper
to read data from a certain path:

zk.getData (path, false, null);

This call quite often returns an exception like

org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /125/170/test

But according to your documentation, the connectionloss exception should
only occur in the following two cases:

   1. The application calls an operation on a session that is no longer
alive/valid

This should not be the case, since I only just created the session.

   2. The ZooKeeper client disconnects from a server when there are
pending operations to that server, i.e., there is a pending asynchronous
call.

This should also not be the case. I was just doing a read request and no
other client was accessing the ensemble.


My only idea is that maybe the connection call first tried to connect to
 the zookeeper server that was not running (remember only two of the
three servers are running) and before it had a chance to try to connect
to one of the other servers, my getData call was made and failed with
connectionloss. Could that be the reason?
But I thought the connection handling was automatic and if a connection
failed the client would automatically try any of the other listed
servers without the user noticing!?
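
One way to test the hypothesis above is to block until the watcher has reported a connected state before issuing the first request. The ZooKeeper constructor returns before the session is established, and the watcher passed to it later receives a SyncConnected event. The sketch below shows only the waiting pattern and has no ZooKeeper dependency: the class `ConnectedLatch`, its `State` enum, and `onStateChange` are hypothetical stand-ins. In a real client the same logic would live in `Watcher.process(WatchedEvent)`, checking `event.getState()`. Whether this removes the exceptions in the scenario described here is an assumption to verify, not a confirmed diagnosis.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a ZooKeeper Watcher; it has no ZooKeeper dependency.
final class ConnectedLatch {

    // Simplified stand-in for the client's connection states.
    enum State { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

    private final CountDownLatch connected = new CountDownLatch(1);

    // In a real Watcher this check would sit inside process(WatchedEvent event).
    void onStateChange(State state) {
        if (state == State.SYNC_CONNECTED) {
            connected.countDown(); // release any thread waiting for the session
        }
    }

    /** Blocks until a connected event has been seen or the timeout elapses. */
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectedLatch latch = new ConnectedLatch();

        // Simulates the client's event thread delivering the connected event later.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            latch.onStateChange(State.SYNC_CONNECTED);
        });
        eventThread.start();

        if (latch.awaitConnected(5, TimeUnit.SECONDS)) {
            System.out.println("connected; the first read can be issued now");
        } else {
            System.out.println("timed out waiting for a connection");
        }
    }
}
```

With the real client, the caller would create the ZooKeeper handle with such a watcher, call `awaitConnected`, and only then call `getData`. If connection losses persist after that, the race between connecting and reading is unlikely to be the cause.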

Thanks for any help.

Cheers,

Michael


--
Michael Bauland
michael.baul...@knipp.de
bauland.tel




