Hi Qian, it would good if you could create a jira for this:
https://issues.apache.org/jira/browse/ZOOKEEPER

include both the client logs and the server logs (for overlapping client/server time period where you see the problem). also the server config if you're using a quorum vs standalone. If you could also include some/all of the client side code you have that would be useful for us to review.

are you doing anything in particular during this time period where you see a problem? perhaps load testing of the server?

2 sec timeout is pretty low. the client needs to send a heartbeat every 0.6 seconds (1/3 timeout), and if it doesn't hear back from the server in another 0.6 seconds it will close the connection and attempt to re-connect to a(nother) server. also if the server doesn't hear from the client every 2 seconds it will expire the session. in general this is ok, but if you have any network issues, or periodic heavy load on the server this could cause a problem. typically we suggest timeouts in the 20-30 second range, but we do have some ppl that run in 5-10 second range regularly (but they have solid lans with low latency connections)

re the ephemeral node - did you try the suggestions I had in my previous email? (attached) (please include that on the jira as well)

Patrick

Qian Ye wrote:
Hi Ben:

I used multi-thread library, and the session timeout is set to 2000 when the
zookeeper handler was initialized.

On Thu, Aug 20, 2009 at 9:52 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

are you using the single threaded or multithreaded C library? the exceeded
deadline message means that our thread was supposed to get control after a
certain period, but we got control that many milliseconds late. what is your
session timeout?

ben

________________________________________
From: Qian Ye [yeqian....@gmail.com]
Sent: Thursday, August 20, 2009 3:17 AM
To: zookeeper-user
Subject: A question about "Connection timed out" and "operation timeout"

Hi guys:

I met the problem again: an ephemeral node disappeared, and I found it
because my application got a "operation timeout"

My application which created an ephemeral node at the zookeeper server,
printed the following log

*WARNING: 08-20 03:09:20:  auto * 182894118176
[logid:][reqip:][auto_exchanger_zk_basic.cpp:605]get children
fail.[/forum/elect_nodes][-7][operation timeout]*

and the Zookeeper client printed the following log (the log level is INFO)

2009-08-19 21:36:18,067:3813(0x9556c520):zoo_i...@log_env@545: Client
environment:zookeeper.version=zookeeper C client 3.2.0
   606 2009-08-19 21:36:18,067:3813(0x9556c520):zoo_i...@log_env@549:
Client environment:host.name=jx-ziyuan-test00.jx.baidu.com
   607 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@557:
Client environments.name=Linux
   608 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@558:
Client environments.arch=2.6.9-52bs
   609 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@559:
Client environments.version=#2 SMP Fri Jan 26 13:34:38 CST 2007
   610 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@567:
Client environment:user.name=club
   611 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@577:
Client environment:user.home=/home/club
   612 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@log_env@589:
Client environment:user.dir=/home/club/user/luhongbo/auto-exchanger
   613 2009-08-19 21:36:18,068:3813(0x9556c520):zoo_i...@zookeeper_init
@613:
Initiating client connection,
host=127.0.0.1:2181,127.0.0.1:2182sessionTimeout=2000 wa
tcher=0x408c56 sessionId=0x0
sessionPasswd=<null> context=(nil) flags=0
   614 2009-08-19 21:36:18,069:3813(0x41401960):zoo_i...@check_events
@1439:
initiated connection to server [127.0.0.1:2181]
   615 2009-08-19 21:36:18,070:3813(0x41401960):zoo_i...@check_events
@1484:
connected to server [127.0.0.1:2181] with session id=1232c1688a20093
   616 2009-08-20
02:48:01,780:3813(0x41401960):zoo_w...@zookeeper_interest@1335:
Exceeded deadline by 520ms
   617 2009-08-20
03:08:52,332:3813(0x41401960):zoo_w...@zookeeper_interest@1335:
Exceeded deadline by 14ms
   618 2009-08-20
03:09:04,666:3813(0x41401960):zoo_w...@zookeeper_interest@1335:
Exceeded deadline by 48ms
   619 2009-08-20
03:09:09,733:3813(0x41401960):zoo_w...@zookeeper_interest@1335:
Exceeded deadline by 24ms
   620 *2009-08-20
03:09:20,289:3813(0x41401960):zoo_w...@zookeeper_interest@1335: Exceeded
deadline by 264ms*
   621 *2009-08-20
03:09:20,295:3813(0x41401960):zoo_er...@handle_socket_error_msg@1388:
Socket
[127.0.0.1:2181] zk retcode=-7, errno=110(Connection timed out): conn
ection timed out (exceeded timeout by 264ms)*
   622 *2009-08-20
03:09:20,309:3813(0x41401960):zoo_w...@zookeeper_interest@1335: Exceeded
deadline by 284ms*
   623 *2009-08-20
03:09:20,309:3813(0x41401960):zoo_er...@handle_socket_error_msg@1433:
Socket
[127.0.0.1:2182] zk retcode=-4, errno=111(Connection refused):
server         refused to accept the client*
   624 *2009-08-20 03:09:20,353:3813(0x41401960):zoo_i...@check_events
@1439:
initiated connection to server [127.0.0.1:2181]*
   625 *2009-08-20 03:09:20,552:3813(0x41401960):zoo_i...@check_events
@1484:
connected to server [127.0.0.1:2181] with session id=1232c1688a20093*

I don't know why the "connection timed out" happened at *2009-08-20
03:09:20,295:3813, *and the server refuse to accept the client. Could some
one give me any hints? And I'm not sure the meaning of "Exceeded deadline
by
xxms", need some help too.


P.S. I used the Zookeeper 3.2.0 (Server and C Client API) and run a
stand-alone instance

Thx all~

--
With Regards!

Ye, Qian
Made in Zhejiang University




--- Begin Message ---
Try doing a couple things:

1) use the java cli to do a "stat /path" on the node in question
2) try doing a "echo stat |nc localhost 2181"
(where localhost:2181 is the clientport of the zk server)

would be interesting to see this information on the jira. 1) will allow you to verify that the node is actually an ephemeral node from the perspective of the server (and who owns, etc...) and 2) will allow you to verify that the server doesn't still think that the client is connected.

were you using the c/java binding for your client, or something else?

Patrick

Flavio Junqueira wrote:
Hi Qian, It would be useful to have access to the log and to know which version of ZooKeeper you're using. You may want to open a jira and attach the log there.

Thanks,
-Flavio

On Aug 13, 2009, at 7:51 AM, Qian Ye wrote:

Hi all:

My friend encountered a problem when using ZooKeeper. He built a single
server instance of Zookeeper for development, and he created an ephemeral
node several days ago. It seems that when he killed the process which
created the ephemeral node today, he found that the ephemeral node won't
disappear! I haven't got the related log yet, just want ask whether you guys have encountered the same thing? Is there any bug could lead to the result?

thx~

--
With Regards!

Ye, Qian
Made in Zhejiang University


--- End Message ---

Reply via email to