[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731380#action_12731380
 ] 

Henry Robinson commented on ZOOKEEPER-460:
------------------------------------------

I need a little help getting to the bottom of this (I might be misreading 
Hudson's logs).

The code in question is, I think, 'ok' (although a bit dodgy). The idea is to 
test the ability of a client - one that is waiting because the max cnxns limit 
has been reached - to reconnect once a slot becomes free on the server. So 
ideally for this test close(1) should happen after createClient(2) has tried to 
connect. As you say, that is a false assumption: the close might happen before 
createClient(2) has even attempted to connect, so there is no contention. But 
that should only be giving false positives (the test passes without exercising 
the limit) - the second assert should still eventually succeed. What I need to 
do to improve this is to replace createClient with a call that blocks until we 
at least know the connection attempt has been made, if that's possible.
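
As a rough sketch of what such a blocking step could look like, the helper 
below polls a caller-supplied condition until it holds or a timeout expires. 
The name waitUntil and its parameters are hypothetical and not part of the 
existing test code; which condition reliably signals "the connection attempt 
has been made" is left open here.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not in the ZooKeeper test suite): poll `ready`
// until it returns true or `timeout` elapses. Returns the final result
// of `ready`, so a false return means the wait timed out.
bool waitUntil(const std::function<bool()>& ready,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return ready();  // one last check once the deadline has passed
        }
        std::this_thread::sleep_for(poll);
    }
    return true;
}
```

In the test, a call to this helper would sit between createClient(&ctx2) and 
zookeeper_close(zk1). One candidate condition is a check on zoo_state(zk2), 
but whether any client state is observable at the right moment is an 
assumption that would need checking against the C client.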

However the most recent Hudson failures don't seem to be related. From build 
375:

     [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion
     [exec] Zookeeper_watchers::testDefaultSessionWatcher1 : OK
     [exec] Zookeeper_watchers::testDefaultSessionWatcher2 : OK
     [exec] Zookeeper_watchers::testObjectSessionWatcher1 : OK
     [exec] Zookeeper_watchers::testObjectSessionWatcher2 : OK
     [exec] Zookeeper_watchers::testNodeWatcher1 : OK
     [exec] Zookeeper_watchers::testChildWatcher1 : OK
     [exec] Zookeeper_watchers::testChildWatcher2 : OK
     [exec] 
     [exec] /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestClient.cc:289: Assertion: equality assertion failed [Expected: -101, Actual  : -4]
     [exec] Failures !!!
     [exec] Run: 32   Failure total: 1   Failures: 1   Errors: 0
     [exec] make: *** [run-check] Error 1

and the same from 376 (yesterday's build). These are failing in TestClient 
(specifically testAsyncWatcherAutoReset). The error here is that a stat 
completion callback is getting called with ZCONNECTIONLOSS (-4), but is 
expecting to see ZNONODE (-101), and the assert is failing.
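
To make the numbers in the log easier to read, here is a small sketch that 
maps the two return codes above to their names. The values -4 and -101 are the 
ones shown in the assertion message; describeRc is a hypothetical helper 
written for illustration (the C client has its own zerror() function for this 
purpose).

```cpp
#include <string>

// The two codes seen in the Hudson log; constant names are illustrative.
enum { kZConnectionLoss = -4, kZNoNode = -101 };

// Hypothetical helper: turn a return code into a readable name.
std::string describeRc(int rc) {
    switch (rc) {
        case 0:                return "ZOK";
        case kZConnectionLoss: return "ZCONNECTIONLOSS";
        case kZNoNode:         return "ZNONODE";
        default:               return "unknown (" + std::to_string(rc) + ")";
    }
}
```

Read this way, the failure is "expected ZNONODE, got ZCONNECTIONLOSS": the 
callback fired because the connection dropped, not because the server answered 
the stat request.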

This test runs fine for me locally, so is the problem a heavily loaded Hudson, 
causing the connection loss?

Similarly the failed build you point to, 371, fails TestClientRetry with a 
broken pipe error which to my novice eye sounds a bit like something falling 
over under load.

It looks to me right now like the TestClientRetry code needs improving but is 
benign, as it should only cause false positives, and that what we really need 
to understand is why TestClient is failing. Does that sound right?

> bad testRetry in cppunit tests (hudson failure)
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-460
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-460
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, tests
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>             Fix For: 3.2.1, 3.3.0
>
>
> the following code failed on hudson
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/371/
>       watchctx_t ctx1, ctx2;
>       zhandle_t *zk1 = createClient(&ctx1);
>       CPPUNIT_ASSERT_EQUAL(true, ctx1.waitForConnected(zk1));
>       zhandle_t *zk2 = createClient(&ctx2);
>       zookeeper_close(zk1);
>       CPPUNIT_ASSERT_EQUAL(true, ctx2.waitForConnected(zk2));
> there's a problem with this test, it assumes that close(1) can be called 
> before createclient(2) gets connected.
> this is not correct: createclient is an async call and in some cases the 
> connection can be established before
> create client returns.
> this shows a failure in this case because client1 was created, then client2 
> attempted to connect
> but failed due to this on the server (max conn exceeded):
>         sprintf(cmd, "export ZKMAXCNXNS=1;%s startClean %s", ZKSERVER_CMD, 
> getHostPorts());
> conn 2 failed and therefore the following assert eventually failed.
> this code should not assume that close(1) will beat connect(2)
> Henry can you take a look?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
