[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829326#action_12829326
 ] 

Hadoop QA commented on ZOOKEEPER-569:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434729/zookeeper-569.patch
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/console

This message is automatically generated.

> Failure of elected leader can lead to never-ending leader election
> --
>
> Key: ZOOKEEPER-569
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Henry Robinson
>Assignee: Henry Robinson
> Fix For: 3.3.0
>
> Attachments: zookeeper-569.patch, zookeeper-569.patch, 
> zookeeper-569.patch
>
>
> It is possible for basic LeaderElection to enter a situation where it never 
> terminates. 
> As an example, consider a three node cluster A, B and C.
> 1. In the first round, A votes for A, B votes for B and C votes for C
> 2. Since C > B > A, all nodes resolve to vote for C in the second round as 
> there is no first round winner
> 3. A, B vote for C, but C fails.
> 4. C is not elected because neither A nor B hear from it, and so votes for it 
> are discarded
> 5. A and B never reset their votes, despite not hearing from C, so continue 
> to vote for it ad infinitum. 
> Step 5 is the bug. If A and B reset their votes to themselves in the case 
> where the heard-from vote set is empty, leader election will continue.
> I do not know if this affects running ZK clusters, as it is possible that the 
> out-of-band failure detection protocols may cause leader election to be 
> restarted anyhow, but I've certainly seen this in tests. 
> I have a trivial patch which fixes it, but it needs a test (and tests for 
> race conditions are hard to write!)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Open  (was: Patch Available)




[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Patch Available  (was: Open)




[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Attachment: zookeeper-569.patch

Here's a patch with tests that appears to fix the issue (the test fails without 
the fix and passes with it). All tests pass for me with this patch on my laptop. 

I have replaced one kludge with another here. QuorumPeer.electionAlg is set to 
null when electionType==0 until the election is actually run. This causes 
problems if you want to retrieve the electionAlg object via getElectionAlg() 
beforehand for tests. 

I've set it up so that makeLEStrategy always creates a new LeaderElection if 
electionType == 0, but also that createElectionAlgorithm sets electionAlg=new 
LeaderElection(this) instead of null, so that as long as startLeaderElection 
has been called, getElectionAlg() won't return null.

I've checked whether this will cause any obvious problems for the call sites 
of getElectionAlg and couldn't find anything that expected null. It seems more 
consistent to me this way. The open question is why LeaderElection needs to be 
re-instantiated each time when FLE does not.

If this sounds confusing, it's because the code really is! The interaction of 
createElectionAlgorithm, startLeaderElection and makeLEStrategy is hard to 
discern. 
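To make the initialization order easier to follow, here is a hypothetical, heavily simplified model of the behaviour described above. It is not the real QuorumPeer: only the names electionType, electionAlg, createElectionAlgorithm, startLeaderElection, makeLEStrategy and getElectionAlg come from the comment, while the stand-in classes, the no-argument constructors and the method bodies are assumptions.

```java
// Hypothetical, simplified model of the initialization order described in the
// comment. Not the real QuorumPeer; the classes below are stand-ins.
public class ElectionInitSketch {
    interface Election { }

    static class LeaderElection implements Election { } // basic election, electionType == 0
    static class OtherElection implements Election { }  // stands in for FLE and the rest

    static class QuorumPeerModel {
        private final int electionType;
        private Election electionAlg; // null until startLeaderElection() runs

        QuorumPeerModel(int electionType) { this.electionType = electionType; }

        // With the patch, type 0 yields a real object instead of null.
        Election createElectionAlgorithm(int type) {
            return type == 0 ? new LeaderElection() : new OtherElection();
        }

        void startLeaderElection() {
            electionAlg = createElectionAlgorithm(electionType);
        }

        // Type 0 still gets a fresh LeaderElection for every election round.
        Election makeLEStrategy() {
            return electionType == 0 ? new LeaderElection() : electionAlg;
        }

        Election getElectionAlg() { return electionAlg; }
    }

    public static void main(String[] args) {
        QuorumPeerModel peer = new QuorumPeerModel(0);
        System.out.println(peer.getElectionAlg() == null); // true: not started yet
        peer.startLeaderElection();
        System.out.println(peer.getElectionAlg() != null); // true: set for type 0 too
    }
}
```

The model makes the asymmetry visible: getElectionAlg() is non-null once the election has been started, but for type 0 makeLEStrategy() still hands out a new object each round, which is the re-instantiation the comment questions.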




[jira] Commented: (ZOOKEEPER-321) optmize session tracking in zookeeper.

2010-02-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829303#action_12829303
 ] 

Mahadev konar commented on ZOOKEEPER-321:
-

Ben and I were discussing session tracking, and here is a summary of that 
discussion:

- We would like to optimize session tracking for the most common ZooKeeper 
client, which is a read client that can set watches.
- For a client with watches we would still need pings as part of the client 
library, because we need to keep track of whether the server is live and up 
to date. So the client-to-server traffic would not change at all!
- The traffic from followers to the leader would drop by some (not drastic) 
amount, since we would be tracking fewer sessions.

Given the above findings, the amount of work required doesn't justify the 
savings we would get in network traffic. I would suggest closing this JIRA as 
WONT FIX unless someone else thinks otherwise or plans to work on it... 

> optmize session tracking in zookeeper.
> --
>
> Key: ZOOKEEPER-321
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-321
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.3.0
>
>
> Sometimes a lot of zookeeper clients are read only. For such clients we do 
> not need session tracking in zookeeper. Getting rid of session tracking 
> for such clients will help us scale much better.




[jira] Assigned: (ZOOKEEPER-617) improve cluster setup documentation in forrest

2010-02-03 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-617:
--

Assignee: Patrick Hunt

> improve cluster setup documentation in forrest
> --
>
> Key: ZOOKEEPER-617
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-617
> Project: Zookeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
>
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkMulitServerSetup
> 1) the config file is missing line returns
> 2) call out setting up the myid file as its own bullet; otherwise it's too 
> easy to miss
> 3) we should make sure the values we use in examples are consistent and 
> reasonable defaults
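For illustration, a multi-server zoo.cfg with each setting on its own line might look like the following. The hostnames zoo1 to zoo3 and the dataDir path are placeholders, and the timing values are illustrative rather than recommended defaults; in server.x=host:port1:port2, the first port is used by followers to connect to the leader and the second for leader election.

```properties
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
```

Separately from zoo.cfg, each server needs a file named myid in its dataDir containing only that server's id (1 on zoo1, 2 on zoo2, and so on), matching the x in its server.x line.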




[jira] Assigned: (ZOOKEEPER-595) A means of asking quorum what conifguration it is running with

2010-02-03 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-595:
--

Assignee: Patrick Hunt

> A means of asking quorum what conifguration it is running with
> --
>
> Key: ZOOKEEPER-595
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-595
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: jmx, server
>Reporter: stack
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
>
> I'd like to ask a running quorum what its configuration is.  I'd want to know 
> stuff like session timeout and tick time.
> The use case is that in HBase there is usually no zoo.cfg; the configuration is 
> manufactured and piped to the starting zk server.  I want to know whether all 
> of the manufactured config 'took', or how zk interpreted it.
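One example of "how zk interpreted it" is the session timeout, which the server bounds relative to tickTime. A minimal sketch, assuming the bounds described in the ZooKeeper Programmer's Guide (at least 2 x tickTime, at most 20 x tickTime); SessionTimeoutSketch, negotiate and tickTime are hypothetical names, and the server's actual negotiation code may differ between versions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SessionTimeoutSketch {
    // Clamp a requested session timeout to [2 * tickTime, 20 * tickTime].
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    // Read tickTime from zoo.cfg-style text. The 2000 ms fallback is this
    // sketch's own assumption, not necessarily the server's behaviour.
    static int tickTime(String cfg) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(cfg));
        return Integer.parseInt(p.getProperty("tickTime", "2000").trim());
    }

    public static void main(String[] args) throws IOException {
        int tick = tickTime("tickTime=2000\nclientPort=2181\n");
        System.out.println(negotiate(1000, tick));   // 4000: raised to the minimum
        System.out.println(negotiate(30000, tick));  // 30000: within bounds
        System.out.println(negotiate(120000, tick)); // 40000: capped at the maximum
    }
}
```

A client that asks for a 120-second session against a 2000 ms tickTime would, under these assumed bounds, end up with 40 seconds, which is exactly the kind of silent reinterpretation the report wants to be able to see.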




[jira] Commented: (ZOOKEEPER-524) DBSizeTest is not really testing anything

2010-02-03 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829283#action_12829283
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-524:
--

I must agree that the test is awkward. Now, it takes 2 seconds on my laptop, so 
I don't understand why it takes 40 seconds on your computer, Pat.

I think the idea of the test is not bad, but the implementation could be 
improved. For example, we could check whether the latency is at most 10% higher 
after populating (10% is an arbitrary number, since it will be impossible to 
get exact values).
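Flavio's suggestion could be expressed as a relative check instead of an exact one. A minimal sketch, assuming the operation under test can be wrapped in a Runnable; medianLatencyNanos and withinTolerance are hypothetical helpers, not existing test utilities, and the 10% tolerance is the arbitrary figure from the comment.

```java
import java.util.Arrays;

public class LatencyCheckSketch {
    // Median wall-clock latency of `op` over `samples` runs, in nanoseconds.
    static long medianLatencyNanos(Runnable op, int samples) {
        long[] t = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            op.run();
            t[i] = System.nanoTime() - start;
        }
        Arrays.sort(t);
        return t[samples / 2];
    }

    // True if `after` is at most `tolerance` (e.g. 0.10 for 10%) above `before`.
    static boolean withinTolerance(long before, long after, double tolerance) {
        return after <= before * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        Runnable op = () -> Math.sqrt(42.0); // stand-in for a ZooKeeper read
        long before = medianLatencyNanos(op, 101);
        // ... populate the database here ...
        long after = medianLatencyNanos(op, 101);
        System.out.println("within 10%: " + withinTolerance(before, after, 0.10));
    }
}
```

Comparing medians rather than single timings makes the check less sensitive to one-off pauses such as GC or JIT warm-up, although any timing assertion can still be flaky on a loaded build machine.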

If you guys feel strongly about removing it, I don't object and +1 it.

> DBSizeTest is not really testing anything
> -
>
> Key: ZOOKEEPER-524
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-524
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.3.0
>
>
> DBSizeTest looks like it should be testing latency, but it doesn't seem to do 
> it (assert is commented out).
> We need to decide if this test should be fixed, or just dropped.
> Also note: this test takes 40 seconds on my system. Way too long. Perhaps 
> async create operations should be used to populate the database. I also 
> noticed that data size has a big impact on overall test time (1k vs 5 bytes 
> is something like a 2x difference in the time to run the test).
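Populating with asynchronous creates keeps many requests in flight instead of paying a full round trip per node. The sketch below shows only that pattern: AsyncStore is a hypothetical interface standing in for an asynchronous create call (in the ZooKeeper Java client, the create overload that takes an AsyncCallback.StringCallback), result code 0 means success in this sketch, and a CountDownLatch waits for every completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class AsyncPopulateSketch {
    // Hypothetical async API: completion is reported through a callback with a result code.
    interface AsyncStore {
        void create(String path, byte[] data, IntConsumer onComplete);
    }

    // Issue `count` creates without waiting for each reply, then wait for all callbacks.
    // Returns the number of creates that reported a non-zero result code.
    static int populate(AsyncStore store, int count, byte[] data, long timeoutSec)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(count);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            store.create("/node-" + i, data, rc -> {
                if (rc != 0) failures.incrementAndGet();
                done.countDown();
            });
        }
        if (!done.await(timeoutSec, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for creates");
        }
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Fake store that completes every create successfully on a worker thread.
        AsyncStore fake = (path, data, cb) -> pool.execute(() -> cb.accept(0));
        System.out.println(populate(fake, 1000, new byte[5], 10)); // prints 0
        pool.shutdown();
    }
}
```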




[jira] Commented: (ZOOKEEPER-589) When create a znode, a NULL ACL parameter cannot be accepted

2010-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829277#action_12829277
 ] 

Hadoop QA commented on ZOOKEEPER-589:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434699/ZOOKEEPER-524.patch
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/64/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/64/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/64/console


> When create a znode, a NULL ACL parameter cannot be accepted
> 
>
> Key: ZOOKEEPER-589
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-589
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.1
> Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
> 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Qian Ye
>Assignee: Benjamin Reed
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-524.patch
>
>
> The comments of the C client API calls associated with creating a znode, e.g. 
> zoo_acreate, say that for the initial ACL of the node, "if null, the ACL of 
> the parent will be used". However, it doesn't work. When this kind of request 
> is executed at the server side, it raises InvalidACLException. The source 
> code shows that the function fixupACL returns false when it gets a null ACL. 
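Until a null ACL is accepted, a client that wants the parent's ACL has to fetch it and pass it explicitly (in the C client, zoo_get_acl followed by zoo_create; in the Java client, getACL followed by create). The sketch below models that workaround against a hypothetical in-memory store whose create rejects a null ACL, mirroring the fixupACL behaviour described in the report; the store, its methods and the string-based ACL representation are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParentAclSketch {
    // Hypothetical in-memory store: path -> ACL (an ACL is just a list of strings here).
    static final Map<String, List<String>> acls = new HashMap<>();

    static void create(String path, List<String> acl) {
        // Mirrors the reported server behaviour: a null ACL is rejected.
        if (acl == null) throw new IllegalArgumentException("InvalidACL for " + path);
        acls.put(path, acl);
    }

    static List<String> getAcl(String path) {
        return acls.get(path);
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // Workaround: copy the parent's ACL explicitly instead of passing null.
    static void createWithParentAcl(String path) {
        create(path, getAcl(parentOf(path)));
    }

    public static void main(String[] args) {
        create("/app", List.of("world:anyone:r"));
        createWithParentAcl("/app/config");
        System.out.println(getAcl("/app/config")); // [world:anyone:r]
    }
}
```

Note that this copies the parent's ACL at creation time only; later changes to the parent's ACL are not reflected in the child.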




[jira] Commented: (ZOOKEEPER-524) DBSizeTest is not really testing anything

2010-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829267#action_12829267
 ] 

Hadoop QA commented on ZOOKEEPER-524:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/63/console





[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829220#action_12829220
 ] 

Henry Robinson commented on ZOOKEEPER-569:
--

Yes, you're both right! I misread my own notes on the bug :/

I'm writing tests for a *real* fix now. Thanks both for pointing this out. 

> Failure of elected leader can lead to never-ending leader election
> --
>
> Key: ZOOKEEPER-569
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Henry Robinson
>Assignee: Henry Robinson
> Fix For: 3.3.0
>
> Attachments: zookeeper-569.patch, zookeeper-569.patch
>
>
> It is possible for basic LeaderElection to enter a situation where it never 
> terminates. 
> As an example, consider a three node cluster A, B and C.
> 1. In the first round, A votes for A, B votes for B and C votes for C
> 2. Since C > B > A, all nodes resolve to vote for C in the second round as 
> there is no first round winner
> 3. A, B vote for C, but C fails.
> 4. C is not elected because neither A nor B hear from it, and so votes for it 
> are discarded
> 5. A and B never reset their votes, despite not hearing from C, so continue 
> to vote for it ad infinitum. 
> Step 5 is the bug. If A and B reset their votes to themselves in the case 
> where the heard-from vote set is empty, leader election will continue.
> I do not know if this affects running ZK clusters, as it is possible that the 
> out-of-band failure detection protocols may cause leader election to be 
> restarted anyhow, but I've certainly seen this in tests. 
> I have a trivial patch which fixes it, but it needs a test (and tests for 
> race conditions are hard to write!)




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829217#action_12829217
 ] 

Benjamin Reed commented on ZOOKEEPER-569:
-

i'm also wondering about the heardFrom == 0 check. in your case A and B will 
still be up, so heardFrom will not be zero. don't you really want to check 
whether or not you heard from the guy that you think is the leader?
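The difference between the two checks can be shown with a toy model. This is a minimal sketch, not ZooKeeper's LeaderElection code: ids A=1, B=2, C=3 are illustrative, and it assumes each live server hears from itself and from the other live servers, discards votes for candidates it did not hear from, needs a strict majority of the full ensemble, and otherwise adopts the highest id being voted for (as in step 2 of the description).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class VoteResetSketch {
    enum Rule { NEVER_RESET, RESET_IF_NONE_HEARD, RESET_IF_CANDIDATE_SILENT }

    // Strict-majority tally over votes from servers we heard from; votes for
    // candidates we did not hear from are discarded. Returns -1 if no winner.
    static long tally(Map<Long, Long> votes, Set<Long> heardFrom, int ensembleSize) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long voter : heardFrom) {
            long candidate = votes.get(voter);
            if (heardFrom.contains(candidate)) counts.merge(candidate, 1, Integer::sum);
        }
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > ensembleSize / 2) return e.getKey();
        }
        return -1;
    }

    static long nextVote(long self, Map<Long, Long> votes, Set<Long> heardFrom, Rule rule) {
        long current = votes.get(self);
        if (rule == Rule.RESET_IF_NONE_HEARD && heardFrom.isEmpty()) return self;
        if (rule == Rule.RESET_IF_CANDIDATE_SILENT && !heardFrom.contains(current)) return self;
        // Otherwise adopt the highest id that anyone we heard from is voting for.
        long best = current;
        for (long voter : heardFrom) best = Math.max(best, votes.get(voter));
        return best;
    }

    // A and B are live, C (id 3) has failed, and both start out voting for C.
    static long run(Rule rule, int maxRounds) {
        Set<Long> live = Set.of(1L, 2L);
        Map<Long, Long> votes = new HashMap<>(Map.of(1L, 3L, 2L, 3L));
        for (int round = 0; round < maxRounds; round++) {
            long leader = tally(votes, live, 3);
            if (leader != -1) return leader;
            Map<Long, Long> next = new HashMap<>();
            for (long s : live) next.put(s, nextVote(s, votes, live, rule));
            votes = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (Rule r : Rule.values()) {
            System.out.println(r + " -> " + run(r, 100));
        }
    }
}
```

Running main prints NEVER_RESET -> -1, RESET_IF_NONE_HEARD -> -1 and RESET_IF_CANDIDATE_SILENT -> 2: because A and B hear from each other, the empty-set check never fires, while checking whether the proposed leader was heard from lets them fall back to their own ids and converge on B.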




[jira] Updated: (ZOOKEEPER-589) When create a znode, a NULL ACL parameter cannot be accepted

2010-02-03 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-589:


Attachment: ZOOKEEPER-524.patch

simple documentation patch. no tests needed.




[jira] Updated: (ZOOKEEPER-589) When create a znode, a NULL ACL parameter cannot be accepted

2010-02-03 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-589:


Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-622) Test for pending watches in send_set_watches should be moved

2010-02-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829180#action_12829180
 ] 

Benjamin Reed commented on ZOOKEEPER-622:
-

sorry steven, i didn't notice that you had commented. yes, please finish the 
test; you can make simplifying assumptions such as /tmp, and i can help you 
clean it up once things are working. thanx!

> Test for pending watches in send_set_watches should be moved
> 
>
> Key: ZOOKEEPER-622
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-622
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Steven Cheng
>Assignee: Benjamin Reed
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-622.patch, ZOOKEEPER-622.patch
>
>
> Valgrind found:
> {quote}
> ==2357== Conditional jump or move depends on uninitialised value(s)
> ==2357==at 0x807FDCA: check_events (zookeeper.c:1180)
> ==2357==by 0x808043A: zookeeper_process (zookeeper.c:1775)
> ==2357==by 0x806A21B: Zookeeper_close::testCloseConnected1() 
> (TestZookeeperClose.cc:161)
> ==2357==by 0x806C6BF: CppUnit::TestCaller::runTest() 
> (TestCaller.h:166)
> {quote}
> zookeeper.c:1180 was the first if in send_set_watches.




[jira] Updated: (ZOOKEEPER-524) DBSizeTest is not really testing anything

2010-02-03 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-524:


Status: Patch Available  (was: Open)

there is no patch associated with this since i'm removing a file. i just wanted 
to make sure people saw this issue.




[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-03 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829171#action_12829171
 ] 

Patrick Hunt commented on ZOOKEEPER-662:


According to this page http://www.freesoft.org/CIE/Course/Section4/11.htm 
closing the conn should be fine (which is expected). The weird thing, though, 
is that I've definitely seen this issue where stat returns some of the data, 
but the results seem to be truncated. I wonder if it's more an issue with nc 
then... thoughts?
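One way to rule out the client side when stat output looks truncated is to read the reply until the server closes the connection. A minimal Java sketch (Java 11 or later), assuming the server writes its reply to a four-letter word such as `stat` and then closes the socket; send and demo are hypothetical helpers, and demo uses a local stand-in server so the code runs without ZooKeeper. In general, whichever end fails to close its socket after the peer has closed is the one left in CLOSE_WAIT, which is why the client here closes its socket via try-with-resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FourLetterWordSketch {
    // Send a four-letter word and read until the server closes the connection (EOF).
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) { // -1 only once the peer has closed
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.US_ASCII.name());
        } // our socket is closed here, so our end does not linger in CLOSE_WAIT
    }

    // Local stand-in server: reads the 4-byte command, replies in two writes, then closes.
    static String demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread t = new Thread(() -> {
                try (Socket c = server.accept()) {
                    c.getInputStream().readNBytes(4);
                    OutputStream o = c.getOutputStream();
                    o.write("fake reply, part 1\n".getBytes(StandardCharsets.US_ASCII));
                    o.write("fake reply, part 2\n".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            t.start();
            String reply = send("127.0.0.1", server.getLocalPort(), "stat");
            t.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(demo()); // both parts, even though they arrive in separate writes
    }
}
```

Against a real server the call would be something like send("localhost", 2181, "stat"), with host and port as placeholders for your own clientPort. If this consistently returns the full output while nc does not, that would point toward the nc invocation rather than the server.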

> Too many CLOSE_WAIT socket state on a server
> 
>
> Key: ZOOKEEPER-662
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.2.1
> Environment: Linux 2.6.9
>Reporter: Qian Ye
> Fix For: 3.3.0
>
> Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106
>
>
> I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1; here is 
> the content of the configuration file, zoo.cfg
> ==
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial 
> # synchronization phase can take
> initLimit=5
> # The number of ticks that can pass between 
> # sending a request and getting an acknowledgement
> syncLimit=2
> # the directory where the snapshot is stored.
> dataDir=./data/
> # the port at which the clients will connect
> clientPort=8181
> # zookeeper cluster list
> server.100=10.23.253.43:8887:
> server.101=10.23.150.29:8887:
> server.102=10.23.247.141:8887:
> server.200=10.65.20.68:8887:
> server.201=10.65.27.21:8887:
> =
> Before the problem happened, server.200 was the leader. Yesterday 
> morning, I found that there were many sockets in the CLOSE_WAIT state on 
> the clientPort (8181), over 120 in total. Because of these 
> CLOSE_WAIT sockets, server.200 could not accept more connections from the 
> clients. The only thing I could do in this situation was restart 
> server.200, at about 2010-02-01 06:06:35. The related log is attached to the 
> issue.
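To track the CLOSE_WAIT count over time on a Linux server, one option is to read /proc/net/tcp directly. A minimal sketch, assuming the standard /proc/net/tcp layout (local address:port in hex in the second column, state in hex in the fourth, where 08 is CLOSE_WAIT); it covers IPv4 sockets only (IPv6 sockets are listed in /proc/net/tcp6), and port 8181 is the clientPort from the report. CloseWaitCounter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CloseWaitCounter {
    static final String CLOSE_WAIT = "08"; // TCP state code as printed in /proc/net/tcp

    // Count sockets whose local port is `port` and whose state is CLOSE_WAIT.
    static long countCloseWait(List<String> procNetTcpLines, int port) {
        return procNetTcpLines.stream()
                .skip(1) // header line
                .map(String::trim)
                .map(line -> line.split("\\s+"))
                .filter(f -> f.length > 3)
                // local_address looks like 0100007F:1FF5; the port follows the colon, in hex
                .filter(f -> Integer.parseInt(f[1].substring(f[1].indexOf(':') + 1), 16) == port)
                .filter(f -> f[3].equals(CLOSE_WAIT))
                .count();
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/net/tcp");
        if (!Files.exists(p)) {
            System.out.println("/proc/net/tcp not available on this system");
            return;
        }
        System.out.println(countCloseWait(Files.readAllLines(p), 8181));
    }
}
```

Sampling this periodically (for example once a minute) alongside the server log would show whether the count climbs steadily or jumps at a particular event.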




[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-03 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829164#action_12829164
 ] 

Patrick Hunt commented on ZOOKEEPER-662:


Please look into why your C client at 10.81.14.8 is failing; this might be useful 
information for me in reproducing the issue you are seeing. Can you provide 
some detail on the environment the C client is running in (OS, 32- vs 64-bit, etc.) 
and what issue it is seeing? What is that client attempting to do?




[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-03 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829162#action_12829162
 ] 

Patrick Hunt commented on ZOOKEEPER-662:


Qian, if you look at the logs you can see both of these clients: the client I 
mentioned in an earlier comment, and also the "stat" client:

2010-02-01 06:24:49,783 - INFO  [NIOServerCxn.Factory:8181:nioserverc...@698] - 
Processing stat command from /10.65.7.48:48413
2010-02-01 06:24:49,783 - WARN  [NIOServerCxn.Factory:8181:nioserverc...@494] - 
Exception causing close of session 0x0 due to java.io.IOException: Responded to 
info probe

(Really, the second line should not be a WARN; this is improved in the 3.3.0 
codebase.)
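To pick the four-letter-word probes out of a full log, a small parser along these lines may help. It is a sketch that assumes each log entry occupies a single line in the actual file (the excerpt above is wrapped by the mail archive) and that entries follow the layout shown: timestamp, level, a bracketed thread/location field treated as opaque, then the message. The name `four_letter_probes` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed entry layout, taken from the excerpt above:
#   "<date> <time,ms> - <LEVEL> [<thread/location>] - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]*)\] - (?P<msg>.*)$"
)
# Message logged when the server handles a four-letter word such as "stat"
PROBE = re.compile(
    r"^Processing (?P<cmd>\w{4}) command from /(?P<ip>[\d.]+):(?P<port>\d+)"
)


def four_letter_probes(lines: Iterable[str]) -> List[Tuple[str, str, str, int]]:
    """Return (timestamp, command, client_ip, client_port) for each probe."""
    probes = []
    for line in lines:
        entry = LOG_LINE.match(line.rstrip("\n"))
        if not entry:
            continue  # not a log entry header, e.g. a stack-trace line
        probe = PROBE.match(entry.group("msg"))
        if probe:
            probes.append((entry.group("ts"), probe.group("cmd"),
                           probe.group("ip"), int(probe.group("port"))))
    return probes
```

For example, `collections.Counter(ip for _, _, ip, _ in four_letter_probes(open("zookeeper.log.2010020106")))` would show which hosts are issuing probes and how often.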

From the logs I don't see anything to indicate a problem, though. I'm wondering 
if there is some timing problem in either our C or Java networking code (also, 
you are using Linux 2.6.9, which is an older kernel; I'm wondering if perhaps the 
timing our app sees is different).

One thing about the four-letter words (like stat): in some cases I've seen the 
response from a four-letter word be truncated. Perhaps this caused your 
monitoring app to fail? You might add some diagnostics to your monitoring app to 
debug this sort of thing.

What I mean is, you request a "stat" and the client sees some of the response, 
but not all of it. I'm not sure why this is, but it may have something to do 
with either the way nc works (I always use nc for this) or the way the server 
works, in the sense that the server pushes the response text onto the wire and 
then closes the connection. Perhaps in some cases the socket close causes the 
client not to see all of the response? Is that possible with a TCP close?
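A monitoring client can at least distinguish a reply that ended cleanly from one cut off by a reset. The sketch below (the helper name `four_letter_word` is hypothetical) sends exactly the four command bytes, then keeps calling recv() until the server's orderly close (FIN) shows up as an empty read; a single recv() may legitimately return only part of a reply, which by itself can look like truncation. On the TCP question, one possible mechanism, not confirmed in this thread: RFC 1122 (section 4.2.2.13) says a TCP SHOULD send RST instead of FIN when an application closes a socket that still has unread received data, and Linux behaves this way, in which case the client may lose reply data it has not read yet. Assuming the server reads only the four command bytes, a client that sends extra bytes (for example `echo stat | nc host 8181`, where echo appends a newline) could trigger that.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str = "stat",
                     timeout: float = 5.0):
    """Send a four-letter word and read the reply until the server closes.

    Returns (response_bytes, how_it_ended), where how_it_ended is "eof",
    "reset" or "timeout", so a monitor can log why a reply looks short.
    Connection failures (including a connect timeout) propagate to the caller.
    """
    if len(cmd) != 4:
        raise ValueError("four-letter words are exactly 4 characters")
    chunks = []
    ended = "eof"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Send exactly 4 bytes with no trailing newline, so the server is not
        # left holding unread input when it closes the connection.
        sock.sendall(cmd.encode("ascii"))
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # orderly close (FIN): the whole reply arrived
                    break
                chunks.append(data)
        except ConnectionResetError:
            ended = "reset"  # RST: reply data still in flight may be lost
        except socket.timeout:
            ended = "timeout"  # server stopped sending but did not close
    return b"".join(chunks), ended
```

Logging `len(response)` together with `how_it_ended` for every probe would show whether short replies coincide with resets, which would point at the close sequence rather than at nc.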





Re: ZOOKEEPER-22 and release 3.3

2010-02-03 Thread Henry Robinson
ZOOKEEPER-22 will be great when it goes in, but it is big enough that it probably
shouldn't be rushed. +1 to waiting for 3.4 and making sure it's done right.

Looking forward to it!
Henry


On 3 February 2010 09:50, Patrick Hunt  wrote:

> While this is a very useful improvement, deferring it sounds to me like the
> prudent thing to do given the short time to 3.3.0. If you want, we can shoot
> for 3.4.0 soon after 3.3.0 goes out (with 22 as one of the primary features).
>
> Patrick
>
>
> Mahadev Konar wrote:
>
>> Hi all,
>>
>>  I had been working on ZOOKEEPER-22 and found out that it needs quite
>> extensive changes. We will need to do some memory measurements to see
>> whether it has any memory impact.
>>
>> Since we are targeting the 3.3 release for early March, ZOOKEEPER-22 would
>> be hard to get into 3.3. I am proposing to move it to a later release
>> (3.4), so that it can be tested early in the release phase and gets baked
>> in the release.
>>
>> Thanks
>> mahadev
>>
>>


-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Re: ZOOKEEPER-22 and release 3.3

2010-02-03 Thread Patrick Hunt
While this is a very useful improvement, deferring it sounds to me like the 
prudent thing to do given the short time to 3.3.0. If you want, we can shoot 
for 3.4.0 soon after 3.3.0 goes out (with 22 as one of the primary features).


Patrick

Mahadev Konar wrote:

> Hi all,
>
>  I had been working on ZOOKEEPER-22 and found out that it needs quite
> extensive changes. We will need to do some memory measurements to see
> whether it has any memory impact.
>
> Since we are targeting the 3.3 release for early March, ZOOKEEPER-22 would be
> hard to get into 3.3. I am proposing to move it to a later release (3.4), so
> that it can be tested early in the release phase and gets baked in the
> release.
>
> Thanks
> mahadev



[jira] Updated: (ZOOKEEPER-607) improve bookkeeper overview

2010-02-03 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-607:
-

Attachment: ZOOKEEPER-607.patch

A preliminary patch that compiles (at least for me).

> improve bookkeeper overview
> ---
>
> Key: ZOOKEEPER-607
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-607
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bookkeeper
>Reporter: Benjamin Reed
> Attachments: ZOOKEEPER-607.patch, ZOOKEEPER-607.patch
>
>
> fix the overview section in the bookkeeper documentation to introduce the 
> programmer/admin to bookkeeper before giving the details.




Hudson build is back to normal: ZooKeeper-trunk #687

2010-02-03 Thread Apache Hudson Server
See