[jira] Updated: (ZOOKEEPER-434) the java shell should indicate connection status on command prompt

2009-06-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-434:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 782879.


 the java shell should indicate connection status on command prompt
 --

 Key: ZOOKEEPER-434
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-434
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.1.1
Reporter: Patrick Hunt
Assignee: Henry Robinson
Priority: Minor
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-434.patch


 it would be very useful if the java shell showed the current connection 
 status as part of the command prompt.
 this shows itself in particular for the following use case:
 I attempted to connect a java shell to a remote cluster that was unavailable; 
 when I ran the first command, ls /, on the cluster the shell hung. It would 
 be nice if the shell indicated connection status in the prompt and made it 
 clearer that the shell is currently not connected. (it was hard to see the 
 "attempting to connect" console message as it was lost in with the other 
 messages...)
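
 (A minimal sketch of what such a prompt could look like, assuming a
 org.apache.zookeeper.ZooKeeper handle named zk; the committed patch may
 format the prompt differently.)

{noformat}
import org.apache.zookeeper.ZooKeeper;

class PromptSketch {
    // Prefix the prompt with the client-side connection state so a
    // disconnected shell is obvious before any command is typed.
    static String prompt(ZooKeeper zk) {
        // getState() reports CONNECTING, CONNECTED, CLOSED, etc. from local
        // state, without touching the network, so it is cheap per prompt.
        return "[zk: " + zk.getState() + "] ";
    }
}
{noformat}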

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (ZOOKEEPER-434) the java shell should indicate connection status on command prompt

2009-06-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717559#action_12717559
 ] 

Benjamin Reed edited comment on ZOOKEEPER-434 at 6/8/09 10:12 PM:
--

Committed revision 782880.

  was (Author: breed):
Committed revision 782879.

  
 the java shell should indicate connection status on command prompt
 --

 Key: ZOOKEEPER-434
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-434
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.1.1
Reporter: Patrick Hunt
Assignee: Henry Robinson
Priority: Minor
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-434.patch


 it would be very useful if the java shell showed the current connection 
 status as part of the command prompt.
 this shows itself in particular for the following use case:
 I attempted to connect a java shell to a remote cluster that was unavailable; 
 when I ran the first command, ls /, on the cluster the shell hung. It would 
 be nice if the shell indicated connection status in the prompt and made it 
 clearer that the shell is currently not connected. (it was hard to see the 
 "attempting to connect" console message as it was lost in with the other 
 messages...)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-435) allow super admin digest based auth to be configurable

2009-06-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-435:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 782882.

 allow super admin digest based auth to be configurable
 

 Key: ZOOKEEPER-435
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-435
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-435.patch


 the server has a super digest based auth user that enables administrative 
 access (ie has access to znodes regardless
 of acl settings) but the password is not configurable
 1) make the default digest null, ie turn off super by default
 2) if a command line option is specified when starting server then use the 
 provided digest for super
 eg. java -Dzookeeper.DigestAuthenticationProvider.superDigest=xkxkxkxkx 
 also this is not documented in the forrest docs - need to add that along with 
 tests as part of the patch.
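
(For illustration, a sketch of generating a value for that property using the
DigestAuthenticationProvider class named in the example property;
"super:secret" is a placeholder id:password, not a real credential.)

{noformat}
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

class SuperDigestSketch {
    public static void main(String[] args) throws Exception {
        // Prints a "user:base64(sha1(user:password))" digest suitable for
        // -Dzookeeper.DigestAuthenticationProvider.superDigest=...
        System.out.println(
                DigestAuthenticationProvider.generateDigest("super:secret"));
    }
}
{noformat}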

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-336) single bad client can cause server to stop accepting connections

2009-06-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717567#action_12717567
 ] 

Benjamin Reed commented on ZOOKEEPER-336:
-

i forgot to add the new files. new files added in revision 782883.

 single bad client can cause server to stop accepting connections
 

 Key: ZOOKEEPER-336
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-336
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, java client, server
Reporter: Patrick Hunt
Assignee: Henry Robinson
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, 
 ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, 
 ZOOKEEPER-336.patch


 One user saw a case where a single mis-programmed client was overloading the 
 server with connections - the client was creating a huge number of sessions 
 to the server. This caused all of the fds on the server to be used up.
 Seems like we should have some way of limiting (with a configurable override) 
 the maximum number of sessions from a single client (say 10 by default?). 
 Also we should output warnings when this limit is exceeded (or an attempt is 
 made to exceed it).
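
 (A minimal sketch of the kind of per-client accounting this suggests, done at
 accept time; the limit constant and class name are illustrative, not taken
 from the committed patch.)

{noformat}
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionLimiter {
    private static final int MAX_CONNS = 10; // the "10 by default" above
    private final ConcurrentHashMap<InetAddress, AtomicInteger> counts =
            new ConcurrentHashMap<InetAddress, AtomicInteger>();

    /** Returns false (and warns) when a client exceeds its budget. */
    boolean allow(InetAddress client) {
        counts.putIfAbsent(client, new AtomicInteger());
        AtomicInteger n = counts.get(client);
        if (n.incrementAndGet() > MAX_CONNS) {
            n.decrementAndGet();
            System.err.println("WARN: connection limit exceeded for " + client);
            return false;
        }
        return true;
    }

    void release(InetAddress client) {
        AtomicInteger n = counts.get(client);
        if (n != null) n.decrementAndGet();
    }
}
{noformat}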

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-441) Zk-336 diff got applied twice to TestClientRetry.cc C test, causing compilation failure

2009-06-09 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717734#action_12717734
 ] 

Benjamin Reed commented on ZOOKEEPER-441:
-

thanx henry! i had a rough night last night. i missed this one. testing now.

 Zk-336 diff got applied twice to TestClientRetry.cc C test, causing 
 compilation failure
 ---

 Key: ZOOKEEPER-441
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-441
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Blocker
 Attachments: ZOOKEEPER-441.patch


 The latest version of trunk has a src/c/tests/TestClientRetry.cc file that 
 has the actual file from ZK-336 appended to itself. This causes the 
 compilation to fail due to lots of redeclaration errors. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger

2009-06-09 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-356:


Status: Open  (was: Patch Available)

-1 wow you did a lot of work flavio. big patch, so i found a couple of 
problems. (some i might just be confused about.) it shouldn't be much to fix:

* in LedgerOutputStream why are you interrupting the thread on BKExceptions
* in the tests, why are you catching and just logging BKExceptions? shouldn't 
those make the tests fail?
* i think _down_ should be volatile in BookieServer
* why do you pass a BookieHandle to BookieClient
* in BookKeeper you should probably catch NumberFormatException when you call 
Long.parseLong; it's one of those things that are really hard to debug if it 
happens (see the sketch below)
* could you add a comment to the top of BookKeeper to explain how the different 
znodes are used? it will really help the next person
* i think _stop_ and _incoming_ should be updated and read in the same 
synchronized block right?
* in LedgerManager @return says getItem returns a long rather than String
* are next and errorCounter used in ClientCB?

very nice job on using the state machine to process the asynchronous calls!
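
(Regarding the Long.parseLong point above, a small sketch of the defensive
parse being asked for; parseLedgerId and the message are illustrative names.)

{noformat}
class LedgerIdParser {
    /** Parse a ledger id, naming the bad input instead of failing bare. */
    static long parseLedgerId(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "expected a numeric ledger id but got: \"" + s + "\"", e);
        }
    }
}
{noformat}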


 Masking bookie failure during writes to a ledger
 

 Key: ZOOKEEPER-356
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-BOOKKEEPER-356.patch


 The idea of this jira is to work out the changes necessary to make a client 
 mask the failure of a bookie while writing to a ledger. I'm submitting a 
 preliminary patch, but before I submit a final one, I need to have 288 
 committed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2009-06-11 Thread Benjamin Reed (JIRA)
need a way to remove watches that are no longer of interest
---

 Key: ZOOKEEPER-442
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Benjamin Reed


currently the only way a watch is cleared is to trigger it. we need a way to 
enumerate the outstanding watch objects, find the watch events the objects are 
watching for, and remove interest in an event.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2009-06-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718467#action_12718467
 ] 

Benjamin Reed commented on ZOOKEEPER-442:
-

there are two problematic scenarios: 1) an application that has many transient 
interests can register a bunch of watches, which wastes memory to monitor the 
watches (granted it is a very small amount of memory) and causes unnecessary 
processing when those watches are triggered; 2) applications need to be 
prepared to ignore watch events that they are no longer interested in.
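
(Purely a hypothetical sketch of what a removal API might look like -- no such 
calls existed at the time of this issue; all names are illustrative.)

{noformat}
import java.util.Set;
import org.apache.zookeeper.Watcher;

interface WatchManagement {
    /** Stop delivering events on this path to this watcher. */
    void removeWatch(String path, Watcher watcher);

    /** Enumerate the paths a watcher is still registered on. */
    Set<String> watchedPaths(Watcher watcher);
}
{noformat}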

 need a way to remove watches that are no longer of interest
 ---

 Key: ZOOKEEPER-442
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Benjamin Reed

 currently the only way a watch is cleared is to trigger it. we need a way to 
 enumerate the outstanding watch objects, find the watch events the objects are 
 watching for, and remove interest in an event.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (ZOOKEEPER-443) trace logging in watch notification not wrapped with isTraceEnabled - inefficient

2009-06-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719004#action_12719004
 ] 

Benjamin Reed edited comment on ZOOKEEPER-443 at 6/12/09 3:22 PM:
--

this looks good, but there's something weird. ZooTrace also has an isTraceEnabled 
method. should we be using that? also 

{noformat}
-ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+if (LOG.isTraceEnabled()) {
+    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+}
{noformat}

doesn't really need the if, does it? nothing is saved, and the first thing 
ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled.

  was (Author: breed):
this looks good, but there's something weird. ZooTrace also has an 
isTraceEnabled method. should we be using that? also 

{quote}
-ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+if (LOG.isTraceEnabled()) {
+    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+}
{quote}

doesn't really need the if, does it? nothing is saved, and the first thing 
ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled.
  
 trace logging in watch notification not wrapped with isTraceEnabled - 
 inefficient
 -

 Key: ZOOKEEPER-443
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-443.patch


 In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a 
 trace message that's not wrapped with isTraceEnabled; this is very inefficient 
 and should be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-443) trace logging in watch notification not wrapped with isTraceEnabled - inefficient

2009-06-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719004#action_12719004
 ] 

Benjamin Reed commented on ZOOKEEPER-443:
-

this looks good, but there's something weird. ZooTrace also has an isTraceEnabled 
method. should we be using that? also 

{quote}
-ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+if (LOG.isTraceEnabled()) {
+    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+}
{quote}

doesn't really need the if, does it? nothing is saved, and the first thing 
ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled.
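
(To make the point concrete: the guard only pays off when building the log 
arguments is itself costly, e.g. string concatenation. A sketch, assuming a 
log4j-style logger; TraceGuardSketch and handle are illustrative names.)

{noformat}
import org.apache.log4j.Logger;

class TraceGuardSketch {
    private static final Logger LOG = Logger.getLogger(TraceGuardSketch.class);

    static void handle(Object request) {
        // Worthwhile guard: the concatenation below is skipped entirely
        // unless trace is enabled.
        if (LOG.isTraceEnabled()) {
            LOG.trace("Processing request: " + request);
        }
        // By contrast, wrapping ZooTrace.logRequest(LOG, traceMask, 'P',
        // request, "") buys nothing: its arguments already exist, and the
        // method checks the trace mask itself before doing any work.
    }
}
{noformat}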

 trace logging in watch notification not wrapped with isTraceEnabled - 
 inefficient
 -

 Key: ZOOKEEPER-443
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-443.patch


 In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a 
 trace message that's not wrapped with isTraceEnabled; this is very inefficient 
 and should be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (ZOOKEEPER-443) trace logging in watch notification not wrapped with isTraceEnabled - inefficient

2009-06-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719004#action_12719004
 ] 

Benjamin Reed edited comment on ZOOKEEPER-443 at 6/12/09 3:23 PM:
--

this looks good, but there's something weird. ZooTrace also has an isTraceEnabled 
method. should we be using that? also 

{noformat}
-ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+if (LOG.isTraceEnabled()) {
+    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+}
{noformat}

doesn't really need the if, does it? no processing is saved since no new 
strings are being built, and the first thing ZooTrace.logRequest is going to do 
is call ZooTrace.isTraceEnabled.

  was (Author: breed):
this looks good, but there's something weird. ZooTrace also has an 
isTraceEnabled method. should we be using that? also 

{noformat}
-ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+if (LOG.isTraceEnabled()) {
+    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
+}
{noformat}

doesn't really need the if, does it? nothing is saved, and the first thing 
ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled.
  
 trace logging in watch notification not wrapped with isTraceEnabled - 
 inefficient
 -

 Key: ZOOKEEPER-443
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-443.patch


 In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a 
 trace message that's not wrapped with isTraceEnabled; this is very inefficient 
 and should be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-443) trace logging in watch notification not wrapped with isTraceEnabled - inefficient

2009-06-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-443:


Hadoop Flags: [Reviewed]

+1 agreed

 trace logging in watch notification not wrapped with isTraceEnabled - 
 inefficient
 -

 Key: ZOOKEEPER-443
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-443.patch


 In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a 
 trace message that's not wrapped with isTraceEnabled; this is very inefficient 
 and should be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-06-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719823#action_12719823
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

Raghu, i think henry is correct that you must get an ack from quorums in both 
the old and new views before committing the change. otherwise you get split 
brain, which could result in multiple leaders. henry, i think we are thinking 
along the same lines, but i'm a bit skeptical of JOIN and LEAVE. in some sense 
they are a bit of an optimization that can be implemented with GETVIEW and 
NEWVIEW. it would be nice to make the mechanism as simple as possible. it also 
seems like you would require a GETVIEW to be done before doing a NEWVIEW, just 
for sanity. (require an expected version on NEWVIEW and do not allow a -1.) i 
was thinking that we would just push NEWVIEW through Zab, making sure we get 
acks from quorums in both the old and new views.

to help mitigate the case where the system freezes up because the NEWVIEW 
proposal goes out and there isn't a quorum in the new view, the leader should 
probably make sure that it currently has a quorum of followers in the new view 
before proposing the request. if it doesn't, it should error out the request. 
even with this we can still freeze up if we lose quorum in the new view after 
issuing the proposal, but that would happen anyway (as you point out); still, 
the check would prevent us from doing something that has no chance of working.
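
(A sketch of the commit rule being described -- a view-change proposal counts 
as committed only when acked by majorities of both the old and the new view; 
class and method names are illustrative.)

{noformat}
import java.util.HashSet;
import java.util.Set;

class ViewChangeQuorum {
    /** True only when acks form a majority of BOTH views. */
    static boolean canCommit(Set<Long> acks, Set<Long> oldView,
                             Set<Long> newView) {
        return isMajority(acks, oldView) && isMajority(acks, newView);
    }

    private static boolean isMajority(Set<Long> acks, Set<Long> view) {
        Set<Long> acked = new HashSet<Long>(view);
        acked.retainAll(acks);
        return acked.size() > view.size() / 2;
    }
}
{noformat}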

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Attachments: SimpleAddition.rtf


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-06-16 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720323#action_12720323
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

just a caveat to my last comment. for point 1) we actually do need to touch the 
protocol code a bit to ensure that the setData that changes the view commits in 
both the old and new views.

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Attachments: SimpleAddition.rtf


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-06-16 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720380#action_12720380
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

so if you do one change at a time without using Zab, without working through 
the details:
1) start with A, B, C, D
2) A is the leader and proposes LEAVE D, and fails such that only A and C get it.
3) B is the leader and proposes LEAVE C, and fails such that only B and D get 
it, because of a complete power outage.
4) everything comes back up
5) A is elected leader by C
6) B is elected leader by D

if we use Zab, split brain will not occur because we do not use the 
configuration until it has been committed. since it has been accepted by both 
the old and new quorums, we will eventually converge on the new configuration. 
(that is my conjecture; it still needs to be proven)

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Attachments: SimpleAddition.rtf


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-444) perms definition for PERMS_ALL differ in C and java

2009-06-17 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720774#action_12720774
 ] 

Benjamin Reed commented on ZOOKEEPER-444:
-

+1 brilliant!

 perms definition for PERMS_ALL differ in C and java
 ---

 Key: ZOOKEEPER-444
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-444
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-444.patch


 the perms_all definition in Java is PERMS.ALL and does not include the ADMIN 
 perms, but in c the PERMS_ALL def includes the ADMIN perms. We should make it 
 consistent and either include or exclude the admin perms in both c and java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-06-17 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720790#action_12720790
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

oh right. you are correct. i guess it is more of a liveness/correctness issue:

1) start with A, B, C, D
2) B is down and A is the leader and proposes LEAVE C and fails where only D 
gets it.
3) C and D cannot get quorum since C has an older view.
4) D fails
5) A and B come back up and B is elected leader.
6) B proposes LEAVE A and C gets it before B fails.

Now what happens? we cannot get quorum with just A and C since A has the old 
view. even if D comes up it will not elect C because it does not believe C is 
part of the ensemble. if they all come up either C or D can be elected leader, 
but if C is elected you end up with conflicting views: A thinks (B, C, D), B 
thinks (B, C, D), C thinks (B, C, D), and D thinks (A, B, D), so both A and D 
will effectively be out of the ensemble and you can't tolerate any failures.



 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Attachments: SimpleAddition.rtf


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-408) address all findbugs warnings in persistence classes

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-408:



+1 please commit.

 address all findbugs warnings in persistence classes
 

 Key: ZOOKEEPER-408
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-408
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Patrick Hunt
Assignee: Mahadev konar
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, 
 ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, 
 ZOOKEEPER-408.patch


 trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
 trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
 trunk/src/java/main/org/apache/zookeeper/server/persistence/Util.java
 trunk/src/java/main/org/apache/zookeeper/server/DataNode.java
 trunk/src/java/main/org/apache/zookeeper/server/upgrade/DataNodeV1.java
 trunk/src/java/main/org/apache/zookeeper/server/upgrade/DataTreeV1.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-397) mainline tests conversion

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-397:


Status: Open  (was: Patch Available)

patch doesn't apply

 mainline tests conversion
 -

 Key: ZOOKEEPER-397
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-397
 Project: Zookeeper
  Issue Type: Sub-task
  Components: tests
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Fix For: 3.3.0

 Attachments: testng-5.9-jdk15.jar, ZOOKEEPER-397.patch, 
 ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, 
 ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch


 In this stage the main set (src/java/test) of ZK tests will be converted to TestNG

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-427) ZooKeeper server unexpectedly high CPU utilisation

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-427:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 786251.

 ZooKeeper server unexpectedly high CPU utilisation
 --

 Key: ZOOKEEPER-427
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-427
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
 Environment: Linux: 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 
 2008 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.6.0_03
 Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
 Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
Reporter: Satish Bhatti
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.2.0

 Attachments: zk_quorum_recv_eof.patch, zoo.cfg, ZOOKEEPER-427.patch, 
 zookeeper-jstack.log, zookeeper.log


 I am running a 5 node ZooKeeper cluster and I noticed that one of them has 
 very high CPU usage:
    PID  USER    PR  NI  VIRT  RES   SHR  S  %CPU  %MEM    TIME+  COMMAND
   6883  infact  22   0  725m  41m  4188  S    95   0.5  5671:54  java
 It is not doing anything application-wise at this point, so I was wondering 
 why the heck it's using up so much CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-422) Java CLI should support ephemeral and sequential node creation

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-422:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed the usage string thanx henry!
Committed revision 786317.


 Java CLI should support ephemeral and sequential node creation
 --

 Key: ZOOKEEPER-422
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-422
 Project: Zookeeper
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-422.patch


 The C client supports creation of ephemeral and sequential nodes. For feature 
 parity, so should the Java CLI. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-329) document how to integrate 3rd party authentication into ZK server ACLs

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-329:


Status: Patch Available  (was: Open)

 document how to integrate 3rd party authentication into ZK server ACLs
 --

 Key: ZOOKEEPER-329
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-329
 Project: Zookeeper
  Issue Type: Improvement
  Components: documentation
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Minor
 Fix For: 3.2.0

 Attachments: plugauth.pdf, ZOOKEEPER-329.patch


 the docs mention that zk supports pluggable auth schemes but don't detail 
 the API/examples. We should add this to the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-329) document how to integrate 3rd party authentication into ZK server ACLs

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-329:


Attachment: ZOOKEEPER-329.patch
plugauth.pdf

I'm attaching a pdf of the relevant section to ease review.

 document how to integrate 3rd party authentication into ZK server ACLs
 --

 Key: ZOOKEEPER-329
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-329
 Project: Zookeeper
  Issue Type: Improvement
  Components: documentation
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Minor
 Fix For: 3.2.0

 Attachments: plugauth.pdf, ZOOKEEPER-329.patch


 the docs mention that zk supports pluggable auth schemes but don't detail 
 the API/examples. We should add this to the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository

2009-06-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-224:


Fix Version/s: 3.2.0
  Summary: Deploy ZooKeeper 3.2.0 to a Maven Repository  (was: Deploy 
ZooKeeper 3.0.0 to a Maven Repository)

In the next release can we get zookeeper.jar, zookeeper-test.jar, and 
bookkeeper.jar published to maven? is there some simple procedure to apply to 
our built jar files to make them deployable?

 Deploy ZooKeeper 3.2.0 to a Maven Repository
 

 Key: ZOOKEEPER-224
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224
 Project: Zookeeper
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Hiram Chirino
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0


 I've created the maven poms needed for the 3.0.0 release.
 The directory structure and artifacts are located at:
 http://people.apache.org/~chirino/zk-repo/
 aka
 people.apache.org:/x1/users/chirino/public_html/zk-repo
 Just needs to get GPG signed by the project KEY and deployed to:
 people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository
 Who's the current ZooKeeper release manager?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository

2009-06-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722056#action_12722056
 ] 

Benjamin Reed commented on ZOOKEEPER-224:
-

i think it is just a matter of running mvn deploy:deploy-file with the right 
flags, right? i was thinking we would run it right after we do the release.
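
(Roughly the kind of invocation meant -- the coordinates, file names, and 
repository URL below are placeholders for illustration, not the project's 
actual ones.)

{noformat}
mvn deploy:deploy-file \
    -Dfile=zookeeper-3.2.0.jar \
    -DpomFile=zookeeper-3.2.0.pom \
    -DgroupId=org.apache.zookeeper \
    -DartifactId=zookeeper \
    -Dversion=3.2.0 \
    -Dpackaging=jar \
    -Durl=scp://people.apache.org/www/people.apache.org/repo/m2-ibiblio-rsync-repository
{noformat}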

 Deploy ZooKeeper 3.2.0 to a Maven Repository
 

 Key: ZOOKEEPER-224
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224
 Project: Zookeeper
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Hiram Chirino
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0


 I've created the maven poms needed for the 3.0.0 release.
 The directory structure and artifacts are located at:
 http://people.apache.org/~chirino/zk-repo/
 aka
 people.apache.org:/x1/users/chirino/public_html/zk-repo
 Just needs to get GPG signed by the project KEY and deployed to:
 people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository
 Who's the current ZooKeeper release manager?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-06-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722063#action_12722063
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

i think if we use the notion of observers it helps: an observer can sync with a 
leader, but it doesn't get to vote. i think this makes it easy because the 
leader can then determine that it can commit with both the active followers and 
active observers if needed: for example start with A, B, C and move to A, B, D, 
E, F. if A and C are active followers and E and F are observers then the leader 
will propose the new configuration.

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Henry Robinson
 Attachments: SimpleAddition.rtf


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-446) some traces of the host auth scheme left

2009-06-19 Thread Benjamin Reed (JIRA)
some traces of the host auth scheme left


 Key: ZOOKEEPER-446
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-446
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
 Fix For: 3.2.0


the host auth scheme was removed because it used a blocking call in an async 
pipeline. however, tragically, the blocking call was not removed, nor were a 
couple of other stray classes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-446) some traces of the host auth scheme left

2009-06-19 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-446:


Attachment: ZOOKEEPER-446.patch

 some traces of the host auth scheme left
 

 Key: ZOOKEEPER-446
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-446
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-446.patch


 the host auth scheme was removed because it used a blocking call in an async 
 pipeline. however, tragically, the blocking call was not removed, nor were a 
 couple of other stray classes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger

2009-06-23 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723150#action_12723150
 ] 

Benjamin Reed commented on ZOOKEEPER-356:
-

just a couple of things:

* in BookieHandle, why doesn't stop get set to true on shutdown?
* you need to check all your uses of LOG.info; most of them seem to really be 
LOG.debug
* in ClientCBWorker stop should be volatile
* in LedgerHandle shouldn't add/removeBookie be synchronized?
* in QuorumEngine should idCounter be synchronized?
* In BookieClient you do a new IOException(); you should provide some hint of 
the problem in the constructor

 Masking bookie failure during writes to a ledger
 

 Key: ZOOKEEPER-356
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-BOOKKEEPER-356.patch


 The idea of this jira is to work out the changes necessary to make a client 
 mask the failure of a bookie while writing to a ledger. I'm submitting a 
 preliminary patch, but before I submit a final one, I need to have 288 
 committed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository

2009-06-23 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723182#action_12723182
 ] 

Benjamin Reed commented on ZOOKEEPER-224:
-

why do we need ivy? can't we just run the command outside the build process 
after we do the release?


 Deploy ZooKeeper 3.2.0 to a Maven Repository
 

 Key: ZOOKEEPER-224
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224
 Project: Zookeeper
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Hiram Chirino
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0


 I've created the maven poms needed for the 3.0.0 release.
 The directory structure and artifacts are located at:
 http://people.apache.org/~chirino/zk-repo/
 aka
 people.apache.org:/x1/users/chirino/public_html/zk-repo
 Just needs to get GPG signed by the project KEY and deployed to:
 people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository
 Who's the current ZooKeeper release manager?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository

2009-06-23 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723247#action_12723247
 ] 

Benjamin Reed commented on ZOOKEEPER-224:
-

sorry, i should have scoped my question better. i mean, why do we need ivy to 
push our release jars into the repository? i can see how we can use ivy for 
other needs, but for the specific issue of getting our jars into the maven 
repository, we can just run a command after we do the release. right?


 Deploy ZooKeeper 3.2.0 to a Maven Repository
 

 Key: ZOOKEEPER-224
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224
 Project: Zookeeper
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Hiram Chirino
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.0


 I've created the maven poms needed for the 3.0.0 release.
 The directory structure and artifacts are located at:
 http://people.apache.org/~chirino/zk-repo/
 aka
 people.apache.org:/x1/users/chirino/public_html/zk-repo
 Just needs to get GPG signed by the project KEY and deployed to:
 people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository
 Who's the current ZooKeeper release manager?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Attachment: ZOOKEEPER-417.patch

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 
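
 (A minimal sketch of the ownership table described above, assuming each
 follower is identified by a server id; names are illustrative.)

{noformat}
import java.util.concurrent.ConcurrentHashMap;

class SessionOwnerTracker {
    // sessionId -> server id of the follower that last registered it
    private final ConcurrentHashMap<Long, Long> owner =
            new ConcurrentHashMap<Long, Long>();

    void register(long sessionId, long followerSid) {
        owner.put(sessionId, followerSid); // a reconnect moves ownership
    }

    /** Drop forwarded requests from followers that no longer own the session. */
    boolean accept(long sessionId, long followerSid) {
        Long sid = owner.get(sessionId);
        return sid != null && sid.longValue() == followerSid;
    }
}
{noformat}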

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Status: Patch Available  (was: Open)

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger

2009-06-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-356:


Issue Type: Improvement  (was: New Feature)

 Masking bookie failure during writes to a ledger
 

 Key: ZOOKEEPER-356
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-BOOKKEEPER-356.patch


 The idea of this jira is to work out the changes necessary to make a client 
 mask the failure of a bookie while writing to a ledger. I'm submitting a 
 preliminary patch, but before I submit a final one, I need to have 288 
 committed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger

2009-06-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-356:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 787907.

 Masking bookie failure during writes to a ledger
 

 Key: ZOOKEEPER-356
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, 
 ZOOKEEPER-BOOKKEEPER-356.patch


 The idea of this jira is to work out the changes necessary to make a client 
 mask the failure of a bookie while writing to a ledger. I'm submitting a 
 preliminary patch, but before I submit a final one, I need to have 288 
 committed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-264) docs should include a state transition diagram for client state

2009-06-24 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-264:


Status: Patch Available  (was: Reopened)

 docs should include a state transition diagram for client state
 ---

 Key: ZOOKEEPER-264
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-264
 Project: Zookeeper
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.1, 3.0.0
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Minor
 Fix For: 3.2.0

 Attachments: state_dia.dia, state_dia.png, ZOOKEEPER-264.patch


 we should have a state transition diagram to help users understand client 
 state transitions. perhaps the edges could indicate what might cause such a  
 transition? (not sure if that will work). keep in mind for the states that 
 the java/c clients have diff names for constants (not sure how to handle). 
 This should be added to the programmer guide in the appropriate section.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (ZOOKEEPER-314) add wiki docs for BookKeeper.

2009-06-24 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-314.
-

Resolution: Fixed

done: http://wiki.apache.org/hadoop/BookKeeper

 add wiki docs for BookKeeper.
 

 Key: ZOOKEEPER-314
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-314
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Benjamin Reed
 Fix For: 3.2.0


 we should have a wiki page for BookKeeper for users to take a cursory look at 
 what it is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-237) Add a Chroot request

2009-06-24 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-237:


Status: Open  (was: Patch Available)

looks good. two comments:

* feel free to ignore this one: when you set up hostname and chroot, i think 
the code is simpler if you do hostname = strdup(host) and then poke a null 
byte into hostname to strip off the chroot
* we need to make sure we have total coverage for the testcases. you are 
missing a couple of the synchronous calls and you need to add the asynchronous 
calls. (i know it is tedious)


 Add a Chroot request
 

 Key: ZOOKEEPER-237
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-237
 Project: Zookeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Benjamin Reed
Assignee: Mahadev konar
Priority: Minor
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-237.patch, ZOOKEEPER-237.patch, 
 ZOOKEEPER-237.patch


 It would be nice to be able to root ZooKeeper handles at specific points in 
 the namespace, so that applications that use ZooKeeper can work in their own 
 rooted subtree.
 For example, if ops decides that application X can use the subtree /apps/X 
 and application Y can use the subtree /apps/Y, X can do a chroot to /apps/X 
 and then all its path references can be rooted at /apps/X. Thus when X 
 creates the path /myid, it will actually be creating the path 
 /apps/X/myid.
 There are two ways we can expose this mechanism: 1) We can simply add a 
 chroot(String path) API, or 2) we can integrate into a service identifier 
 scheme for example zk://server1:2181,server2:2181/my/root. I like the second 
 form personally.
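
 (An example of the second form, assuming the chroot rides along as a suffix
 on the connect string; the server names are placeholders.)

{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

class ChrootSketch {
    public static void main(String[] args) throws Exception {
        // Everything after the host:port list is the chroot for this handle.
        ZooKeeper zk = new ZooKeeper("server1:2181,server2:2181/apps/X",
                                     30000, null);
        // Creates /apps/X/myid on the server; the client only sees /myid.
        zk.create("/myid", new byte[0], Ids.OPEN_ACL_UNSAFE,
                  CreateMode.PERSISTENT);
        zk.close();
    }
}
{noformat}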

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected

2009-06-24 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-438:


Attachment: ZOOKEEPER-438.patch

 addauth fails to register auth on new client that's not yet connected
 -

 Key: ZOOKEEPER-438
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch


 if addauth is called on a new client connection that's never connected to the 
 server, when the client does connect
 (syncconnected) the auth is not passed to the server. we should ensure we 
 addauth when the client connects or reconnects
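
 (A sketch of one way to fix it: remember every addauth client-side and replay
 the saved credentials whenever a connection is established or re-established.
 The Connection interface and all names here are illustrative, not the actual
 client code.)

{noformat}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class AuthReplaySketch {
    interface Connection { void sendAuthPacket(String scheme, byte[] data); }

    static final class AuthData {
        final String scheme; final byte[] data;
        AuthData(String scheme, byte[] data) {
            this.scheme = scheme; this.data = data;
        }
    }

    // Every addauth is recorded so it survives reconnects.
    private final List<AuthData> saved = new CopyOnWriteArrayList<AuthData>();

    void addAuth(String scheme, byte[] data) {
        saved.add(new AuthData(scheme, data));
        // ...also send immediately if currently connected...
    }

    /** Called from the connection-established handler. */
    void onConnected(Connection conn) {
        for (AuthData a : saved) {
            conn.sendAuthPacket(a.scheme, a.data); // replay saved credentials
        }
    }
}
{noformat}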

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-438:


Status: Open  (was: Patch Available)

 addauth fails to register auth on new client that's not yet connected
 -

 Key: ZOOKEEPER-438
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch


 if addauth is called on a new client connection that's never connected to the 
 server, when the client does connect
 (syncconnected) the auth is not passed to the server. we should ensure we 
 addauth when the client connects or reconnects

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-438:


Status: Patch Available  (was: Open)

 addauth fails to register auth on new client that's not yet connected
 -

 Key: ZOOKEEPER-438
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch, 
 ZOOKEEPER-438.patch


 if addauth is called on a new client connection that's never connected to the 
 server, when the client does connect
 (syncconnected) the auth is not passed to the server. we should ensure we 
 addauth when the client connects or reconnects

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-438:


Attachment: ZOOKEEPER-438.patch

slightly out of date

 addauth fails to register auth on new client that's not yet connected
 -

 Key: ZOOKEEPER-438
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch, 
 ZOOKEEPER-438.patch


 if addauth is called on a new client connection that's never connected to the 
 server, when the client does connect
 (syncconnected) the auth is not passed to the server. we should ensure we 
 addauth when the client connects or reconnects

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line

2009-06-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724136#action_12724136
 ] 

Benjamin Reed commented on ZOOKEEPER-447:
-

+1 good idea

 zkServer.sh doesn't allow different config files to be specified on the 
 command line
 

 Key: ZOOKEEPER-447
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
 Project: Zookeeper
  Issue Type: Improvement
Affects Versions: 3.1.1, 3.2.0
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZOOKEEPER-447.patch


 Unless I'm missing something, you can change the directory that the zoo.cfg 
 file is in by setting ZOOCFGDIR but not the name of the file itself.
 I find it convenient myself to specify the config file on the command line, 
 but we should also let it be specified by environment variable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724309#action_12724309
 ] 

Benjamin Reed commented on ZOOKEEPER-417:
-

the release audit generates warnings because of the way i added the new keeper 
exception codes. we made the integers deprecated, so i've made the new ones 
deprecated as well. should i put the integer in the new error code rather than 
adding a new deprecated constant?

i can't find any failed tests in the test results. what am i missing?

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 
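
A minimal sketch of the proposed bookkeeping on the leader, with illustrative names (this is not the actual ZOOKEEPER-417 patch):

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionOwnerTracker {
    // sessionId -> server id of the follower that last registered the session
    private final Map<Long, Long> owner = new ConcurrentHashMap<>();

    // called when a client (re)registers its session through a follower;
    // in the scenario above, F2 takes ownership when C reconnects
    void sessionRegistered(long sessionId, long followerId) {
        owner.put(sessionId, followerId);
    }

    // called for every request a follower forwards to the leader; the stray
    // setData("/a", 1) forwarded late by F1 fails this check because
    // ownership has already moved to F2
    boolean shouldProcess(long sessionId, long followerId) {
        Long current = owner.get(sessionId);
        return current != null && current == followerId;
    }
}
{noformat}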

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Attachment: ZOOKEEPER-417.patch

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
 ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Status: Open  (was: Patch Available)

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
 ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Assignee: (was: Benjamin Reed)
  Status: Open  (was: Patch Available)

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
 ZOOKEEPER-417.patch, ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Attachment: ZOOKEEPER-417.patch

implemented mahadev's suggestion

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
 ZOOKEEPER-417.patch, ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Assignee: Benjamin Reed
  Status: Patch Available  (was: Open)

 stray message problem when changing servers
 ---

 Key: ZOOKEEPER-417
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
 ZOOKEEPER-417.patch, ZOOKEEPER-417.patch


 There is a possibility for stray messages from a previous connection to 
 violate ordering and generally cause problems. Here is a scenario: we have a 
 client, C, two followers, F1 and F2, and a leader, L. The client is connected 
 to F1, which is a slow follower. C sends setData("/a", 1) to F1 and then 
 loses the connection, so C reconnects to F2 and sends setData("/a", 2). it 
 is possible, if F1 is slow enough and the setData("/a", 1) got onto the 
 network before the connection break, for F1 to forward the setData("/a", 1) 
 to L after F2 forwards setData("/a", 2).
 to fix this, the leader should keep track of which follower last registered a 
 session for a client and drop any requests from followers for clients for 
 whom they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-448) png files do not work with forrest.

2009-06-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724659#action_12724659
 ] 

Benjamin Reed commented on ZOOKEEPER-448:
-

+1

 png files do not work with forrest.
 ---

 Key: ZOOKEEPER-448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-448
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 3.2.0

 Attachments: 2pc.jpg, ZOOKEEPER-448.patch


 png images are not compatible with forrest generating pdf. We can convert them 
 to jpg to get them into pdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-449) sessionmoved in java code and ZCLOSING in C have the same value.

2009-06-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724716#action_12724716
 ] 

Benjamin Reed commented on ZOOKEEPER-449:
-

+1

 sessionmoved in java code and ZCLOSING in C have the same value.
 -

 Key: ZOOKEEPER-449
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-449
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-449.patch


 sessionmoved in java code and ZCLOSING in C have the same value. We need to 
 assign a new value to ZSESSIONMOVED.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) ephemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: ZOOKEEPER-450.patch

the patch detects the bug and fixes it. i'm not completely sure about the fix. 
it's simple and works, but there is a little non-deterministic corner case: a 
client issues a close, but the connection drops after the request is received 
by the server; if the client then moves to a new server and continues to use the 
session, the stray close will come in and close the session. this corner case 
is not possible with our current client implementation.

 ephemeral cleanup not happening with session timeout
 -

 Key: ZOOKEEPER-450
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-450.patch


 The session move patch broke ephemeral cleanup during session expiration. 
 tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) ephemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: ZOOKEEPER-450.patch

updated the patch to comment on why the checkSession is not needed for the 
benefit of future maintainers.

 ephemeral cleanup not happening with session timeout
 -

 Key: ZOOKEEPER-450
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-450.patch


 The session move patch broke ephemeral cleanup during session expiration. 
 tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) ephemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: (was: ZOOKEEPER-450.patch)

 ephemeral cleanup not happening with session timeout
 -

 Key: ZOOKEEPER-450
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-450.patch


 The session move patch broke ephemeral cleanup during session expiration. 
 tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) ephemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Status: Patch Available  (was: Open)

 ephemeral cleanup not happening with session timeout
 -

 Key: ZOOKEEPER-450
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0

 Attachments: ZOOKEEPER-450.patch


 The session move patch broke ephemeral cleanup during session expiration. 
 tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-440) update the performance documentation in forrest

2009-06-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725360#action_12725360
 ] 

Benjamin Reed commented on ZOOKEEPER-440:
-

i have created the wiki page: 
http://wiki.apache.org/hadoop/ZooKeeper/Performance

i'd like to just leave it on the wiki for this release and move it to forrest 
when i can dedicate more time to the text and different benchmarks.

 update the performance documentation in forrest
 ---

 Key: ZOOKEEPER-440
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-440
 Project: Zookeeper
  Issue Type: Task
  Components: documentation
Reporter: Patrick Hunt
Assignee: Benjamin Reed
 Fix For: 3.2.0


 Ben, it would be great if you could update the performance documentation in 
 Forrest docs based on the 3.2 performance improvements.
 Specifically the scaling graphs (read vs. write load for various quorum 
 sizes)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728751#action_12728751
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey henry, two other questions/comments for you:

* i'm trying to understand the use case for a follower that connects as an 
observer. this would adversely affect the reliability of the system since a 
follower acting as an observer would count as a failed follower even though it 
is up. did you have a case in mind?
* i think it is reasonable to turn off the sync for the observer, but we 
probably still want to log to disk so that we can recover quickly. otherwise we 
will keep doing state transfers from the leader every time we connect. right?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 
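
To make the two options above concrete, here is a toy Java encoding of the message shapes involved; the record names are mine, not ZooKeeper's actual wire protocol:

{noformat}
// what the leader sends to followers today
record Proposal(long zxid, byte[] txn) {}   // carries the transaction payload
record Commit(long zxid) {}                 // no payload: followers already have it

// option 1: widen COMMIT with the payload, for observers only
record CommitWithPayload(long zxid, byte[] txn) {}

// option 2: reuse Proposal + Commit unchanged; an observer applies txn when
// the Commit for its zxid arrives, but never ACKs the Proposal
{noformat}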

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731073#action_12731073
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

to address the motivation a bit, consider poorly connected data centers and 
cross-datacenter zookeeper. we need to put zookeeper servers in the poorly 
connected data centers because we will want to service all the reads locally in 
those data centers, but we don't want to affect reliability or latency in other 
data centers. for example, imagine we have 5 poorly connected data centers and 
3 well connected data centers. we may put two servers in each data center. that 
means that we have an ensemble of 16 servers, but because of the poorly 
connected data centers, we are more likely to lose quorum than if we made the 
servers in the 5 poorly connected data centers observers and just used the 3 
well connected data centers to commit changes. you can view observers as 
proxies. 
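
Working the quorum arithmetic out for that example (my numbers, following the scenario above):

{noformat}
all voters:      8 data centers x 2 servers = 16, so quorum = 9
well connected:  3 data centers x 2 servers =  6  (6 < 9: cannot commit alone)
=> partitioning away the 5 poorly connected DCs (10 voters) halts writes

with observers:  voters = 6 (all well connected), quorum = 4
=> the 10 observers can come and go without ever costing quorum
{noformat}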

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-423) Add getFirstChild API

2009-07-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731114#action_12731114
 ] 

Benjamin Reed commented on ZOOKEEPER-423:
-

we should keep in mind that someday we may have a partitioned namespace. when 
that happens some of these options would be hard/very expensive/blocking. NAME 
of course is easy: the client can always do this. when the creation happens, we 
can store the zxid with the child's name in the parent data structure since it 
doesn't change, so CREATED is reasonable. MODIFIED and DATA_SIZE are more 
problematic/seemingly impossible in the presence of a namespace partition.
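
A minimal Java sketch of that CREATED-order bookkeeping; the structure and names are mine, not ZooKeeper's actual DataNode:

{noformat}
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class OrderedChildren {
    // creation zxid -> child name; a TreeMap keeps creation order for us
    private final SortedMap<Long, String> byCreation = new TreeMap<>();
    // child name -> creation zxid, so deletes stay cheap
    private final Map<String, Long> createdAt = new HashMap<>();

    void childCreated(long zxid, String name) {
        byCreation.put(zxid, name);
        createdAt.put(name, zxid);
    }

    void childDeleted(String name) {
        Long zxid = createdAt.remove(name);
        if (zxid != null) {
            byCreation.remove(zxid);
        }
    }

    // what a getFirstChild() call could answer: one name, no full-list transfer
    String firstChild() {
        return byCreation.isEmpty() ? null : byCreation.get(byCreation.firstKey());
    }
}
{noformat}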

 Add getFirstChild API
 -

 Key: ZOOKEEPER-423
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bindings, documentation, java client, server
Reporter: Henry Robinson

 When building the distributed queue for my tutorial blog post, it was pointed 
 out to me that there's a serious inefficiency here. 
 Informally, the items in the queue are created as sequential nodes. For a 
 'dequeue' call, all items are retrieved and sorted by name by the client in 
 order to find the name of the next item to try and take. This costs O( n ) 
 bandwidth and O( n log n ) sorting time - per dequeue call! Clearly this 
 doesn't scale very well. 
 If the servers were able to maintain a data structure that allowed them to 
 efficiently retrieve the children of a node in order of the zxid that created 
 them, this would make successful dequeue operations O( 1 ) at the cost of O( n 
 ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
 This is a win if it is generally true that clients only want the first child 
 in creation order, rather than the whole set. 
 We could expose this to the client via this API: getFirstChild(handle, path, 
 name_buffer, watcher) which would have much the same semantics as 
 getChildren, but only return one znode name. 
 Sequential nodes would still allow the ordering of znodes to be made 
 explicitly available to the client in one RPC should it need it. Although: 
 since this ordering would now be available cheaply for every set of children, 
 it's not completely clear that there would be that many use cases left for 
 sequential nodes if this API was augmented with a getChildrenInCreationOrder 
 call. However, that's for a different discussion. 
 A halfway-house alternative with more flexibility is to add an 'order' 
 parameter to getFirstChild and have the server compute the first child 
 according to the requested order (creation time, update time, lexicographical 
 order). This saves bandwidth at the expense of increased server load, 
 although servers can be implemented to spend memory on pre-computing commonly 
 requested orders. I am only in favour of this approach if servers maintain a 
 data-structure for every possible order, and then the memory implications 
 need careful consideration.
 [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-472) Making DataNode not instantiate a HashMap when the node is ephemeral

2009-07-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731595#action_12731595
 ] 

Benjamin Reed commented on ZOOKEEPER-472:
-

i think we should expand this to not instantiate a hashmap for any znode that 
doesn't have children. it creates a fixed size overhead for all leaf nodes 
and since there will always be more leaves than inner nodes, it is a non-trivial 
space saving. i think it could also speed serialization/deserialization 
since it is faster to process a null than an empty hashmap. plus i think it 
keeps the code simpler to not have a new class.
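
A minimal sketch of that lazy-allocation idea in Java (illustrative, not the actual DataNode patch):

{noformat}
import java.util.HashSet;
import java.util.Set;

class LazyChildrenNode {
    private Set<String> children;   // stays null for leaf znodes

    void addChild(String name) {
        if (children == null) {
            children = new HashSet<>(8);   // allocate only on first child
        }
        children.add(name);
    }

    boolean removeChild(String name) {
        return children != null && children.remove(name);
    }

    // serialization can treat null exactly like "no children", which is the
    // claimed speedup over walking an empty hashmap
    int childCount() {
        return children == null ? 0 : children.size();
    }
}
{noformat}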

 Making DataNode not instantiate a HashMap when the node is ephemeral
 ---

 Key: ZOOKEEPER-472
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-472
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.1.1, 3.2.0
Reporter: Erik Holstad
Assignee: Erik Holstad
Priority: Minor
 Fix For: 3.3.0


 Looking at the code, there is an overhead of a HashSet object for that node's 
 children, even though the node might be an ephemeral node and cannot have 
 children.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-20 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733410#action_12733410
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

henry, i was thinking the other day that an observer is very similar to a 
follower in a flexible quorum with 0 weight. actually the more i thought about 
it, the more i realized that it should be the same. a follower with 0 weight 
really should not send ACKs back and then it would be an observer. it turns out 
that there is a comment in ZOOKEEPER-29 that makes this observation as well. in 
that issue the differences that flavio points out are no longer relevant. i 
think. what do you think?
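
A sketch of what that would look like, with hypothetical names (not the actual quorum code): the only behavioral difference between the two roles is whether ACKs are sent, so flipping the weight converts the server on the fly.

{noformat}
class ProposalHandler {
    private volatile long weight;   // 0 = observer-like, > 0 = voting follower

    ProposalHandler(long weight) {
        this.weight = weight;
    }

    void onProposal(long zxid) {
        logToDisk(zxid);        // log either way, so recovery stays cheap
        if (weight > 0) {
            sendAck(zxid);      // only voters contribute to the quorum
        }
    }

    // graceful ensemble change-over: start voting from the next proposal on
    void promoteToVoter() {
        weight = 1;
    }

    private void logToDisk(long zxid) { /* append to the txn log */ }
    private void sendAck(long zxid)   { /* send ACK to the leader */ }
}
{noformat}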

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733789#action_12733789
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

i'm very sensitive to the work already done issue! i've totally been there.

the con argument for the increased chatter is actually quite minimal, since the 
COMMIT message is just a few bytes that gets merged into an existing TCP 
stream. the restriction that only weight-0 followers may subscribe to a portion 
of the tree is a bit hacky, but it eliminates the need for a bunch of new code.

to be honest, there are two things that really concern me:

1) the amount of new code we have to add if we don't use weight-0 followers, and 
the new test cases that we have to write. since observers use a different 
code path we have to add a lot more tests.
2) one use of observers is to do a graceful change-over for ensemble changes. 
changing a weight-0 follower into a voting participant just means that the 
follower starts sending ACKs beginning with the proposal at which it starts 
voting. we can do that very fast on the fly with no interruption 
to the follower. if we try to convert an observer, the new follower must switch 
from observer to follower and sync up to the leader before it can commit the 
new ensemble message. this increases the interruption of the change and the 
likelihood of failure.

btw, we could set up a phone conference if it would help. (everyone would be 
invited of course. we have global access numbers.)

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733790#action_12733790
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey, i'm looking at the patch. can you comment on the VIEWCHANGE message? does 
that refer to ensemble membership change or to the subscribe-to-a-subtree idea 
that was mentioned?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-07-21 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Status: Patch Available  (was: Open)

 handle small path lengths in zoo_create()
 -

 Key: ZOOKEEPER-311
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.0, 3.1.1, 3.1.0, 3.0.1, 3.0.0
Reporter: Chris Darroch
Assignee: Chris Darroch
Priority: Minor
 Fix For: 3.2.1

 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch


 The synchronous completion for zoo_create() contains the following code:
 {noformat}
 if (sc->u.str.str_len > strlen(res.path)) {
     len = strlen(res.path);
 } else {
     len = sc->u.str.str_len - 1;
 }
 if (len > 0) {
     memcpy(sc->u.str.str, res.path, len);
     sc->u.str.str[len] = '\0';
 }
 {noformat}
 In the case where the max_realpath_len argument to zoo_create() is 0, none of 
 this code executes, which is OK.  In the case where max_realpath_len is 1, a 
 user might expect their buffer to be filled with a null terminator, but 
 again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
 since new nodes will have paths longer than "/").
 The name of the argument to zoo_create() is also a little misleading, as is 
 its description ("the maximum length of real path you would want") in 
 zookeeper.h, and the example usage in the Programmer's Guide:
 {noformat}
 int rc = zoo_create(zh, "/xyz", "value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
 buffer, sizeof(buffer)-1);
 {noformat}
 In fact this value should be the actual length of the buffer, including space 
 for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
 buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
 returned value to 9 bytes and put the null terminator in the second-last 
 byte, leaving the final byte of the buffer unused.
 It would be better, I think, to rename the realpath and max_realpath_len 
 arguments to something like path_buffer and path_buffer_len, akin to 
 zoo_set().  The path_buffer_len would be treated as the full length of the 
 buffer (as the code does now, in fact, but the docs suggest otherwise).
 The code in the synchronous completion could then be changed as per the 
 attached patch.
 Since this would change, slightly, the behaviour or contract of the API, I 
 would be inclined to suggest waiting until 4.0.0 to implement this change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-07-21 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Status: Open  (was: Patch Available)

 handle small path lengths in zoo_create()
 -

 Key: ZOOKEEPER-311
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.0, 3.1.1, 3.1.0, 3.0.1, 3.0.0
Reporter: Chris Darroch
Assignee: Chris Darroch
Priority: Minor
 Fix For: 3.2.1

 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch


 The synchronous completion for zoo_create() contains the following code:
 {noformat}
 if (sc->u.str.str_len > strlen(res.path)) {
     len = strlen(res.path);
 } else {
     len = sc->u.str.str_len - 1;
 }
 if (len > 0) {
     memcpy(sc->u.str.str, res.path, len);
     sc->u.str.str[len] = '\0';
 }
 {noformat}
 In the case where the max_realpath_len argument to zoo_create() is 0, none of 
 this code executes, which is OK.  In the case where max_realpath_len is 1, a 
 user might expect their buffer to be filled with a null terminator, but 
 again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
 since new nodes will have paths longer than "/").
 The name of the argument to zoo_create() is also a little misleading, as is 
 its description ("the maximum length of real path you would want") in 
 zookeeper.h, and the example usage in the Programmer's Guide:
 {noformat}
 int rc = zoo_create(zh, "/xyz", "value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
 buffer, sizeof(buffer)-1);
 {noformat}
 In fact this value should be the actual length of the buffer, including space 
 for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
 buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
 returned value to 9 bytes and put the null terminator in the second-last 
 byte, leaving the final byte of the buffer unused.
 It would be better, I think, to rename the realpath and max_realpath_len 
 arguments to something like path_buffer and path_buffer_len, akin to 
 zoo_set().  The path_buffer_len would be treated as the full length of the 
 buffer (as the code does now, in fact, but the docs suggest otherwise).
 The code in the synchronous completion could then be changed as per the 
 attached patch.
 Since this would change, slightly, the behaviour or contract of the API, I 
 would be inclined to suggest waiting until 4.0.0 to implement this change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-07-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-484:


Attachment: sessionTest.patch

this patch recreates the problem.

 Clients get SESSION MOVED exception when switching from follower to a leader.
 -

 Key: ZOOKEEPER-484
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: sessionTest.patch


 When a client is connected to a follower, gets disconnected, and connects to the 
 leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
 feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
 NOT have this problem. The fix is to make sure the ownership of a connection 
 gets changed when a session moves from a follower to the leader. The workaround 
 in 3.2.0 would be to switch off connections from clients to the leader; 
 take a look at the *leaderServes* java property in 
 http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
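
For reference, the workaround amounts to setting that property to no on each server. A hypothetical invocation (the classpath and config path here are illustrative; the property is the leaderServes option from the admin guide):

{noformat}
java -Dzookeeper.leaderServes=no -cp zookeeper.jar:conf \
    org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
{noformat}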

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-07-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

i was able to reproduce the problem, and the fix was a missing catch for a 
socket exception.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
 Fix For: 3.2.1

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down, and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d516011c NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected 

[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-466:


Status: Patch Available  (was: Open)

 crash on zookeeper_close() when using auth with empty cert
 --

 Key: ZOOKEEPER-466
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.0
Reporter: Chris Darroch
Assignee: Chris Darroch
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-466.patch


 The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every 
 element in the auth list; that function frees any memory pointed to by 
 auth->auth.buff if that field is non-NULL.
 In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set 
 to 0, but then not assigned to authinfo->auth when auth.buff is NULL.  The 
 result is uninitialized data in auth->auth.buff in free_auth_info(), and 
 potential crashes.
 The attached patch adds a test which attempts to duplicate this error; it 
 works for me but may not always on all systems as it depends on the 
 uninitialized data being non-zero; there's not really a simple way I can see 
 to trigger this in the current test framework.  The patch also fixes the 
 problem, I believe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-466:


Status: Open  (was: Patch Available)

 crash on zookeeper_close() when using auth with empty cert
 --

 Key: ZOOKEEPER-466
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.0
Reporter: Chris Darroch
Assignee: Chris Darroch
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-466.patch


 The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every 
 element in the auth list; that function frees any memory pointed to by 
 auth->auth.buff if that field is non-NULL.
 In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set 
 to 0, but then not assigned to authinfo->auth when auth.buff is NULL.  The 
 result is uninitialized data in auth->auth.buff in free_auth_info(), and 
 potential crashes.
 The attached patch adds a test which attempts to duplicate this error; it 
 works for me but may not always on all systems as it depends on the 
 uninitialized data being non-zero; there's not really a simple way I can see 
 to trigger this in the current test framework.  The patch also fixes the 
 problem, I believe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down, and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d516011c NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 

[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739898#action_12739898
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

I've addressed 1) in the attached patch.

For 2), we are not eating the IOException; we are actually shutting things 
down. The real bug is that we are passing it up to the upper layer, which does 
not know anything about the follower thread. We need to handle it here.
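
A minimal sketch of that idea, with hypothetical names (this is not the 
committed patch): the follower loop catches the read failure itself, where it 
is known to mean the leader went away, and shuts down there instead of 
rethrowing to a caller that cannot interpret it.

{noformat}
import java.io.EOFException;
import java.io.IOException;

// Sketch only -- names and the fake packet source are hypothetical.
public class FollowerSketch {
    private int packetsLeft = 3;   // fake packet source: EOF after 3 packets

    String readPacket() throws IOException {
        if (packetsLeft-- == 0) throw new EOFException();  // leader went away
        return "packet";
    }

    void followLeader() {
        try {
            while (true) {
                String qp = readPacket();
                System.out.println("processing " + qp);
            }
        } catch (IOException e) {
            // Handle it here, where we know what it means: shut the
            // follower down cleanly so the peer can drop back into leader
            // election, rather than letting the exception escape upward.
            System.out.println("lost the leader, shutting down follower");
        }
    }

    public static void main(String[] args) {
        new FollowerSketch().followLeader();
    }
}
{noformat}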

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

commit to 3.2 branch: Committed revision 801756.
commit to trunk: Committed revision 801747.

 handle small path lengths in zoo_create()
 -

 Key: ZOOKEEPER-311
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0
Reporter: Chris Darroch
Assignee: Chris Darroch
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch


 The synchronous completion for zoo_create() contains the following code:\\
 {noformat}
 if (sc->u.str.str_len > strlen(res.path)) {
     len = strlen(res.path);
 } else {
     len = sc->u.str.str_len - 1;
 }
 if (len > 0) {
     memcpy(sc->u.str.str, res.path, len);
     sc->u.str.str[len] = '\0';
 }
 {noformat}
 In the case where the max_realpath_len argument to zoo_create() is 0, none of 
 this code executes, which is OK.  In the case where max_realpath_len is 1, a 
 user might expect their buffer to be filled with a null terminator, but 
 again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
 since new nodes will have paths longer than "/").
 The name of the argument to zoo_create() is also a little misleading, as is 
 its description ("the maximum length of real path you would want") in 
 zookeeper.h, and the example usage in the Programmer's Guide:
 {noformat}
 int rc = zoo_create(zh, "/xyz", "value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
     buffer, sizeof(buffer)-1);
 {noformat}
 In fact this value should be the actual length of the buffer, including space 
 for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
 buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
 returned value to 9 bytes and put the null terminator in the second-last 
 byte, leaving the final byte of the buffer unused.
 It would be better, I think, to rename the realpath and max_realpath_len 
 arguments to something like path_buffer and path_buffer_len, akin to 
 zoo_set().  The path_buffer_len would be treated as the full length of the 
 buffer (as the code does now, in fact, but the docs suggest otherwise).
 The code in the synchronous completion could then be changed as per the 
 attached patch.
 Since this would change, slightly, the behaviour or contract of the API, I 
 would be inclined to suggest waiting until 4.0.0 to implement this change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-484:


Hadoop Flags: [Reviewed]

+1 looks good mahadev

 Clients get SESSION MOVED exception when switching from follower to a leader.
 -

 Key: ZOOKEEPER-484
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: sessionTest.patch, ZOOKEEPER-484.patch


 When a client is connected to a follower, gets disconnected, and then connects 
 to the leader, it gets a SESSION MOVED exception. This is because of a bug in 
 the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 
 3.2 DO NOT have this problem. The fix is to make sure the ownership of a 
 connection gets changed when a session moves from a follower to the leader. 
 The workaround in 3.2.0 is to switch off connections from clients to the 
 leader; take a look at the *leaderServes* java property in 
 http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-490:


Hadoop Flags: [Reviewed]

+1 looks good pat

 the java docs for session creation are misleading/incomplete
 

 Key: ZOOKEEPER-490
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-490.patch


 the javadoc for ZooKeeper constructor says:
  * The client object will pick an arbitrary server and try to connect to 
 it.
  * If failed, it will try the next one in the list, until a connection is
  * established, or all the servers have been tried.
 the "or all the servers have been tried" phrase is misleading; it should 
 indicate that we retry until success, connection closed, or session expired. 
 we also need to mention that connecting is asynchronous: the constructor 
 returns immediately and you need to look for the connection event in the 
 watcher
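
A minimal sketch of the pattern the javadoc should spell out, assuming a 
server at localhost:2181 and an arbitrary 30-second timeout: the constructor 
returns immediately, so the caller blocks on a latch until the SyncConnected 
event arrives in the watcher.

{noformat}
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: connect string and timeout are placeholders.
public class ConnectExample {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                // the constructor has already returned when this fires
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();   // block until the session is actually established
        System.out.println("connected, session id 0x"
                + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
{noformat}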

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-476) upgrade junit library from 4.4 to 4.6

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-476:


Hadoop Flags: [Reviewed]

+1 looks good

 upgrade junit library from 4.4 to 4.6
 -

 Key: ZOOKEEPER-476
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-476
 Project: Zookeeper
  Issue Type: Improvement
  Components: tests
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.3.0

 Attachments: junit-4.6.jar, junit-4.6.LICENSE.txt


 upgrade from junit 4.4 to 4.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-502) bookkeeper create call completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)
bookkeeper create call completion too many times


 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira


when calling the asynchronous version of create, the completion routine is 
called more than once.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-502) bookkeeper create call completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Attachment: ZOOKEEPER-502.patch

this patch adds a test case that reproduces the problem.

 bookkeeper create call completion too many times
 

 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira
 Attachments: ZOOKEEPER-502.patch


 when calling the asynchronous version of create, the completion routine is 
 called more than once.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Summary: bookkeeper create calls completion too many times  (was: 
bookkeeper create call completion too many times)

 bookkeeper create calls completion too many times
 -

 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira
 Attachments: ZOOKEEPER-502.patch


 when calling the asynchronous version of create, the completion routine is 
 called more than once.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Component/s: contrib-bookkeeper

 bookkeeper create calls completion too many times
 -

 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira
 Attachments: ZOOKEEPER-502.patch


 when calling the asynchronous version of create, the completion routine is 
 called more than once.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-06 Thread Benjamin Reed (JIRA)
race condition in asynchronous create
-

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed


there is a race condition between the zookeeper completion thread and the 
bookkeeper processing queue during create. if the zookeeper completion thread 
falls behind due to scheduling, the action counter of the create operation may 
go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-499:


Status: Open  (was: Patch Available)

this looks good pat, but when you first get the logger, why are you using the 
package name? if you are going to use the package name, shouldn't you get the 
package from the class itself?

in the second test, you get the logger using a package name to add an appender, 
but remove it using the class. couldn't that potentially cause a problem?
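
A hypothetical test snippet (log4j, not the actual patch) illustrating the 
concern: an appender added through the package-level logger is not detached by 
removing it through the class-level logger, because those are two different 
loggers. Deriving the package name from the class keeps the two in sync.

{noformat}
package org.example.zk;

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.SimpleLayout;

public class LoggerMismatch {
    public static void main(String[] args) {
        ConsoleAppender appender = new ConsoleAppender(new SimpleLayout());

        // derive the package name from the class rather than hardcoding it
        String pkg = LoggerMismatch.class.getPackage().getName();
        Logger pkgLogger = Logger.getLogger(pkg);
        Logger classLogger = Logger.getLogger(LoggerMismatch.class);

        pkgLogger.addAppender(appender);
        classLogger.removeAppender(appender);   // no effect: different logger

        // prints true: the appender is still attached to the package logger
        System.out.println(pkgLogger.getAllAppenders().hasMoreElements());
    }
}
{noformat}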

 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch


 there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
 (incorrectly defaults to 0)
 also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

fixed patch to apply cleanly.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Open  (was: Patch Available)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-10 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741605#action_12741605
 ] 

Benjamin Reed commented on ZOOKEEPER-498:
-

+1 looks good. when setting the stop flags, you should really do an interrupt 
to wake up the wait, but that will cause a message to be printed to stdout. 
i'll open another jira to fix that.
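
A sketch of that suggestion, not the actual peer code: pair the volatile stop 
flag with an interrupt so a thread blocked in wait() shuts down promptly 
instead of waiting out its timeout.

{noformat}
public class StoppableWorker extends Thread {
    private volatile boolean stop = false;

    public void run() {
        while (!stop) {
            try {
                synchronized (this) {
                    wait(60000);   // e.g. an election retry wait
                }
                System.out.println("periodic work");
            } catch (InterruptedException e) {
                // expected on shutdown; the loop re-checks the stop flag
            }
        }
    }

    public void shutdown() {
        stop = true;
        interrupt();   // wake up the wait so shutdown is prompt
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker w = new StoppableWorker();
        w.start();
        Thread.sleep(100);
        w.shutdown();   // returns promptly instead of after 60 seconds
        w.join();
    }
}
{noformat}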

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, 
 ZOOKEEPER-498.patch, ZOOKEEPER-498.patch


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-498:


Hadoop Flags: [Reviewed]

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, 
 ZOOKEEPER-498.patch, ZOOKEEPER-498.patch


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Open  (was: Patch Available)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

The test case exposed another bug: log truncation was not being done properly 
with the buffered input stream. I modified the test to make it fail reliably 
and then fixed the bug.
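
A generic sketch of the pitfall, not the actual fix; the file name and sizes 
are made up. With a BufferedInputStream the underlying file position runs 
ahead of the bytes actually consumed, so truncating at the raw file position 
would cut the log in the wrong place.

{noformat}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;

public class TruncatePitfall {
    public static void main(String[] args) throws Exception {
        // set up a 1 KB dummy transaction log
        FileOutputStream out = new FileOutputStream("txnlog.bin");
        out.write(new byte[1024]);
        out.close();

        FileInputStream fis = new FileInputStream("txnlog.bin");
        BufferedInputStream bis = new BufferedInputStream(fis);
        bis.read(new byte[128]);                    // consumed 128 bytes...
        long rawPos = fis.getChannel().position();  // ...but this is likely 1024
        System.out.println("consumed 128, raw file position " + rawPos);
        bis.close();

        // truncate at the consumed offset, not at the raw file position
        RandomAccessFile raf = new RandomAccessFile("txnlog.bin", "rw");
        raf.setLength(128);
        raf.close();
    }
}
{noformat}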

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Attachment: ZOOKEEPER-503.patch

this patch fixes a range of problems. it is a big simplification, with a net 
removal of 700 lines of code. the metadata for a ledger was collapsed into a 
single znode. here is a description of the changes:

Index calculation in QuorumEngine must be synchronized on the LedgerHandle to 
avoid changes to the ensemble while trying to submit an operation. Such changes 
happen upon crashes of bookies.

I initially thought this was not necessary, but now I think this 
synchronization block is needed.
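
A hypothetical sketch of that synchronization point; the names stand in for 
the QuorumEngine/LedgerHandle code and this is not the actual patch.

{noformat}
// Sketch only: index calculation and submission happen under the
// LedgerHandle lock, so the ensemble cannot be replaced (after a bookie
// crash) between the two steps.
public class QuorumEngineSketch {
    static class LedgerHandle {
        String[] ensemble;   // swapped, under the same lock, on bookie crash
        LedgerHandle(String[] ensemble) { this.ensemble = ensemble; }
    }

    static void submit(LedgerHandle lh, long entryId) {
        synchronized (lh) {
            String bookie = lh.ensemble[(int) (entryId % lh.ensemble.length)];
            System.out.println("entry " + entryId + " -> " + bookie);
        }
    }

    public static void main(String[] args) {
        LedgerHandle lh = new LedgerHandle(new String[] { "b1", "b2", "b3" });
        submit(lh, 0);
        submit(lh, 1);
    }
}
{noformat}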

If a writer adds just a few entries to a ledger, it may end up with hints that 
say "empty ledger" when trying to recover the ledger. In this case, if we 
receive an empty-ledger flag as a hint, we have to switch the hint to zero, 
which means that the client will start recovery from entry zero. If no entry 
has been written, this still works, as the client won't be able to read 
anything.

I have changed LedgerRecoveryTest to test for: many entries written, one entry 
written, and no entry written.
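
A minimal sketch of that hint handling; the flag value is an assumption, not 
the actual BookKeeper constant.

{noformat}
// Sketch only: an "empty ledger" hint becomes entry zero.
public class RecoveryHint {
    static final long EMPTY_LEDGER = -1L;   // assumed flag value

    static long startRecoveryFrom(long hint) {
        // if nothing was ever written, recovery from zero reads nothing,
        // which is still correct
        return hint == EMPTY_LEDGER ? 0L : hint;
    }

    public static void main(String[] args) {
        System.out.println(startRecoveryFrom(EMPTY_LEDGER));  // 0
        System.out.println(startRecoveryFrom(41L));           // 41
    }
}
{noformat}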

I have been able to identify the problem that was causing BookieFailureTest to 
hang on Utkarsh's computer. Basically, when the queue of a BookieHandle is full 
and the corresponding bookie has crashed, we are not able to add a read 
operation to the incoming queue of the bookie handle, because the BookieHandle 
is not processing new requests anymore and is waiting to fail the handle. In 
this case, the BookieHandle throws an exception after timing out the call to 
add the read operation to the queue. We were propagating this exception to the 
application.

The main problem is that we have to add the operation to the queue of 
ClientCBWorker so that we guarantee that it knows about the operation once we 
receive responses from bookies. If we throw an exception without removing the 
operation from the ClientCBWorker queue, the worker will wait forever, which I 
believe is the case Utkarsh was observing.  
   

If I reasoned about the code correctly, then my modifications fix this problem 
by retrying a few times and erroring out after a number of retries. Erroring 
out in this case means notifying the CBWorker so that we can release the 
operation. 
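
A minimal sketch of the retry-then-error-out idea, assuming a standard blocking queue and a hypothetical callback interface:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    class EnqueueSketch {
        interface FailureCallback { void operationFailed(Object op); }

        // try a bounded number of timed offers; on exhaustion, notify
        // the callback worker so the operation is released instead of
        // leaving the worker waiting forever.
        static boolean enqueue(BlockingQueue<Object> incoming, Object op,
                               int retries, FailureCallback cb)
                throws InterruptedException {
            for (int i = 0; i < retries; i++) {
                if (incoming.offer(op, 1, TimeUnit.SECONDS)) {
                    return true;  // accepted by the bookie handle
                }
            }
            cb.operationFailed(op);  // "erroring out": release the operation
            return false;
        }
    }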

Fixing log level in LedgerConfig. -F

I have mainly worked on the ledger recovery machinery. I made it asynchronous 
by transforming LedgerRecovery into a thread and moving some calls. We have to 
revisit this way of making it asynchronous as it might not be acceptable for 
this patch.
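
A rough sketch of the thread-based approach (names are illustrative only, and as noted above the design itself is provisional):

    class LedgerRecoverySketch implements Runnable {
        interface RecoveryCallback { void recoveryComplete(int rc); }

        private final RecoveryCallback cb;

        LedgerRecoverySketch(RecoveryCallback cb) { this.cb = cb; }

        public void run() {
            int rc = 0;
            // ... read hints from bookies, recover entries, close ledger ...
            cb.recoveryComplete(rc);  // report the result asynchronously
        }

        static void recoverAsync(RecoveryCallback cb) {
            new Thread(new LedgerRecoverySketch(cb), "ledger-recovery").start();
        }
    }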

I still have to check why BookieFailureTest is failing for Utkarsh. It passes 
every time for me, so we have to find a way to reproduce it reliably on my 
machine so that I can debug it.


Took a pass over asynchronous ledger operations: create, open, close. Some 
parts are still blocking; will work on those next.

 race condition in asynchronous create
 -

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
 Attachments: ZOOKEEPER-503.patch


 there is a race condition between the zookeeper completion thread and the 
 bookkeeper processing queue during create. if the zookeeper completion thread 
 falls behind due to scheduling, the action counter of the create operation 
 may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Status: Patch Available  (was: Open)

 race condition in asynchronous create
 -

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
 Attachments: ZOOKEEPER-503.patch


 there is a race condition between the zookeeper completion thread and the 
 bookkeeper processing queue during create. if the zookeeper completion thread 
 falls behind due to scheduling, the action counter of the create operation 
 may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743541#action_12743541
 ] 

Benjamin Reed commented on ZOOKEEPER-503:
-

i should have also mentioned that this patch was done by flavio and utkarsh. i 
will be reviewing it.

 race condition in asynchronous create
 -

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
 Attachments: ZOOKEEPER-503.patch


 there is a race condition between the zookeeper completion thread and the 
 bookkeeper processing queue during create. if the zookeeper completion thread 
 falls behind due to scheduling, the action counter of the create operation 
 may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743547#action_12743547
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

just to be clear. this bug isn't completely fixed and the test case should 
still be failing. i just want to make sure it fails reliably on the hudson 
machine.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch


 here are the part of the log whereby my zookeeper instance crashed, taking 3 
 out of 5 down, and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 

[jira] Updated: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.

2009-08-19 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-508:


Attachment: ZOOKEEPER-508.patch

added a testcase for the DIFF problem. still not fixed.

 proposals and commits for DIFF and Truncate messages from the leader to 
 followers is buggy.
 ---

 Key: ZOOKEEPER-508
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch


 The proposals and commits sent by the leader after it asks the followers to 
 truncate their logs or starts sending a diff have missing messages, which 
 causes out-of-order commit messages and causes the followers to shut down 
 because of these out-of-order commits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.

2009-08-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747155#action_12747155
 ] 

Benjamin Reed commented on ZOOKEEPER-508:
-

+1 looks good. simple fix! :)

 proposals and commits for DIFF and Truncate messages from the leader to 
 followers is buggy.
 ---

 Key: ZOOKEEPER-508
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, 
 ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, 
 ZOOKEEPER-508.patch-3.2


 The proposals and commits sent by the leader after it asks the followers to 
 truncate their logs or starts sending a diff have missing messages, which 
 causes out-of-order commit messages and causes the followers to shut down 
 because of these out-of-order commits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-515) Zookeeper quorum didn't provide service when restart after an Out of memory crash

2009-08-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747532#action_12747532
 ] 

Benjamin Reed commented on ZOOKEEPER-515:
-

first, it is important to note that our limit of 1M for data is a sanity check. 
it is unwise to design your application to run on the edge of sanity. generally 
we talk about data in the kilobyte range (100 bytes to 64k). zookeeper stores 
meta-data, not application data.

do you know how big the resulting data is? what is the size of a snapshot file?

1) perhaps you are hitting the memory error again when you try to rebuild your 
in-memory data structure. you may try increasing the memory limit using the 
-Xmx flag.
2) there is a configuration option to specify the number of requests in flight, 
globalOutstandingLimit, which defaults to 1000. with 1000 1M requests you 
need 1G just for the in-flight requests, in addition to the memory needed for 
the tree, so if you want to handle such large requests you need to look at the 
amount of memory you have and possibly tune that parameter. also, if you have a 
large in-memory tree and you need to do a state transfer for followers that are 
behind, you will need some time to push a lot of data over the network, so you 
probably also need to adjust syncLimit and initLimit (see the config sketch 
after this list).
3) if you want to reinitialize everything, you need to remove the version-2 
directory from all servers; otherwise, a server that still has the version-2 
directory will get elected and the other servers will sync with it.
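
for illustration, the knobs above in zoo.cfg / jvm-flag form (example values only, not recommendations):

    # zoo.cfg (example values only)
    globalOutstandingLimit=100   # fewer 1M requests in flight at once
    initLimit=20                 # more ticks for large state transfers
    syncLimit=10

    # server heap, e.g. in the startup script:
    # java -Xmx4g ... org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg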

 Zookeeper quorum didn't provide service when restart after an Out of memory 
 crash
 ---

 Key: ZOOKEEPER-515
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
 Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 
 2008 x86_64 x86_64 x86_64 GNU/Linux
 Jdk: 1.6.0_14 
Reporter: Qian Ye

 The Zookeeper quorum, containing 5 servers, didn't provide service when 
 restart after an Out of memory crash. 
 It happened as following:
 1. we built  a Zookeeper quorum which contained  5 servers, say 1, 3, 4, 5, 6 
 (have no 2), and 6 was the leader.
 2. we created 18 threads on 6 different servers to set and get data from a 
 znode in the Zookeeper at the same time.  The size of the data is 1MB. The 
 test threads did their job as fast as possible, with no pause between 
 operations, and they repeated the setting and getting 4000 times. 
 3. the Zookeeper leader crashed about 10 mins  after the test threads 
 started. The leader printed out the log:
 2009-08-25 12:00:12,301 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
 - Exception causing close of session 0x523
 4223c2dc00b5 due to java.io.IOException: Read error
 2009-08-25 12:00:12,318 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
 - Exception causing close of session 0x523
 4223c2dc00b6 due to java.io.IOException: Read error
 2009-08-25 12:03:44,086 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
 - Exception causing close of session 0x523
 4223c2dc00b8 due to java.io.IOException: Read error
 2009-08-25 12:04:53,757 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
 - Exception causing close of session 0x523
 4223c2dc00b7 due to java.io.IOException: Read error
 2009-08-25 12:15:45,151 - FATAL [SyncThread:0:syncrequestproces...@131] - 
 Severe unrecoverable error, exiting
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2786)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
 at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
 at 
 org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
 at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42)
 at 
 org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100)
 It is clear that the leader ran out of memory. Then server 4 went down 
 almost at the same time, and printed out the log:
 2009-08-25 12:15:45,995 - ERROR 
 [FollowerRequestProcessor:3:followerrequestproces...@91] - Unexpected 
 exception causing
 exit
 java.net.SocketException: Connection reset
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
 at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
 at 

[jira] Commented: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748018#action_12748018
 ] 

Benjamin Reed commented on ZOOKEEPER-512:
-

agreed. i think the problem is that under high load we don't have a period of 
error-free operation. i think it is ok to generate errors randomly as we are 
doing, but we should have periods of error-free operation so that things can 
settle down (sketched below).
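
a minimal sketch of that idea: gate the existing random fault injection with a time window so each cycle ends with a fault-free period (names and window lengths are illustrative, not the actual aspectj harness):

    import java.io.IOException;
    import java.util.Random;

    class FaultWindowSketch {
        private static final Random RAND = new Random();
        private static final long CYCLE_MS = 60000;  // full cycle length
        private static final long QUIET_MS = 20000;  // fault-free tail of each cycle

        // called from the instrumented read/write path
        static void maybeFail() throws IOException {
            long pos = System.currentTimeMillis() % CYCLE_MS;
            boolean quiet = pos >= (CYCLE_MS - QUIET_MS);
            if (!quiet && RAND.nextFloat() <= 0.005f) {  // ~1/200, as in the test
                throw new IOException("FORCED FAIL");
            }
        }
    }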

 FLE election fails to elect leader
 --

 Key: ZOOKEEPER-512
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, 
 t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, 
 ZOOKEEPER-512.patch


 I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch 
 applied and noticed that after some time the ensemble failed to re-elect a 
 leader.
 See the attached log files - 5 member ensemble. typically 5 is the leader
 Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes 
 elapses w/no quorum
 environment:
 I was doing fault injection testing using aspectj. The faults are injected 
 into socketchannel read/write, I throw exceptions randomly at a 1/200 ratio 
 (rand.nextFloat() <= .005 => throw IOException)
 You can see when a fault is injected in the log via:
 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@38] 
 - READPACKET FORCED FAIL
 vs a read/write that didn't force fail:
 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@41] 
 - READPACKET OK
 otw standard code/config (straight fle quorum with 5 members)
 also see the attached jstack trace. this is for one of the servers. Notice in 
 particular that the number of sendworkers != the number of recv workers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-520) add static/readonly client resident serverless zookeeper

2009-09-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-520:


Summary: add static/readonly client resident serverless zookeeper  (was: 
add static/readonly client session type)

 add static/readonly client resident serverless zookeeper
 

 Key: ZOOKEEPER-520
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-520
 Project: Zookeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Patrick Hunt
 Fix For: 3.3.0


 Occasionally people (typically ops) have asked for the ability to start a ZK 
 client with a hardcoded, local, non-cluster-based session. Meaning that you 
 can bring up a particular client with a hardcoded/readonly view of the ZK 
 namespace even if the zk cluster is not available. This seems useful for a 
 few reasons:
 1) unforeseen problems - a client might be brought up and partial application 
 service restored even in the face of catastrophic cluster failure
 2) testing - a client could be brought up with a hardcoded configuration for 
 testing purposes. we might even be able to extend this idea over time to 
 allow simulated changes, i.e. simulate other clients making changes in the 
 namespace, perhaps simulate changes in the state of the cluster (testing 
 state change is often hard for users of the client interface)
 Seems like this shouldn't be too hard for us to add. The session could be 
 established with a URI for a local/remote file rather than a URI of the 
 cluster servers. The client would essentially read this file which would be a 
 simple representation of the znode namespace.
 /foo/bar abc
 /foo/bar2 def
 etc...
 In the pure client readonly case this is simple. We might also want to allow 
 writes to the namespace (essentially back this with an in memory hash) for 
 things like group membership (so that the client continues to function).
 Obviously this wouldn't work in some cases, but it might work in many and 
 would allow further options for users wrt building a reliable/recoverable 
 service on top of ZK.
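
 A minimal sketch of loading such a file into an in-memory map (illustrative only, not a proposed API):

     import java.io.BufferedReader;
     import java.io.FileReader;
     import java.io.IOException;
     import java.util.HashMap;
     import java.util.Map;

     // sketch: one znode per line, "path data"; data may contain spaces
     class StaticNamespaceSketch {
         static Map<String, String> load(String file) throws IOException {
             Map<String, String> znodes = new HashMap<String, String>();
             BufferedReader in = new BufferedReader(new FileReader(file));
             try {
                 String line;
                 while ((line = in.readLine()) != null) {
                     if (line.length() == 0) continue;
                     int sp = line.indexOf(' ');
                     if (sp < 0) znodes.put(line, "");          // path, no data
                     else znodes.put(line.substring(0, sp),
                                     line.substring(sp + 1));
                 }
             } finally {
                 in.close();
             }
             return znodes;
         }
     }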

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-542) c-client can spin when server unresponsive

2009-10-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-542:


Fix Version/s: 3.3.0
   Status: Patch Available  (was: Open)

 c-client can spin when server unresponsive
 --

 Key: ZOOKEEPER-542
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.0
Reporter: Christian Wiedmann
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-542.patch, ZOOKEEPER-542.patch


 Due to a mismatch between zookeeper_interest() and zookeeper_process(), when 
 the zookeeper server is unresponsive the client can spin when reconnecting to 
 the server.
 In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is 
 data to be sent, but flush_send_queue() only writes the data if the state is 
 ZOO_CONNECTED_STATE.  When in ZOO_ASSOCIATING_STATE, this results in spinning.
 This probably doesn't affect production, but I had a runaway process in a 
 development deployment that caused performance issues on the node.  This is 
 easy to reproduce in a single node environment by doing a kill -STOP on the 
 server and waiting for the session timeout.
 Patch to be added.
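
 The shape of the fix can be sketched in selector terms; the real code is in the C client, so this Java-ish sketch only illustrates the guard condition, not the actual patch:

     import java.nio.channels.SelectionKey;

     // sketch: compute write interest under the same condition the flush
     // path uses to actually write, so interest and action cannot
     // disagree and spin the event loop while reconnecting.
     class InterestSketch {
         enum State { CONNECTING, ASSOCIATING, CONNECTED }

         static int interestOps(State state, boolean haveDataToSend) {
             int ops = SelectionKey.OP_READ;
             if (haveDataToSend && state == State.CONNECTED) {
                 ops |= SelectionKey.OP_WRITE;
             }
             return ops;
         }
     }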

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


