[jira] Updated: (ZOOKEEPER-434) the java shell should indicate connection status on command prompt
[ https://issues.apache.org/jira/browse/ZOOKEEPER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-434: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 782879. the java shell should indicate connection status on command prompt -- Key: ZOOKEEPER-434 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-434 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.1.1 Reporter: Patrick Hunt Assignee: Henry Robinson Priority: Minor Fix For: 3.2.0 Attachments: ZOOKEEPER-434.patch it would be very useful if the java shell showed the current connection status as part of the command prompt. this shows up in particular in the following use case: I attempted to connect a java shell to a remote cluster that was unavailable; when I ran the first command, ls /, on the cluster, the shell hung. It would be nice if the shell indicated connection status in the prompt to make it clearer that the shell is currently not connected. (it was hard to see the attempting-to-connect console message as it was lost among the other messages...) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
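The request above can be sketched in a few lines. This is a hypothetical illustration, not ZooKeeper's actual shell code: the prompt string is derived from the session state that the client's watcher would report (SyncConnected, Disconnected, etc.), so a shell that is not connected is obvious at a glance.

```java
// Hypothetical sketch of a connection-aware prompt; all names here are
// illustrative, not part of ZooKeeper's real API. In a real shell,
// onStateChange would be driven by the client Watcher's session events.
public class ConnectionPrompt {
    public enum State { CONNECTING, CONNECTED, DISCONNECTED }

    private volatile State state = State.CONNECTING;

    public void onStateChange(State s) {
        state = s; // called from the watcher thread
    }

    public String prompt(String host) {
        // fold the current session state into the prompt text
        return "[zk: " + host + "(" + state + ")] ";
    }

    public static void main(String[] args) {
        ConnectionPrompt p = new ConnectionPrompt();
        System.out.println(p.prompt("remote:2181")); // shows CONNECTING while the handshake is pending
        p.onStateChange(State.CONNECTED);
        System.out.println(p.prompt("remote:2181")); // shows CONNECTED once the session is up
    }
}
```

With a prompt like this, the hung `ls /` in the report above would at least be preceded by a visibly non-CONNECTED prompt.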
[jira] Issue Comment Edited: (ZOOKEEPER-434) the java shell should indicate connection status on command prompt
[ https://issues.apache.org/jira/browse/ZOOKEEPER-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717559#action_12717559 ] Benjamin Reed edited comment on ZOOKEEPER-434 at 6/8/09 10:12 PM: -- Committed revision 782880. was (Author: breed): Committed revision 782879. the java shell should indicate connection status on command prompt -- Key: ZOOKEEPER-434 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-434 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.1.1 Reporter: Patrick Hunt Assignee: Henry Robinson Priority: Minor Fix For: 3.2.0 Attachments: ZOOKEEPER-434.patch it would be very useful if the java shell showed the current connection status as part of the command prompt. this shows up in particular in the following use case: I attempted to connect a java shell to a remote cluster that was unavailable; when I ran the first command, ls /, on the cluster, the shell hung. It would be nice if the shell indicated connection status in the prompt to make it clearer that the shell is currently not connected. (it was hard to see the attempting-to-connect console message as it was lost among the other messages...) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-435) allow super admin digest based auth to be configurable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-435: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 782882. allow super admin digest based auth to be configurable Key: ZOOKEEPER-435 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-435 Project: Zookeeper Issue Type: Bug Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-435.patch the server has a super digest based auth user that enables administrative access (ie has access to znodes regardless of acl settings) but the password is not configurable 1) make the default digest null, ie turn off super by default 2) if a command line option is specified when starting server then use the provided digest for super eg. java -Dzookeeper.DigestAuthenticationProvider.superDigest=xkxkxkxkx also this is not documented in the forrest docs - need to add that along with tests as part of the patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
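The `superDigest` value passed on the command line above is a precomputed credential. A minimal sketch of how such a digest could be produced, assuming the `user:base64(sha1(user:password))` form that ZooKeeper's digest scheme uses; treat the exact encoding as an assumption and generate real values with `DigestAuthenticationProvider.generateDigest`:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Sketch of digest generation for the super user. The encoding mirrors the
// digest auth scheme (user:base64(sha1(user:password))); verify against
// DigestAuthenticationProvider before relying on it.
public class SuperDigest {
    public static String generateDigest(String idPassword) throws Exception {
        String id = idPassword.split(":", 2)[0];
        byte[] sha1 = MessageDigest.getInstance("SHA-1")
                .digest(idPassword.getBytes(StandardCharsets.UTF_8));
        return id + ":" + Base64.getEncoder().encodeToString(sha1);
    }

    public static void main(String[] args) throws Exception {
        // The output would be supplied to the server as
        // -Dzookeeper.DigestAuthenticationProvider.superDigest=<output>
        System.out.println(generateDigest("super:secret"));
    }
}
```

Keeping the default digest null, as proposed in item 1, means an operator must explicitly opt in to super access by supplying such a value.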
[jira] Commented: (ZOOKEEPER-336) single bad client can cause server to stop accepting connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717567#action_12717567 ] Benjamin Reed commented on ZOOKEEPER-336: - i forgot to add the new files. new files added in revision 782883. single bad client can cause server to stop accepting connections Key: ZOOKEEPER-336 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-336 Project: Zookeeper Issue Type: Improvement Components: c client, java client, server Reporter: Patrick Hunt Assignee: Henry Robinson Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch One user saw a case where a single mis-programmed client was overloading the server with connections - the client was creating a huge number of sessions to the server. This caused all of the fds on the server to be used up. Seems like we should have some way of limiting (with a configurable override) the maximum number of sessions from a single client (say 10 by default?). Also we should output warnings when this limit is exceeded (or an attempt is made to exceed it). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
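The limit described in the issue amounts to a per-client connection counter consulted on accept. A minimal self-contained sketch, with illustrative names rather than the actual server code (which tracks this per client address):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed limit: count live connections per client address
// and reject new ones past a configurable maximum, logging a warning as
// the issue suggests. Names are illustrative, not ZooKeeper's.
public class ConnectionLimiter {
    private final int maxClientCnxns;
    private final Map<String, Integer> counts = new HashMap<>();

    public ConnectionLimiter(int maxClientCnxns) {
        this.maxClientCnxns = maxClientCnxns;
    }

    // Called when a new connection arrives; false means the caller
    // should close the socket instead of servicing it.
    public synchronized boolean tryAccept(String clientAddr) {
        int current = counts.getOrDefault(clientAddr, 0);
        if (current >= maxClientCnxns) {
            System.err.println("WARN: too many connections from " + clientAddr);
            return false;
        }
        counts.put(clientAddr, current + 1);
        return true;
    }

    // Called when a connection closes, freeing a slot for that client.
    public synchronized void closed(String clientAddr) {
        counts.merge(clientAddr, -1, Integer::sum);
    }
}
```

A runaway client then exhausts only its own budget of connections rather than the server's whole fd table.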
[jira] Commented: (ZOOKEEPER-441) Zk-336 diff got applied twice to TestClientRetry.cc C test, causing compilation failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717734#action_12717734 ] Benjamin Reed commented on ZOOKEEPER-441: - thanx henry! i had a rough night last night. i missed this one. testing now. Zk-336 diff got applied twice to TestClientRetry.cc C test, causing compilation failure --- Key: ZOOKEEPER-441 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-441 Project: Zookeeper Issue Type: Bug Reporter: Henry Robinson Assignee: Henry Robinson Priority: Blocker Attachments: ZOOKEEPER-441.patch The latest version of trunk has a src/c/tests/TestClientRetry.cc file that has the actual file from ZK-336 appended to itself. This causes the compilation to fail due to lots of redeclaration errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-356: Status: Open (was: Patch Available) -1 wow you did a lot of work flavio. big patch, so i found a couple of problems. (some i might just be confused about.) it shouldn't be much to fix:
* in LedgerOutputStream, why are you interrupting the thread on BKExceptions?
* in the tests, why are you catching and just logging BKExceptions? shouldn't those make the tests fail?
* i think _down_ should be volatile in BookieServer
* why do you pass a BookieHandle to BookieClient?
* in BookKeeper you should probably catch NumberFormatException when you call Long.parseLong; it's one of those things that are really hard to debug if it happens
* could you add a comment to the top of BookKeeper to explain how the different znodes are used? it will really help the next person
* i think _stop_ and _incoming_ should be updated and read in the same synchronized block, right?
* in LedgerManager, @return says getItem returns a long rather than a String
* are next and errorCounter used in ClientCB?
very nice job on using the state machine to process the asynchronous calls! Masking bookie failure during writes to a ledger Key: ZOOKEEPER-356 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356 Project: Zookeeper Issue Type: New Feature Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.2.0 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-BOOKKEEPER-356.patch The idea of this jira is to work out the changes necessary to make a client mask the failure of a bookie while writing to a ledger. I'm submitting a preliminary patch, but before I submit a final one, I need to have 288 committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-442) need a way to remove watches that are no longer of interest
need a way to remove watches that are no longer of interest --- Key: ZOOKEEPER-442 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442 Project: Zookeeper Issue Type: Improvement Reporter: Benjamin Reed currently the only way a watch is cleared is to trigger it. we need a way to enumerate the outstanding watch objects, find the watch events the objects are watching for, and remove interest in an event. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-442) need a way to remove watches that are no longer of interest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718467#action_12718467 ] Benjamin Reed commented on ZOOKEEPER-442: - there are two problematic scenarios: 1) an application that has many transient interests can register a bunch of watches, which wastes memory to monitor the watches (granted, it is a very small amount of memory) and causes unnecessary processing when those watches are triggered 2) applications need to be prepared to ignore watch events that they are no longer interested in. need a way to remove watches that are no longer of interest --- Key: ZOOKEEPER-442 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442 Project: Zookeeper Issue Type: Improvement Reporter: Benjamin Reed currently the only way a watch is cleared is to trigger it. we need a way to enumerate the outstanding watch objects, find the watch events the objects are watching for, and remove interest in an event. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
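The capability requested above can be sketched as a small registry that supports the three operations named in the issue: enumerate outstanding watches, inspect which events they cover, and drop an interest without waiting for it to fire. All names here are hypothetical; no such API exists in ZooKeeper at the time of this issue:

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a removable-watch registry. The client would keep
// one of these per session; the server side would need a matching message
// to drop its copy of the watch.
public class WatchRegistry {
    public enum EventType { NODE_CREATED, NODE_DELETED, NODE_DATA_CHANGED }

    private final Map<String, EnumSet<EventType>> watches = new HashMap<>();

    public void register(String path, EventType type) {
        watches.computeIfAbsent(path, p -> EnumSet.noneOf(EventType.class)).add(type);
    }

    // Enumerate the events still being watched for at a path.
    public Set<EventType> interests(String path) {
        return watches.getOrDefault(path, EnumSet.noneOf(EventType.class));
    }

    // Remove a single interest; drop the path entry entirely when empty,
    // freeing the (small) memory the comment above mentions.
    public void remove(String path, EventType type) {
        EnumSet<EventType> set = watches.get(path);
        if (set != null) {
            set.remove(type);
            if (set.isEmpty()) {
                watches.remove(path);
            }
        }
    }
}
```

With explicit removal, scenario 2 above also disappears: an application never receives events for interests it has already dropped.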
[jira] Issue Comment Edited: (ZOOKEEPER-443) trace logging in watch notification not wrapped with istraceneabled - inefficient
[ https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719004#action_12719004 ] Benjamin Reed edited comment on ZOOKEEPER-443 at 6/12/09 3:22 PM: -- this looks good, but there is something weird. ZooTrace also has an isTraceEnabled method. should we be using that? also {noformat} -ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +if (LOG.isTraceEnabled()) { +ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +} {noformat} doesn't really need the if, does it? nothing is saved, and the first thing ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled. was (Author: breed): this looks good, but there is something weird. ZooTrace also has an isTraceEnabled method. should we be using that? also {quote} -ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +if (LOG.isTraceEnabled()) { +ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +} {quote} doesn't really need the if, does it? nothing is saved, and the first thing ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled. trace logging in watch notification not wrapped with istraceneabled - inefficient - Key: ZOOKEEPER-443 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-443.patch In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a trace message that's not wrapped with isTraceEnabled; this is very inefficient and should be fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-443) trace logging in watch notification not wrapped with istraceneabled - inefficient
[ https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719004#action_12719004 ] Benjamin Reed commented on ZOOKEEPER-443: - this looks good, but there is something weird. ZooTrace also has an isTraceEnabled method. should we be using that? also {quote} -ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +if (LOG.isTraceEnabled()) { +ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +} {quote} doesn't really need the if, does it? nothing is saved, and the first thing ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled. trace logging in watch notification not wrapped with istraceneabled - inefficient - Key: ZOOKEEPER-443 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-443.patch In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a trace message that's not wrapped with isTraceEnabled; this is very inefficient and should be fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (ZOOKEEPER-443) trace logging in watch notification not wrapped with istraceneabled - inefficient
[ https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719004#action_12719004 ] Benjamin Reed edited comment on ZOOKEEPER-443 at 6/12/09 3:23 PM: -- this looks good, but there is something weird. ZooTrace also has an isTraceEnabled method. should we be using that? also {noformat} -ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +if (LOG.isTraceEnabled()) { +ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +} {noformat} doesn't really need the if, does it? no processing is saved, since no new strings are being built, and the first thing ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled. was (Author: breed): this looks good, but there is something weird. ZooTrace also has an isTraceEnabled method. should we be using that? also {noformat} -ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +if (LOG.isTraceEnabled()) { +ZooTrace.logRequest(LOG, traceMask, 'P', request, ""); +} {noformat} doesn't really need the if, does it? nothing is saved, and the first thing ZooTrace.logRequest is going to do is call ZooTrace.isTraceEnabled. trace logging in watch notification not wrapped with istraceneabled - inefficient - Key: ZOOKEEPER-443 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-443.patch In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a trace message that's not wrapped with isTraceEnabled; this is very inefficient and should be fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-443) trace logging in watch notification not wrapped with istraceneabled - inefficient
[ https://issues.apache.org/jira/browse/ZOOKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-443: Hadoop Flags: [Reviewed] +1 agreed trace logging in watch notification not wrapped with istraceneabled - inefficient - Key: ZOOKEEPER-443 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-443 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 Attachments: ZOOKEEPER-443.patch In org.apache.zookeeper.server.NIOServerCnxn.process(WatchedEvent) there's a trace message that's not wrapped with isTraceEnabled; this is very inefficient and should be fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
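The point debated in the ZOOKEEPER-443 comments above can be made concrete. A small self-contained sketch (the names stand in for ZooTrace and its logger; this is not the actual ZooKeeper code): the isTraceEnabled() guard only pays off when building the log arguments is itself expensive, because the callee checks the trace flag first thing anyway.

```java
// Sketch of the trace-guard pattern. The guard saves work only when the
// argument expression is costly to evaluate; when the arguments are
// already-constructed values, the callee's own internal check suffices.
public class TraceGuard {
    static boolean traceEnabled = false;
    static int expensiveBuilds = 0;

    // Stands in for an expensive argument expression (string building, etc.).
    static String expensiveDetail() {
        expensiveBuilds++;
        return "request details";
    }

    // Stands in for ZooTrace.logRequest: it checks the flag internally.
    static void logRequest(String msg) {
        if (!traceEnabled) {
            return;
        }
        System.out.println(msg);
    }

    public static void main(String[] args) {
        // Unguarded call: the argument is built even though tracing is off.
        logRequest(expensiveDetail());
        // Guarded call: the expensive argument is never constructed.
        if (traceEnabled) {
            logRequest(expensiveDetail());
        }
        System.out.println("expensive builds: " + expensiveBuilds); // prints 1, not 2
    }
}
```

This is why the comment argues the added `if` around `ZooTrace.logRequest(LOG, traceMask, 'P', request, ...)` buys nothing: no new strings are built at the call site, so the internal check already does all the work the guard would.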
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719823#action_12719823 ] Benjamin Reed commented on ZOOKEEPER-107: - Raghu, i think henry is correct that you must get an ack from quorums in both the old and new views before committing the change. otherwise you get split brain, which could result in multiple leaders. henry, i think we are thinking along the same lines, but i'm a bit skeptical of JOIN and LEAVE. in some sense they are a bit of an optimization that can be implemented with GETVIEW and NEWVIEW. it would be nice to make the mechanism as simple as possible. it also seems like you would require a GETVIEW to be done before doing a NEWVIEW, just for sanity. (require an expected version on NEWVIEW and not allow a -1.) i was thinking that we would just push NEWVIEW through Zab, making sure we get acks from quorums in both the old and new views. to help mitigate the case where the system freezes up because the NEWVIEW proposal goes out and there isn't a quorum in the new view, the leader should probably make sure that it currently has a quorum of followers in the new view before proposing the request. if it doesn't, it should error out the request. even with this we can still freeze up if we lose quorum in the new view after issuing the proposal; that would happen anyway (as you point out), but the check would prevent us from doing something that has no chance of working. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined; adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720323#action_12720323 ] Benjamin Reed commented on ZOOKEEPER-107: - just a caveat to my last comment. for point 1) we actually do need to touch the protocol code a bit to ensure that the setData that changes the view commits in both the old and new views. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720380#action_12720380 ] Benjamin Reed commented on ZOOKEEPER-107: - so if you do one at a time without using Zab, without working through the details:
1) start with A, B, C, D
2) A is the leader and proposes LEAVE D and fails where only A and C get it.
3) B is the leader and proposes LEAVE C and fails where only B and D get it because of a complete power outage.
4) everything comes back up
5) A is elected leader by C
6) B is elected leader by D
if we use Zab, split brain will not occur because we do not use the configuration until it has been committed. since it has been accepted by both the old and new quorums, we will eventually converge on the new configuration. (that is my conjecture, still needs to be proven) Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined; adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
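The commit rule argued in this thread can be sketched in a few lines: a view change may commit only when it has been acknowledged by a quorum of the old view AND a quorum of the new view; committing on either alone opens the split-brain window described above. This is a toy model of the rule, not the Zab implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the double-quorum rule for view changes (NEWVIEW).
public class ViewChange {
    static Set<String> set(String... members) {
        return new HashSet<>(Arrays.asList(members));
    }

    // A quorum is a strict majority of the given view.
    static boolean isQuorum(Set<String> acks, Set<String> view) {
        long acked = acks.stream().filter(view::contains).count();
        return acked > view.size() / 2;
    }

    // The change commits only with quorums in BOTH the old and new views.
    static boolean canCommit(Set<String> acks, Set<String> oldView, Set<String> newView) {
        return isQuorum(acks, oldView) && isQuorum(acks, newView);
    }

    public static void main(String[] args) {
        Set<String> oldView = set("A", "B", "C", "D");
        Set<String> newView = set("A", "B", "C"); // LEAVE D
        // A and C alone form a quorum of the new view but not of the old one,
        // so the change must not commit:
        System.out.println(canCommit(set("A", "C"), oldView, newView));      // false
        System.out.println(canCommit(set("A", "B", "C"), oldView, newView)); // true
    }
}
```

In the scenario above, neither LEAVE D (acked only by A and C) nor LEAVE C (acked only by B and D) reaches a quorum of the old four-member view, so under this rule neither configuration is ever used and the two-leader outcome cannot arise.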
[jira] Commented: (ZOOKEEPER-444) perms definition for PERMS_ALL differ in C and java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720774#action_12720774 ] Benjamin Reed commented on ZOOKEEPER-444: - +1 brilliant! perms definition for PERMS_ALL differ in C and java --- Key: ZOOKEEPER-444 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-444 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-444.patch the perms_all definition in Java is PERMS.ALL and does not include ADMIN perms but in c the PERMS_ALL def includes the ADMIN perms. We should make it consistent to include or not include the admin perms in both c and java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
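The inconsistency in ZOOKEEPER-444 is easy to see in terms of the permission bit flags. The bit values below match the Java client's `ZooDefs.Perms` to the best of my knowledge, but treat them as illustrative; the issue is that Java's ALL omitted the ADMIN bit while C's included it:

```java
// Sketch of the ACL permission bits and the consistent ALL definition.
// Values mirror ZooDefs.Perms in the Java client (treat as illustrative).
public class Perms {
    public static final int READ   = 1 << 0; // 1
    public static final int WRITE  = 1 << 1; // 2
    public static final int CREATE = 1 << 2; // 4
    public static final int DELETE = 1 << 3; // 8
    public static final int ADMIN  = 1 << 4; // 16

    // The definition both clients should share once made consistent:
    public static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    public static void main(String[] args) {
        int javaAllBefore = READ | WRITE | CREATE | DELETE; // missing ADMIN
        System.out.println(javaAllBefore + " vs " + ALL);   // 15 vs 31
    }
}
```

With the two definitions differing in the ADMIN bit, the same "full access" ACL written by one client would not grant ACL-modification rights when interpreted through the other, which is why the fix makes the two agree.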
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720790#action_12720790 ] Benjamin Reed commented on ZOOKEEPER-107: - oh right. you are correct. i guess it is more of a liveness/correctness issue:
1) start with A, B, C, D
2) B is down and A is the leader and proposes LEAVE C and fails where only D gets it.
3) C and D cannot get quorum since C has an older view.
4) D fails
5) A and B come back up and B is elected leader.
6) B proposes LEAVE A and C gets it before B fails.
Now what happens? we cannot get quorum with just A and C since A has the old view. even if D comes up it will not elect C because it does not believe C is part of the ensemble. if they all come up, either C or D can be elected leader, but if C is elected you end up with conflicting views: A thinks (B, C, D), B thinks (B, C, D), C thinks (B, C, D), and D thinks (A, B, D), so both A and D will effectively be out of the ensemble and you can't tolerate any failures. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined; adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-408) address all findbugs warnings in persistence classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-408: +1 please commit. address all findbugs warnings in persistence classes Key: ZOOKEEPER-408 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-408 Project: Zookeeper Issue Type: Sub-task Reporter: Patrick Hunt Assignee: Mahadev konar Fix For: 3.2.0 Attachments: ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, ZOOKEEPER-408.patch, ZOOKEEPER-408.patch trunk/src/java/main/org/apache/zookeeper/server/DataTree.java trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java trunk/src/java/main/org/apache/zookeeper/server/persistence/Util.java trunk/src/java/main/org/apache/zookeeper/server/DataNode.java trunk/src/java/main/org/apache/zookeeper/server/upgrade/DataNodeV1.java trunk/src/java/main/org/apache/zookeeper/server/upgrade/DataTreeV1.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-397) mainline tests conversion
[ https://issues.apache.org/jira/browse/ZOOKEEPER-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-397: Status: Open (was: Patch Available) patch doesn't apply mainline tests conversion - Key: ZOOKEEPER-397 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-397 Project: Zookeeper Issue Type: Sub-task Components: tests Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Fix For: 3.3.0 Attachments: testng-5.9-jdk15.jar, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch, ZOOKEEPER-397.patch In this stage main set (src/java/test) of ZK tests will be converted to TestNG -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-427) ZooKeeper server unexpectedly high CPU utilisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-427: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 786251. ZooKeeper server unexpectedly high CPU utilisation -- Key: ZOOKEEPER-427 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-427 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Environment: Linux: 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_03 Java(TM) SE Runtime Environment (build 1.6.0_03-b05) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode) Reporter: Satish Bhatti Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.2.0 Attachments: zk_quorum_recv_eof.patch, zoo.cfg, ZOOKEEPER-427.patch, zookeeper-jstack.log, zookeeper.log I am running a 5 node ZooKeeper cluster and I noticed that one of them has very high CPU usage: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 6883 infact 22 0 725m 41m 4188 S 95 0.5 5671:54 java It is not doing anything application-wise at this point, so I was wondering why the heck it's using up so much CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-422) Java CLI should support ephemeral and sequential node creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-422: Resolution: Fixed Status: Resolved (was: Patch Available) Fixed the usage string thanx henry! Committed revision 786317. Java CLI should support ephemeral and sequential node creation -- Key: ZOOKEEPER-422 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-422 Project: Zookeeper Issue Type: Improvement Affects Versions: 3.2.0 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Fix For: 3.2.0 Attachments: ZOOKEEPER-422.patch The C client supports creation of ephemeral and sequential nodes. For feature parity, so should the Java CLI. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-329) document how to integrate 3rd party authentication into ZK server ACLs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-329: Status: Patch Available (was: Open) document how to integrate 3rd party authentication into ZK server ACLs -- Key: ZOOKEEPER-329 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-329 Project: Zookeeper Issue Type: Improvement Components: documentation Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Minor Fix For: 3.2.0 Attachments: plugauth.pdf, ZOOKEEPER-329.patch the docs mention that zk supports pluggable auth schemes but doesn't detail the API/examples. We should add this to the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-329) document how to integrate 3rd party authentication into ZK server ACLs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-329: Attachment: ZOOKEEPER-329.patch plugauth.pdf I'm attaching a pdf of the relevant section to ease review. document how to integrate 3rd party authentication into ZK server ACLs -- Key: ZOOKEEPER-329 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-329 Project: Zookeeper Issue Type: Improvement Components: documentation Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Minor Fix For: 3.2.0 Attachments: plugauth.pdf, ZOOKEEPER-329.patch the docs mention that zk supports pluggable auth schemes but doesn't detail the API/examples. We should add this to the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository
[ https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-224: Fix Version/s: 3.2.0 Summary: Deploy ZooKeeper 3.2.0 to a Maven Repository (was: Deploy ZooKeeper 3.0.0 to a Maven Repository) In the next release can we get zookeeper.jar, zookeeper-test.jar, and bookkeeper.jar published to maven? is there some simple procedure to apply to our built jar files to make them deployable? Deploy ZooKeeper 3.2.0 to a Maven Repository Key: ZOOKEEPER-224 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224 Project: Zookeeper Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Hiram Chirino Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 I've created the maven poms needed for the 3.0.0 release. The directory structure and artifacts are located at: http://people.apache.org/~chirino/zk-repo/ aka people.apache.org:/x1/users/chirino/public_html/zk-repo Just needs to get GPG signed by the project KEY and deployed to: people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository Who's the current ZooKeeper release manager? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository
[ https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12722056#action_12722056 ] Benjamin Reed commented on ZOOKEEPER-224: - i think it is just a matter of running mvn deploy:deploy-file with the right flags, right? i was thinking we would run it right after we do the release. Deploy ZooKeeper 3.2.0 to a Maven Repository Key: ZOOKEEPER-224 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224 Project: Zookeeper Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Hiram Chirino Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 I've created the maven poms needed for the 3.0.0 release. The directory structure and artifacts are located at: http://people.apache.org/~chirino/zk-repo/ aka people.apache.org:/x1/users/chirino/public_html/zk-repo Just needs to get GPG signed by the project KEY and deployed to: people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository Who's the current ZooKeeper release manager? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12722063#action_12722063 ] Benjamin Reed commented on ZOOKEEPER-107: - i think if we use the notion of observers it helps: an observer can sync with a leader, but it doesn't get to vote. i think this makes it easy because the leader can then determine that it can commit with both the active followers and active observers if needed: for example start with A, B, C and move to A, B, D, E, F. if A and C are active followers and E and F are observers then the leader will propose the new configuration. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Henry Robinson Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
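The observer idea in the comment above is that observers sync with the leader but never vote, so only acks from voting followers count toward the commit quorum. A minimal sketch of that counting rule (class and method names are hypothetical, not the actual ZooKeeper patch):

```java
import java.util.Set;

// Illustrative sketch: observers replicate state but never vote, so a
// proposal commits only when a majority of the *voting* members ack it.
public class QuorumCheck {
    // True if the acking servers include a majority of the voting members.
    static boolean hasQuorum(Set<String> voters, Set<String> acks) {
        long votingAcks = acks.stream().filter(voters::contains).count();
        return votingAcks > voters.size() / 2;
    }

    public static void main(String[] args) {
        Set<String> voters = Set.of("A", "B", "C");
        // E and F are observers: their acks never count toward the quorum.
        System.out.println(hasQuorum(voters, Set.of("A", "B", "E"))); // true
        System.out.println(hasQuorum(voters, Set.of("A", "F")));      // false
    }
}
```

In the migration scenario above, the leader can therefore propose the new configuration through the old voting set while the incoming servers participate as non-voting observers until the switch.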
[jira] Created: (ZOOKEEPER-446) some traces of the host auth scheme left
some traces of the host auth scheme left Key: ZOOKEEPER-446 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-446 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Fix For: 3.2.0 the host auth scheme was removed because it used a blocking call in an async pipeline. however, tragically, the blocking call was not removed, nor were a couple of other stray classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-446) some traces of the host auth scheme left
[ https://issues.apache.org/jira/browse/ZOOKEEPER-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-446: Attachment: ZOOKEEPER-446.patch some traces of the host auth scheme left Key: ZOOKEEPER-446 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-446 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Fix For: 3.2.0 Attachments: ZOOKEEPER-446.patch the host auth scheme was removed because it used a blocking call in an async pipeline. however, tragically, the blocking call was not removed, nor were a couple of other stray classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723150#action_12723150 ] Benjamin Reed commented on ZOOKEEPER-356: - just a couple of things:
* in BookieHandle, why doesn't stop get set to true on shutdown?
* you need to check all your uses of LOG.info; most of them seem to really be LOG.debug
* in ClientCBWorker, stop should be volatile
* in LedgerHandle, shouldn't add/removeBookie be synchronized?
* in QuorumEngine, should idCounter be synchronized?
* in BookieClient you do a new IOException(); you should provide some hint of the problem in the constructor
Masking bookie failure during writes to a ledger Key: ZOOKEEPER-356 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356 Project: Zookeeper Issue Type: New Feature Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.2.0 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-BOOKKEEPER-356.patch The idea of this jira is to work out the changes necessary to make a client mask the failure of a bookie while writing to a ledger. I'm submitting a preliminary patch, but before I submit a final one, I need to have 288 committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
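The "stop should be volatile" review point above is the standard thread-shutdown pattern: without volatile, the worker thread may never observe a flag flipped by another thread. A sketch with a hypothetical worker class (not the actual BookKeeper ClientCBWorker):

```java
// Illustrative worker whose stop flag is volatile so that a shutdown()
// call from another thread is seen promptly by the run() loop.
public class StoppableWorker implements Runnable {
    private volatile boolean stop = false; // volatile: cross-thread visibility

    public void shutdown() { stop = true; }

    public boolean isStopped() { return stop; }

    @Override
    public void run() {
        while (!stop) {
            // drain and run queued callbacks here ...
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker w = new StoppableWorker();
        Thread t = new Thread(w);
        t.start();
        w.shutdown();  // flips the flag from the main thread
        t.join(1000);  // the worker exits promptly because stop is volatile
        System.out.println(t.isAlive() ? "still running" : "stopped");
    }
}
```

The same reasoning applies to the synchronized-method suggestions: any field read and written by more than one thread needs either volatile (for a lone flag) or synchronization (for compound updates like add/removeBookie).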
[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository
[ https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723182#action_12723182 ] Benjamin Reed commented on ZOOKEEPER-224: - why do we need ivy? can't we just run the command outside the build process after we do the release? Deploy ZooKeeper 3.2.0 to a Maven Repository Key: ZOOKEEPER-224 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224 Project: Zookeeper Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Hiram Chirino Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 I've created the Maven POMs needed for the 3.0.0 release. The directory structure and artifacts are located at: http://people.apache.org/~chirino/zk-repo/ aka people.apache.org:/x1/users/chirino/public_html/zk-repo Just need to get them GPG signed by the project KEY and deployed to: people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository Who's the current ZooKeeper release manager? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-224) Deploy ZooKeeper 3.2.0 to a Maven Repository
[ https://issues.apache.org/jira/browse/ZOOKEEPER-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723247#action_12723247 ] Benjamin Reed commented on ZOOKEEPER-224: - sorry, i should have scoped my question better. i mean why do we need ivy to push our release jars into the repository? i can see how we can use ivy for other needs, but for the specific issue of getting our jars into the maven repository, we can just run a command after we do the release. right? Deploy ZooKeeper 3.2.0 to a Maven Repository Key: ZOOKEEPER-224 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-224 Project: Zookeeper Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Hiram Chirino Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.0 I've created the Maven POMs needed for the 3.0.0 release. The directory structure and artifacts are located at: http://people.apache.org/~chirino/zk-repo/ aka people.apache.org:/x1/users/chirino/public_html/zk-repo Just need to get them GPG signed by the project KEY and deployed to: people.apache.org:/www/people.apache.org/repo/m2-ibiblio-rsync-repository Who's the current ZooKeeper release manager? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Attachment: ZOOKEEPER-417.patch stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
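The fix proposed in ZOOKEEPER-417 above, with the leader tracking which follower last registered each session and dropping requests relayed by any other follower, can be sketched as follows (class and method names are hypothetical, not the committed patch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: the leader records the current "owner" follower of
// each session and accepts relayed requests only from that owner.
public class SessionOwnerTracker {
    private final Map<Long, String> ownerBySession = new ConcurrentHashMap<>();

    // Called when a follower (re)registers a client session with the leader.
    public void register(long sessionId, String followerId) {
        ownerBySession.put(sessionId, followerId);
    }

    // A relayed request is accepted only from the session's current owner.
    public boolean accept(long sessionId, String followerId) {
        return followerId.equals(ownerBySession.get(sessionId));
    }

    public static void main(String[] args) {
        SessionOwnerTracker leader = new SessionOwnerTracker();
        leader.register(0x1L, "F1"); // client C connects through F1
        leader.register(0x1L, "F2"); // C reconnects through F2
        System.out.println(leader.accept(0x1L, "F1")); // false: stray setData(/a, 1) dropped
        System.out.println(leader.accept(0x1L, "F2")); // true: setData(/a, 2) applied
    }
}
```

In the scenario above, once C's session is re-registered through F2, the slow setData(/a, 1) that F1 later forwards fails the ownership check and can no longer reorder writes.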
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Status: Patch Available (was: Open) stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-356: Issue Type: Improvement (was: New Feature) Masking bookie failure during writes to a ledger Key: ZOOKEEPER-356 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356 Project: Zookeeper Issue Type: Improvement Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.2.0 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-BOOKKEEPER-356.patch The idea of this jira is to work out the changes necessary to make a client mask the failure of a bookie while writing to a ledger. I'm submitting a preliminary patch, but before I submit a final one, I need to have 288 committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-356) Masking bookie failure during writes to a ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-356: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 787907. Masking bookie failure during writes to a ledger Key: ZOOKEEPER-356 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-356 Project: Zookeeper Issue Type: Improvement Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.2.0 Attachments: ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-356.patch, ZOOKEEPER-BOOKKEEPER-356.patch The idea of this jira is to work out the changes necessary to make a client mask the failure of a bookie while writing to a ledger. I'm submitting a preliminary patch, but before I submit a final one, I need to have 288 committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-264) docs should include a state transition diagram for client state
[ https://issues.apache.org/jira/browse/ZOOKEEPER-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-264: Status: Patch Available (was: Reopened) docs should include a state transition diagram for client state --- Key: ZOOKEEPER-264 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-264 Project: Zookeeper Issue Type: Improvement Components: documentation Affects Versions: 3.0.1, 3.0.0 Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Minor Fix For: 3.2.0 Attachments: state_dia.dia, state_dia.png, ZOOKEEPER-264.patch we should have a state transition diagram to help users understand client state transitions. perhaps the edges could indicate what might cause such a transition? (not sure if that will work). keep in mind for the states that the java/c clients have diff names for constants (not sure how to handle). This should be added to the programmer guide in the appropriate section. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-314) add wiki docs for bookkeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed resolved ZOOKEEPER-314. - Resolution: Fixed done: http://wiki.apache.org/hadoop/BookKeeper add wiki docs for bookkeeper. Key: ZOOKEEPER-314 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-314 Project: Zookeeper Issue Type: Improvement Components: contrib-bookkeeper Affects Versions: 3.1.0 Reporter: Mahadev konar Assignee: Benjamin Reed Fix For: 3.2.0 we should have a wiki page for bookkeeper for users to take a cursory look at what it is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-237) Add a Chroot request
[ https://issues.apache.org/jira/browse/ZOOKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-237: Status: Open (was: Patch Available) looks good. two comments:
* feel free to ignore this one: when you set up hostname and chroot, i think the code is simpler if you hostname = strdup(host) and then poke a null into hostname to strip off the chroot
* we need to make sure we have total coverage for the testcases. you are missing a couple of the synchronous calls and you need to add the asynchronous calls. (i know it is tedious)
Add a Chroot request Key: ZOOKEEPER-237 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-237 Project: Zookeeper Issue Type: New Feature Components: c client, java client Reporter: Benjamin Reed Assignee: Mahadev konar Priority: Minor Fix For: 3.2.0 Attachments: ZOOKEEPER-237.patch, ZOOKEEPER-237.patch, ZOOKEEPER-237.patch It would be nice to be able to root ZooKeeper handles at specific points in the namespace, so that applications that use ZooKeeper can work in their own rooted subtree. For example, if ops decides that application X can use the subtree /apps/X and application Y can use the subtree /apps/Y, X can do a chroot to /apps/X and then all its path references can be rooted at /apps/X. Thus when X creates the path /myid, it will actually be creating the path /apps/X/myid. There are two ways we can expose this mechanism: 1) We can simply add a chroot(String path) API, or 2) we can integrate into a service identifier scheme, for example zk://server1:2181,server2:2181/my/root. I like the second form personally. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
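The second option above (a chroot suffix in the connect string) amounts to splitting the path off the server list and prefixing it onto every client-visible path. A minimal sketch of that resolution logic, with hypothetical helper names rather than the actual client code:

```java
// Illustrative sketch of chroot handling for a
// "host:port,host:port/chroot" style connect string.
public class ChrootResolver {
    // Everything from the first '/' onward is the chroot; "" if none given.
    static String chrootOf(String connectString) {
        int slash = connectString.indexOf('/');
        return slash < 0 ? "" : connectString.substring(slash);
    }

    // Translate a client-visible path into the real server-side path.
    static String serverPath(String chroot, String clientPath) {
        if (chroot.isEmpty()) return clientPath;
        return "/".equals(clientPath) ? chroot : chroot + clientPath;
    }

    public static void main(String[] args) {
        String chroot = chrootOf("server1:2181,server2:2181/apps/X");
        System.out.println(serverPath(chroot, "/myid")); // /apps/X/myid
    }
}
```

The reverse mapping is also needed: paths returned to the client (e.g. from getChildren or watch events) must have the chroot stripped again so the application only ever sees its own rooted subtree.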
[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-438: Attachment: ZOOKEEPER-438.patch addauth fails to register auth on new client that's not yet connected - Key: ZOOKEEPER-438 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438 Project: Zookeeper Issue Type: Bug Components: c client, java client Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch if addauth is called on a new client connection that's never connected to the server, when the client does connect (syncconnected) the auth is not passed to the server. we should ensure we addauth when the client connects or reconnects -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
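The fix the bug above calls for is to remember every auth the application adds and replay the list whenever the connection is (re)established, instead of sending it only at add time. A sketch with hypothetical names (not the committed ZOOKEEPER-438 patch):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: addAuthInfo() records the credential locally; the
// SyncConnected handler replays the whole list to the server, so auths
// added before the first connect (or across a reconnect) are not lost.
public class AuthReplayClient {
    static final class AuthData {
        final String scheme;
        final byte[] data;
        AuthData(String scheme, byte[] data) { this.scheme = scheme; this.data = data; }
    }

    private final List<AuthData> authInfo = new ArrayList<>();
    private boolean connected = false;

    public synchronized void addAuthInfo(String scheme, byte[] data) {
        AuthData a = new AuthData(scheme, data);
        authInfo.add(a);
        if (connected) {
            send(a); // already connected: send immediately
        }
        // not yet connected: the entry waits in authInfo until onConnected()
    }

    // Invoked from the connection-established (SyncConnected) handler.
    public synchronized int onConnected() {
        connected = true;
        for (AuthData a : authInfo) send(a); // replay everything, including pre-connect adds
        return authInfo.size();
    }

    private void send(AuthData a) {
        // would write an auth packet to the server here
    }
}
```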
[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-438: Status: Open (was: Patch Available) addauth fails to register auth on new client that's not yet connected - Key: ZOOKEEPER-438 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438 Project: Zookeeper Issue Type: Bug Components: c client, java client Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch if addauth is called on a new client connection that's never connected to the server, when the client does connect (syncconnected) the auth is not passed to the server. we should ensure we addauth when the client connects or reconnects -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-438: Status: Patch Available (was: Open) addauth fails to register auth on new client that's not yet connected - Key: ZOOKEEPER-438 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438 Project: Zookeeper Issue Type: Bug Components: c client, java client Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch, ZOOKEEPER-438.patch if addauth is called on a new client connection that's never connected to the server, when the client does connect (syncconnected) the auth is not passed to the server. we should ensure we addauth when the client connects or reconnects -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-438) addauth fails to register auth on new client that's not yet connected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-438: Attachment: ZOOKEEPER-438.patch slightly out of date addauth fails to register auth on new client that's not yet connected - Key: ZOOKEEPER-438 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-438 Project: Zookeeper Issue Type: Bug Components: c client, java client Reporter: Patrick Hunt Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-438.patch, ZOOKEEPER-438.patch, ZOOKEEPER-438.patch if addauth is called on a new client connection that's never connected to the server, when the client does connect (syncconnected) the auth is not passed to the server. we should ensure we addauth when the client connects or reconnects -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724136#action_12724136 ] Benjamin Reed commented on ZOOKEEPER-447: - +1 good idea zkServer.sh doesn't allow different config files to be specified on the command line Key: ZOOKEEPER-447 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447 Project: Zookeeper Issue Type: Improvement Affects Versions: 3.1.1, 3.2.0 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZOOKEEPER-447.patch Unless I'm missing something, you can change the directory that the zoo.cfg file is in by setting ZOOCFGDIR but not the name of the file itself. I find it convenient myself to specify the config file on the command line, but we should also let it be specified by environment variable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724309#action_12724309 ] Benjamin Reed commented on ZOOKEEPER-417: - the release audit generates warnings because of the way i added the new keeper exception codes. we made the integers deprecated, so i've made the new ones deprecated as well. should i put the integer in the new error code rather than adding a new deprecated constant? i can't find any failed tests in the test results. what am i missing? stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Attachment: ZOOKEEPER-417.patch stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Status: Open (was: Patch Available) stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Assignee: (was: Benjamin Reed) Status: Open (was: Patch Available) stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Attachment: ZOOKEEPER-417.patch implemented mahadev's suggestion stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-417: Assignee: Benjamin Reed Status: Patch Available (was: Open) stray message problem when changing servers --- Key: ZOOKEEPER-417 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, ZOOKEEPER-417.patch There is a possibility for stray messages from a previous connection to violate ordering and generally cause problems. Here is a scenario: we have a client, C, two followers, F1 and F2, and a leader, L. The client is connected to F1, which is a slow follower. C sends setData(/a, 1) to F1 and then loses the connection, so C reconnects to F2 and sends setData(/a, 2). it is possible, if F1 is slow enough and the setData(/a, 1) got onto the network before the connection break, for F1 to forward the setData(/a, 1) to L after F2 forwards setData(/a, 2). to fix this, the leader should keep track of which follower last registered a session for a client and drop any requests from followers for clients for whom they do not have a registration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-448) png files do not work with forrest.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724659#action_12724659 ] Benjamin Reed commented on ZOOKEEPER-448: - +1 png files do not work with forrest. --- Key: ZOOKEEPER-448 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-448 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.2.0 Attachments: 2pc.jpg, ZOOKEEPER-448.patch png images are not compatible with forrest generating pdf. We can convert them to jpg to get them into pdfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-449) sessionmoved in java code and ZCLOSING in C have the same value.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724716#action_12724716 ] Benjamin Reed commented on ZOOKEEPER-449: - +1 sessionmoved in java code and ZCLOSING in C have the same value. - Key: ZOOKEEPER-449 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-449 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.2.0 Attachments: ZOOKEEPER-449.patch sessionmoved in java code and ZCLOSING in C have the same value. We need to assign a new value to ZSESSIONMOVED. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-450) ephemeral cleanup not happening with session timeout
[ https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-450: Attachment: ZOOKEEPER-450.patch the patch detects the bug and fixes it. i'm not completely sure about the fix. it's simple and works, but there is a little non-deterministic corner case: if a client issues a close but the connection drops after the request is received by the server, and the client then moves to a new server and continues to use the session, the stray close will come in and close the session. this corner case is not possible with our current client implementation. ephemeral cleanup not happening with session timeout - Key: ZOOKEEPER-450 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450 Project: Zookeeper Issue Type: Bug Affects Versions: 3.2.0 Reporter: Benjamin Reed Priority: Blocker Fix For: 3.2.0 Attachments: ZOOKEEPER-450.patch The session move patch broke ephemeral cleanup during session expiration. Tragically, we didn't have test coverage to detect the bug. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout
[ https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-450: Attachment: ZOOKEEPER-450.patch updated the patch to comment on why the checkSession is not needed, for the benefit of future maintainers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout
[ https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-450: Attachment: (was: ZOOKEEPER-450.patch) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout
[ https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-450: Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-440) update the performance documentation in forrest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725360#action_12725360 ] Benjamin Reed commented on ZOOKEEPER-440: - i have created the wiki page: http://wiki.apache.org/hadoop/ZooKeeper/Performance i'd like to just leave it on the wiki for this release and move it to forrest when i can dedicate more time to the text and different benchmarks. update the performance documentation in forrest --- Key: ZOOKEEPER-440 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-440 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Patrick Hunt Assignee: Benjamin Reed Fix For: 3.2.0 Ben, it would be great if you could update the performance documentation in Forrest docs based on the 3.2 performance improvements. Specifically the scaling graphs (read vs. write load for various quorum sizes) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728751#action_12728751 ] Benjamin Reed commented on ZOOKEEPER-368: - hey, henry two other questions/comments for you: * i'm trying to understand the use case for a follower that connects as an observer. this would adversely affect the reliability of the system since a follower acting as an observer would count as a failed follower even though it is up. did you have a case in mind? * i think it is reasonable to turn off the sync for the observer, but we probably still want to log to disk so that we can recover quickly. otherwise we will keep doing state transfers from the leader every time we connect. right? Observers - Key: ZOOKEEPER-368 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368 Project: Zookeeper Issue Type: New Feature Components: quorum Reporter: Flavio Paiva Junqueira Assignee: Henry Robinson Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch Currently, all servers of an ensemble participate actively in reaching agreement on the order of ZooKeeper transactions. That is, all followers receive proposals, acknowledge them, and receive commit messages from the leader. A leader issues commit messages once it receives acknowledgments from a quorum of followers. For cross-colo operation, it would be useful to have a third role: observer. Using Paxos terminology, observers are similar to learners. An observer does not participate actively in the agreement step of the atomic broadcast protocol. Instead, it only commits proposals that have been accepted by some quorum of followers. One simple solution to implement observers is to have the leader forwarding commit messages not only to followers but also to observers, and have observers applying transactions according to the order followers agreed upon. 
In the current implementation of the protocol, however, commit messages do not carry their corresponding transaction payload because all servers different from the leader are followers and followers receive such a payload first through a proposal message. Just forwarding commit messages as they currently are to an observer consequently is not sufficient. We have a couple of options: 1- Include the transaction payload along in commit messages to observers; 2- Send proposals to observers as well. Number 2 is simpler to implement because it doesn't require changing the protocol implementation, but it increases traffic slightly. The performance impact due to such an increase might be insignificant, though. For scalability purposes, we may consider having followers also forwarding commit messages to observers. With this option, observers can connect to followers, and receive messages from followers. This choice is important to avoid increasing the load on the leader with the number of observers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731073#action_12731073 ] Benjamin Reed commented on ZOOKEEPER-368: - to address the motivation a bit, consider poorly connected data centers and cross-datacenter zookeeper. we need to put zookeeper servers in the poorly connected data centers because we want to service all the reads locally in those data centers, but we don't want to affect reliability or latency in the other data centers. for example, imagine we have 5 poorly connected data centers and 3 well connected data centers, and we put two servers in each data center. that means we have an ensemble of 16 servers, but because of the poorly connected data centers, we are more likely to lose quorum than if we made the servers in the 5 poorly connected data centers observers and just used the 3 well connected data centers to commit changes. you can view observers as proxies. Observers - Key: ZOOKEEPER-368 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368 Project: Zookeeper Issue Type: New Feature Components: quorum Reporter: Flavio Paiva Junqueira Assignee: Henry Robinson Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
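The quorum arithmetic behind the data-center example above can be checked with a small sketch (names are illustrative): with all 16 servers voting, a majority quorum needs 9 acks and only 7 failures are survivable, so losing the 10 servers in the poorly connected data centers breaks quorum; with only the 6 well-connected servers voting, a quorum needs 4, and the 10 observers can all disappear without affecting writes.

```java
public class QuorumMath {
    // a majority quorum of n voters needs floor(n/2)+1 acks
    public static int quorumSize(int voters) {
        return voters / 2 + 1;
    }

    // number of voting-server failures the ensemble survives
    public static int tolerated(int voters) {
        return voters - quorumSize(voters);
    }

    public static void main(String[] args) {
        // all 16 servers vote: losing the 10 poorly connected ones (> 7) breaks quorum
        System.out.println(quorumSize(16) + " " + tolerated(16)); // 9 7
        // only the 6 well-connected servers vote; observers never count
        System.out.println(quorumSize(6) + " " + tolerated(6)); // 4 2
    }
}
```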
[jira] Commented: (ZOOKEEPER-423) Add getFirstChild API
[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731114#action_12731114 ] Benjamin Reed commented on ZOOKEEPER-423: - we should keep in mind that someday we may have a partitioned namespace. when that happens some of these options would be hard/very expensive/blocking. NAME of course is easy. the client can always do this. when the creation happens, we can store the xid with the child's name in the parent data structure since it doesn't change, so CREATED is reasonable. MODIFIED and DATA_SIZE are more problematic/seemingly impossible in the presence of a namespace partition. Add getFirstChild API - Key: ZOOKEEPER-423 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423 Project: Zookeeper Issue Type: New Feature Components: contrib-bindings, documentation, java client, server Reporter: Henry Robinson When building the distributed queue for my tutorial blog post, it was pointed out to me that there's a serious inefficiency here. Informally, the items in the queue are created as sequential nodes. For a 'dequeue' call, all items are retrieved and sorted by name by the client in order to find the name of the next item to try and take. This costs O( n ) bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this doesn't scale very well. If the servers were able to maintain a data structure that allowed them to efficiently retrieve the children of a node in order of the zxid that created them, this would make successful dequeue operations O( 1 ) at the cost of O( n ) memory on the server (to maintain, e.g. a singly-linked list as a queue). This is a win if it is generally true that clients only want the first child in creation order, rather than the whole set. We could expose this to the client via this API: getFirstChild(handle, path, name_buffer, watcher) which would have much the same semantics as getChildren, but only return one znode name. 
Sequential nodes would still allow the ordering of znodes to be made explicitly available to the client in one RPC should it need it. Although: since this ordering would now be available cheaply for every set of children, it's not completely clear that there would be that many use cases left for sequential nodes if this API was augmented with a getChildrenInCreationOrder call. However, that's for a different discussion. A halfway-house alternative with more flexibility is to add an 'order' parameter to getFirstChild and have the server compute the first child according to the requested order (creation time, update time, lexicographical order). This saves bandwidth at the expense of increased server load, although servers can be implemented to spend memory on pre-computing commonly requested orders. I am only in favour of this approach if servers maintain a data-structure for every possible order, and then the memory implications need careful consideration. [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
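The per-dequeue cost Henry describes is easy to see in a client-side sketch. The real ZooKeeper getChildren() call is replaced by a plain list here so the example is self-contained; names are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DequeueSketch {
    // stand-in for a getChildren() result: sequential znode names like
    // "item-0000000042" -- fetching them all is the O(n) bandwidth cost
    public static String firstInCreationOrder(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        // the O(n log n) sort every dequeue pays today just to find one
        // name -- the cost a getFirstChild API would eliminate
        Collections.sort(copy);
        return copy.isEmpty() ? null : copy.get(0);
    }

    public static void main(String[] args) {
        List<String> children =
            List.of("item-0000000003", "item-0000000001", "item-0000000002");
        System.out.println(firstInCreationOrder(children)); // item-0000000001
    }
}
```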
[jira] Commented: (ZOOKEEPER-472) Making DataNode not instantiate a HashMap when the node is ephmeral
[ https://issues.apache.org/jira/browse/ZOOKEEPER-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731595#action_12731595 ] Benjamin Reed commented on ZOOKEEPER-472: - i think we should expand this to not instantiate a hashmap for all znodes if there aren't any children. it creates a fixed-size overhead for all leaf nodes, and since there will always be more leaves than inner nodes, it is a non-trivial space saving. i think it could also speed serialization/deserialization, since it is faster to process a null than an empty hashmap. plus i think it keeps the code simpler to not have a new class. Making DataNode not instantiate a HashMap when the node is ephmeral --- Key: ZOOKEEPER-472 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-472 Project: Zookeeper Issue Type: Improvement Components: server Affects Versions: 3.1.1, 3.2.0 Reporter: Erik Holstad Assignee: Erik Holstad Priority: Minor Fix For: 3.3.0 Looking at the code, there is an overhead of a HashSet object for that node's children, even though the node might be an ephemeral node and cannot have children. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
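The expanded idea in the comment above can be sketched as lazy allocation of the children set (hypothetical names, not the actual DataNode code): the field stays null until the first child arrives, so leaf znodes, the common case, carry no HashSet at all, while readers still never see a null.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// illustrative sketch only -- not the real DataNode class
public class LazyChildrenNode {
    // null until the first child is added: leaf nodes pay no HashSet overhead
    private Set<String> children = null;

    public void addChild(String name) {
        if (children == null) {
            children = new HashSet<>();
        }
        children.add(name);
    }

    public boolean removeChild(String name) {
        return children != null && children.remove(name);
    }

    // readers get an empty set, never null, so call sites stay simple
    public Set<String> getChildren() {
        return children == null ? Collections.emptySet() : children;
    }
}
```

Serialization would write the null case the same way as an empty set, which is where the speedup Ben mentions would come from.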
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733410#action_12733410 ] Benjamin Reed commented on ZOOKEEPER-368: - henry, i was thinking the other day that an observer is very similar to a follower in a flexible quorum with 0 weight. actually the more i thought about it, the more i realized that it should be the same: a follower with 0 weight really should not send ACKs back, and then it would be an observer. it turns out that there is a comment in ZOOKEEPER-29 that makes this observation as well. in that issue the differences that flavio points out are no longer relevant, i think. what do you think? Observers - Key: ZOOKEEPER-368 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368 Project: Zookeeper Issue Type: New Feature Components: quorum Reporter: Flavio Paiva Junqueira Assignee: Henry Robinson Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
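The weight-0 equivalence is visible in a minimal weighted-quorum check (illustrative names, not the server's quorum-verifier code): an ACK from a server whose weight is 0 can never change whether the acked weight exceeds half the total, so it behaves exactly like an observer that never ACKs.

```java
import java.util.List;
import java.util.Map;

public class WeightedQuorum {
    // flexible-quorum check: the ackers form a quorum when their total
    // weight exceeds half the ensemble's total weight
    public static boolean isQuorum(Map<String, Integer> weights,
                                   Iterable<String> ackers) {
        long total = 0, acked = 0;
        for (int w : weights.values()) total += w;
        for (String s : ackers) acked += weights.getOrDefault(s, 0);
        return 2 * acked > total;
    }

    public static void main(String[] args) {
        // D has weight 0: its ACK never tips the decision
        Map<String, Integer> w = Map.of("A", 1, "B", 1, "C", 1, "D", 0);
        System.out.println(isQuorum(w, List.of("A", "B"))); // true
        System.out.println(isQuorum(w, List.of("A", "D"))); // false, same as A alone
    }
}
```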
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733789#action_12733789 ] Benjamin Reed commented on ZOOKEEPER-368: - i'm very sensitive to the work-already-done issue! i've totally been there. the con argument for the increased chatter is actually quite minimal, since the COMMIT message is just a few bytes that gets merged into an existing TCP stream. the restriction of only weight-0 followers subscribing to a portion of the tree is a bit hacky, but it eliminates the need for a bunch of new code. to be honest, there are two things that really concern me: 1) the amount of new code we have to add if we don't use weight-0 followers, and the new test cases that we have to write. since observers use a different code path we have to add a lot more tests. 2) one use of observers is to do graceful change-over for ensemble changes. changing from a weight-0 follower to a voting participant just means that the follower starts sending ACKs from the proposal at which it starts voting. we can do that very fast, on the fly, with no interruption to the follower. if we try to convert an observer, the new follower must switch from observer to follower and sync up to the leader before it can commit the new ensemble message. this increases the interruption of the change and the likelihood of failure. btw, we could set up a phone conference if it would help. (everyone would be invited of course. we have global access numbers.) 
Observers - Key: ZOOKEEPER-368 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368 Project: Zookeeper Issue Type: New Feature Components: quorum Reporter: Flavio Paiva Junqueira Assignee: Henry Robinson Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733790#action_12733790 ] Benjamin Reed commented on ZOOKEEPER-368: - hey, i'm looking at the patch. can you comment on the VIEWCHANGE message? does that refer to an ensemble membership change or to the subscribe-to-a-subtree idea that was mentioned? Observers - Key: ZOOKEEPER-368 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368 Project: Zookeeper Issue Type: New Feature Components: quorum Reporter: Flavio Paiva Junqueira Assignee: Henry Robinson Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-311: Status: Patch Available (was: Open) handle small path lengths in zoo_create() - Key: ZOOKEEPER-311 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.2.0, 3.1.1, 3.1.0, 3.0.1, 3.0.0 Reporter: Chris Darroch Assignee: Chris Darroch Priority: Minor Fix For: 3.2.1 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch The synchronous completion for zoo_create() contains the following code:\\ {noformat} if (sc->u.str.str_len > strlen(res.path)) { len = strlen(res.path); } else { len = sc->u.str.str_len-1; } if (len > 0) { memcpy(sc->u.str.str, res.path, len); sc->u.str.str[len] = '\0'; } {noformat} In the case where the max_realpath_len argument to zoo_create() is 0, none of this code executes, which is OK. In the case where max_realpath_len is 1, a user might expect their buffer to be filled with a null terminator, but again, nothing will happen (even if strlen(res.path) is 0, which is unlikely since new nodes will have paths longer than /). The name of the argument to zoo_create() is also a little misleading, as is its description (the maximum length of real path you would want) in zookeeper.h, and the example usage in the Programmer's Guide: {noformat} int rc = zoo_create(zh, "/xyz", "value", 5, CREATE_ONLY, ZOO_EPHEMERAL, buffer, sizeof(buffer)-1); {noformat} In fact this value should be the actual length of the buffer, including space for the null terminator. If the user supplies a max_realpath_len of 10 and a buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the returned value to 9 bytes and put the null terminator in the second-last byte, leaving the final byte of the buffer unused. 
It would be better, I think, to rename the realpath and max_realpath_len arguments to something like path_buffer and path_buffer_len, akin to zoo_set(). The path_buffer_len would be treated as the full length of the buffer (as the code does now, in fact, but the docs suggest otherwise). The code in the synchronous completion could then be changed as per the attached patch. Since this would change, slightly, the behaviour or contract of the API, I would be inclined to suggest waiting until 4.0.0 to implement this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-311: Status: Open (was: Patch Available) handle small path lengths in zoo_create() - Key: ZOOKEEPER-311 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311 Project: Zookeeper Issue Type: Improvement Components: c client Fix For: 3.2.1 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-484: Attachment: sessionTest.patch this patch recreates the problem. Clients get SESSION MOVED exception when switching from follower to a leader. - Key: ZOOKEEPER-484 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: sessionTest.patch When a client is connected to a follower, gets disconnected, and connects to the leader, it gets a SESSION MOVED exception. This is because of a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO NOT have this problem. The fix is to make sure the ownership of a connection gets changed when a session moves from a follower to the leader. The workaround in 3.2.0 is to switch off connections from clients to the leader; take a look at the *leaderServers* java property in http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch i was able to reproduce the problem. and the patch was a missing catch for a socket exception. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Fix For: 3.2.1 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected 
local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: java.nio.channels.SocketChannel[connected
[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert
[ https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-466: Status: Patch Available (was: Open) crash on zookeeper_close() when using auth with empty cert -- Key: ZOOKEEPER-466 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.0 Reporter: Chris Darroch Assignee: Chris Darroch Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-466.patch The free_auth_info() function calls deallocate_Buffer(auth->auth) on every element in the auth list; that function frees any memory pointed to by auth->auth.buff if that field is non-NULL. In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set to 0, but then not assigned to authinfo->auth when auth.buff is NULL. The result is uninitialized data in auth->auth.buff in free_auth_info(), and potential crashes. The attached patch adds a test which attempts to duplicate this error; it works for me but may not always on all systems as it depends on the uninitialized data being non-zero; there's not really a simple way I can see to trigger this in the current test framework. The patch also fixes the problem, I believe. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert
[ https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-466: Status: Open (was: Patch Available) crash on zookeeper_close() when using auth with empty cert -- Key: ZOOKEEPER-466 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.0 Reporter: Chris Darroch Assignee: Chris Darroch Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-466.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739898#action_12739898 ] Benjamin Reed commented on ZOOKEEPER-483: - I've addressed 1) in the attached patch. For 2), we are not eating the IOException; we are actually shutting things down. The bug is that we are passing it up to the upper layer, which does not know anything about the follower thread; we need to handle it here. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Patch Available (was: Open) ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-311: Resolution: Fixed Status: Resolved (was: Patch Available) commit to 3.2 branch: Committed revision 801756. commit to trunk: Committed revision 801747. handle small path lengths in zoo_create() - Key: ZOOKEEPER-311 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0 Reporter: Chris Darroch Assignee: Chris Darroch Priority: Minor Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch The synchronous completion for zoo_create() contains the following code:\\ {noformat} if (sc->u.str.str_len > strlen(res.path)) { len = strlen(res.path); } else { len = sc->u.str.str_len-1; } if (len > 0) { memcpy(sc->u.str.str, res.path, len); sc->u.str.str[len] = '\0'; } {noformat} In the case where the max_realpath_len argument to zoo_create() is 0, none of this code executes, which is OK. In the case where max_realpath_len is 1, a user might expect their buffer to be filled with a null terminator, but again, nothing will happen (even if strlen(res.path) is 0, which is unlikely since new nodes will have paths longer than /). The name of the argument to zoo_create() is also a little misleading, as is its description (the maximum length of real path you would want) in zookeeper.h, and the example usage in the Programmer's Guide: {noformat} int rc = zoo_create(zh, "/xyz", "value", 5, CREATE_ONLY, ZOO_EPHEMERAL, buffer, sizeof(buffer)-1); {noformat} In fact this value should be the actual length of the buffer, including space for the null terminator. If the user supplies a max_realpath_len of 10 and a buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the returned value to 9 bytes and put the null terminator in the second-last byte, leaving the final byte of the buffer unused. 
It would be better, I think, to rename the realpath and max_realpath_len arguments to something like path_buffer and path_buffer_len, akin to zoo_set(). The path_buffer_len would be treated as the full length of the buffer (as the code does now, in fact, but the docs suggest otherwise). The code in the synchronous completion could then be changed as per the attached patch. Since this would change, slightly, the behaviour or contract of the API, I would be inclined to suggest waiting until 4.0.0 to implement this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-484: Hadoop Flags: [Reviewed] +1 looks good mahadev Clients get SESSION MOVED exception when switching from follower to a leader. - Key: ZOOKEEPER-484 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: sessionTest.patch, ZOOKEEPER-484.patch When a client connected to a follower gets disconnected and then connects to the leader, it gets a SESSION MOVED exception. This is because of a bug in the new feature from ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO NOT have this problem. The fix is to make sure the ownership of a connection gets changed when a session moves from a follower to the leader. The workaround in 3.2.0 would be to switch off connections from clients to the leader; take a look at the *leaderServers* java property in http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-490: Hadoop Flags: [Reviewed] +1 looks good pat the java docs for session creation are misleading/incomplete Key: ZOOKEEPER-490 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1, 3.2.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-490.patch the javadoc for the ZooKeeper constructor says: * The client object will pick an arbitrary server and try to connect to it. * If failed, it will try the next one in the list, until a connection is * established, or all the servers have been tried. the "or all the servers have been tried" phrase is misleading; it should indicate that we retry until success, connection closed, or session expired. we also need to mention that the connection is asynchronous, that the constructor returns immediately, and that you need to look for the connection event in the watcher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-476) upgrade junit library from 4.4 to 4.6
[ https://issues.apache.org/jira/browse/ZOOKEEPER-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-476: Hadoop Flags: [Reviewed] +1 looks good upgrade junit library from 4.4 to 4.6 - Key: ZOOKEEPER-476 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-476 Project: Zookeeper Issue Type: Improvement Components: tests Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.3.0 Attachments: junit-4.6.jar, junit-4.6.LICENSE.txt upgrade from junit 4.4 to 4.6 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-502) bookkeeper create call completion too many times
bookkeeper create call completion too many times Key: ZOOKEEPER-502 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Flavio Paiva Junqueira when calling the asynchronous version of create, the completion routine is called more than once. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-502) bookkeeper create call completion too many times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-502: Attachment: ZOOKEEPER-502.patch this patch adds a test case that reproduces the problem. bookkeeper create call completion too many times Key: ZOOKEEPER-502 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Flavio Paiva Junqueira Attachments: ZOOKEEPER-502.patch when calling the asynchronous version of create, the completion routine is called more than once. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-502: Summary: bookkeeper create calls completion too many times (was: bookkeeper create call completion too many times) bookkeeper create calls completion too many times - Key: ZOOKEEPER-502 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Flavio Paiva Junqueira Attachments: ZOOKEEPER-502.patch when calling the asynchronous version of create, the completion routine is called more than once. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-502: Component/s: contrib-bookkeeper bookkeeper create calls completion too many times - Key: ZOOKEEPER-502 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Reporter: Benjamin Reed Assignee: Flavio Paiva Junqueira Attachments: ZOOKEEPER-502.patch when calling the asynchronous version of create, the completion routine is called more than once. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-503) race condition in asynchronous create
race condition in asynchronous create - Key: ZOOKEEPER-503 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Reporter: Benjamin Reed there is a race condition between the zookeeper completion thread and the bookkeeper processing queue during create. if the zookeeper completion thread falls behind due to scheduling, the action counter of the create operation may go backwards. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-499: Status: Open (was: Patch Available) this looks good pat, but when you first get the logger, why are you using the package name? if you are going to use the package name, shouldn't you get the package from the class file? in the second test, you get the logger using a package to add an appender, but remove it using the class. couldn't that potentially cause a problem? electionAlg should default to FLE (3) - regression -- Key: ZOOKEEPER-499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499 Project: Zookeeper Issue Type: Bug Components: server, tests Affects Versions: 3.2.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch there's a regression in 3.2 - electionAlg is no longer defaulting to 3 (incorrectly defaults to 0) also - need to have tests to validate this -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch fixed patch to apply cleanly. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO 
org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: 
java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn:
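The EOFException at the top of this log comes from Follower.readPacket reading a length-prefixed quorum packet over a connection the leader side had already closed. A minimal sketch (hypothetical class and method names, not the ZooKeeper code) of how DataInputStream.readInt() produces exactly this failure when the stream ends mid-packet:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch: BinaryInputArchive.readInt() delegates to DataInputStream.readInt(),
// which needs 4 bytes. If the connection ends after fewer bytes, readInt()
// throws EOFException -- the same exception seen in the Follower stack trace.
public class EofDemo {
    static int readPacketLength(DataInputStream in) throws IOException {
        return in.readInt(); // EOFException if fewer than 4 bytes remain
    }

    public static void main(String[] args) throws IOException {
        // Simulate a socket that delivered only 2 of the 4 header bytes.
        DataInputStream truncated =
                new DataInputStream(new ByteArrayInputStream(new byte[] {0, 0}));
        try {
            readPacketLength(truncated);
        } catch (EOFException e) {
            System.out.println("EOFException: connection closed mid-packet");
        }
    }
}
```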
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741605#action_12741605 ] Benjamin Reed commented on ZOOKEEPER-498: - +1 looks good. when setting the stop flags, you should really do an interrupt to wake up the wait, but that will cause a message to be printed to stdout. i'll open another jira to fix that. Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Assignee: Flavio Paiva Junqueira Priority: Critical Fix For: 3.2.1, 3.3.0 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
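The weighted topology described above (DC servers with weight 1, pod servers with weight 0) relies on quorum checks summing weights rather than counting raw votes. A simplified, illustrative version of that rule (not the actual ZooKeeper QuorumVerifier implementation): a set of acknowledging servers forms a quorum only if their combined weight exceeds half the total weight, so weight-0 pod followers can never form a quorum among themselves.

```java
import java.util.Map;
import java.util.Set;

// Illustrative weighted-quorum check: pod servers carry weight 0, so a vote
// set containing only pod servers sums to 0 and can never win an election.
public class WeightedQuorum {
    static boolean containsQuorum(Map<Long, Long> weights, Set<Long> ackers) {
        long total = weights.values().stream().mapToLong(Long::longValue).sum();
        long acked = ackers.stream().mapToLong(weights::get).sum();
        return acked > total / 2;
    }

    public static void main(String[] args) {
        // Three DC servers (weight 1) and two pod servers (weight 0).
        Map<Long, Long> w = Map.of(1L, 1L, 2L, 1L, 3L, 1L, 4L, 0L, 5L, 0L);
        System.out.println(containsQuorum(w, Set.of(1L, 2L))); // two DC votes
        System.out.println(containsQuorum(w, Set.of(4L, 5L))); // pod votes only
    }
}
```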
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-498: Hadoop Flags: [Reviewed]
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch The test case exposed another bug: log truncation was not being done properly with the buffered inputstream. i modified the test to make it fail reliably and then fixed the bug.
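The buffered-input-stream pitfall mentioned in the update above is easy to reproduce in isolation: a BufferedInputStream prefetches from its underlying stream, so the underlying read position runs ahead of the bytes the caller has actually consumed, and truncating a log file at the underlying position would cut at the wrong offset. A hypothetical demonstration (CountingStream is an illustrative stand-in, not ZooKeeper code):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// A stream that exposes how many bytes have been pulled from it, standing in
// for the underlying log file's read position.
class CountingStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    CountingStream(byte[] data) { this.data = data; }

    @Override public int read() {
        return pos < data.length ? data[pos++] & 0xff : -1;
    }

    @Override public int read(byte[] b, int off, int len) {
        int n = Math.min(len, data.length - pos);
        if (n <= 0) return -1;
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    int position() { return pos; }
}

public class TruncateDemo {
    public static void main(String[] args) throws IOException {
        CountingStream raw = new CountingStream(new byte[1024]);
        BufferedInputStream buffered = new BufferedInputStream(raw, 256);
        buffered.read(); // the caller has consumed exactly 1 byte...
        // ...but the buffer prefetched a whole 256-byte block, so truncating
        // at raw.position() would discard bytes the caller never saw.
        System.out.println("consumed=1, underlying position=" + raw.position());
    }
}
```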
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create
[ https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-503: Attachment: ZOOKEEPER-503.patch this patch fixes a range of problems. it is a big simplification. it has a net removal of 700 lines of code. the metadata for a ledger was collapsed into a single znode. here is a description of the changes: Index calculation in QuorumEngine must be synchronized on the LedgerHandle to avoid changes to the ensemble while trying to submit an operation. Such changes happen upon crashes of bookies. I initially thought it was not necessary, but now I think this synchronization block is necessary. If a writer adds just a few entries to a ledger, it may end up with hints that say empty ledger when trying to recover a ledger. In this case, if we receive an empty ledger flag as a hint, we have to switch the hint to zero, which means that the client will start recovery from entry zero. If no entry has been written, it still works as the client won't be able to read anything. I have changed LedgerRecoveryTest to test for: many entries written, one entry written, no entry written. I have been able to identify the problem that was causing BookieFailureTest to hang on Utkarsh's computer. Basically, when the queue of a BookieHandle is full and the corresponding bookie has crashed, we are not able to add a read operation to the incoming queue of the bookie handle because the BookieHandle is not processing new requests anymore and it is waiting to fail the handle. In this case, the BookieHandle throws an exception after timing out the call to add the read operation to the queue. We were propagating this exception to the application. The main problem is that we have to add the operation to the queue of ClientCBWorker so that we guarantee that it knows about the operation once we receive responses from bookies.
If we throw an exception without removing the operation from the ClientCBWorker queue, the worker will wait forever, which I believe is the case Utkarsh was observing. If I reasoned about the code correctly, then my modifications fix this problem by retrying a few times and erroring out after a number of retries. Erroring out in this case means notifying the CBWorker so that we can release the operation. Fixing log level in LedgerConfig. -F I have mainly worked on the ledger recovery machinery. I made it asynchronous by transforming LedgerRecovery into a thread and moving some calls. We have to revisit this way of making it asynchronous as it might not be acceptable for this patch. I still need to check why BookieFailureTest is failing for Utkarsh. It passes fine every time for me, so we have to find a way to reproduce it reliably on my machine so that I can debug it. Took a pass over asynchronous ledger operations: create, open, close. Some parts are still blocking; I'll work on those next. race condition in asynchronous create - Key: ZOOKEEPER-503 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Reporter: Benjamin Reed Attachments: ZOOKEEPER-503.patch there is a race condition between the zookeeper completion thread and the bookkeeper processing queue during create. if the zookeeper completion thread falls behind due to scheduling, the action counter of the create operation may go backwards. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
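The retry-then-error-out behavior described for the full BookieHandle queue can be sketched as follows (submitWithRetry and MAX_RETRIES are illustrative names, not the BookKeeper API): offer the operation to the bounded queue with a timeout a few times, and if the queue is still full, give up so the caller can fail the operation back to the callback worker instead of leaving it pending forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of bounded-queue submission with retries: when the bookie stops
// draining its queue, offer() keeps timing out and we error out instead of
// blocking forever (which is what left the callback worker hanging).
public class RetryingSubmit {
    static final int MAX_RETRIES = 3;
    static final long WAIT_MS = 10;

    // Returns true if enqueued; false means the caller must report an error
    // to the callback worker so the pending operation is released.
    static boolean submitWithRetry(BlockingQueue<Runnable> queue, Runnable op)
            throws InterruptedException {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            if (queue.offer(op, WAIT_MS, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> full = new ArrayBlockingQueue<>(1);
        full.put(() -> {}); // queue is full and nobody is draining it
        boolean ok = submitWithRetry(full, () -> {});
        System.out.println(ok ? "enqueued" : "errored out after retries");
    }
}
```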
[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create
[ https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-503: Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-503) race condition in asynchronous create
[ https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743541#action_12743541 ] Benjamin Reed commented on ZOOKEEPER-503: - i should have also mentioned that this patch was done by flavio and utkarsh. i will be reviewing it.
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743547#action_12743547 ] Benjamin Reed commented on ZOOKEEPER-483: - just to be clear. this bug isn't completely fixed and the test case should still be failing. i just want to make sure it fails reliably on the hudson machine. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected 
local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn:
[jira] Updated: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-508: Attachment: ZOOKEEPER-508.patch added a testcase for the DIFF problem. still not fixed. proposals and commits for DIFF and Truncate messages from the leader to followers is buggy. --- Key: ZOOKEEPER-508 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508 Project: Zookeeper Issue Type: Bug Components: quorum Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch The proposals and commits sent by the leader, after it asks the followers to truncate their logs or starts sending a diff, are missing messages, which causes out-of-order commit messages and causes the followers to shut down. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747155#action_12747155 ] Benjamin Reed commented on ZOOKEEPER-508: - +1 looks good. simple fix! :) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy. --- Key: ZOOKEEPER-508 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508 Project: Zookeeper Issue Type: Bug Components: quorum Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch-3.2 The proposals and commits sent by the leader, after it asks the followers to truncate their logs or starts sending a diff, are missing messages, which causes out-of-order commit messages and causes the followers to shut down. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-515) Zookeeper quorum didn't provide service when restart after an Out of memory crash
[ https://issues.apache.org/jira/browse/ZOOKEEPER-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747532#action_12747532 ] Benjamin Reed commented on ZOOKEEPER-515: - first, it is important to note that our limit of 1M for data is a sanity check. it is unwise to design your application to run on the edge of sanity. generally we talk about data in the kilobyte range, 100 bytes - 64k. zookeeper stores metadata, not application data. do you know how big the resulting data is? what is the size of a snapshot file?

1) perhaps you are hitting the memory error again when you try to rebuild your in-memory data structure. you may try increasing the memory limit using the -Xmx flag.

2) there is a configuration option to specify the number of requests in flight, globalOutstandingLimit, which defaults to 1000, but with 1000 1M requests you need 1G for just the in-flight requests, in addition to the memory needed for the tree. if you want to handle such large requests you need to look at the amount of memory you have and possibly tune that parameter. also, if you have a large in-memory tree and you need to do a state transfer for followers that are behind, you will need some time to push a lot of data over the network, so you probably also need to adjust the syncLimit and initLimit.

3) if you want to reinitialize everything you need to remove the version-2 directory from all servers; otherwise, a server that still has the version-2 directory will get elected and the other servers will sync with it. 
Zookeeper quorum didn't provide service when restart after an Out of memory crash --- Key: ZOOKEEPER-515 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Jdk: 1.6.0_14 Reporter: Qian Ye The Zookeeper quorum, containing 5 servers, didn't provide service when restarted after an Out of memory crash. It happened as follows: 1. we built a Zookeeper quorum which contained 5 servers, say 1, 3, 4, 5, 6 (no 2), and 6 was the leader. 2. we created 18 threads on 6 different servers to set and get data from a znode in the Zookeeper at the same time. The size of the data is 1MB. The test threads did their job as fast as possible, with no pause between operations, and they repeated the setting and getting 4000 times. 3. the Zookeeper leader crashed about 10 mins after the test threads started. The leader printed out the log: 2009-08-25 12:00:12,301 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x523 4223c2dc00b5 due to java.io.IOException: Read error 2009-08-25 12:00:12,318 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x523 4223c2dc00b6 due to java.io.IOException: Read error 2009-08-25 12:03:44,086 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x523 4223c2dc00b8 due to java.io.IOException: Read error 2009-08-25 12:04:53,757 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x523 4223c2dc00b7 due to java.io.IOException: Read error 2009-08-25 12:15:45,151 - FATAL [SyncThread:0:syncrequestproces...@131] - Severe unrecoverable error, exiting java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2786) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71) at 
java.io.DataOutputStream.writeInt(DataOutputStream.java:180) at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55) at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42) at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262) at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100) It is clear that the leader ran out of memory. then the server 4 was down almost at the same time, and printed out the log: 2009-08-25 12:15:45,995 - ERROR [FollowerRequestProcessor:3:followerrequestproces...@91] - Unexpected exception causing exit java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at
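The tuning advice in the ZOOKEEPER-515 comment above (raise the heap, cap in-flight requests, allow more time for state transfer) can be sketched as a configuration fragment. The values below are illustrative assumptions only, not recommendations from the thread, and in the 3.2-era releases globalOutstandingLimit is set as a Java system property rather than in zoo.cfg:

```
# zoo.cfg -- illustrative values, assuming ~1M payloads
# allow more ticks for a follower's initial connect/state transfer
initLimit=20
# allow more ticks for followers that fall behind to catch up
syncLimit=10
```

The heap and the in-flight-request cap would then go on the server command line, e.g. `java -Xmx4g -Dzookeeper.globalOutstandingLimit=100 ...` (with ~1M payloads, even 100 outstanding requests can pin around 100M of heap by themselves).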
[jira] Commented: (ZOOKEEPER-512) FLE election fails to elect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748018#action_12748018 ] Benjamin Reed commented on ZOOKEEPER-512: - agreed. i think the problem is that under high load we don't have a period of error-free operation. i think it is ok to generate errors randomly as we are doing, but we should have periods of error-free operation so that things can settle down. FLE election fails to elect leader -- Key: ZOOKEEPER-512 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.2.0 Reporter: Patrick Hunt Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch I was doing some fault injection testing of 3.2.1 with the ZOOKEEPER-508 patch applied and noticed that after some time the ensemble failed to re-elect a leader. See the attached log files - 5 member ensemble, typically 5 is the leader. Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes elapse with no quorum. environment: I was doing fault injection testing using aspectj. The faults are injected into socketchannel read/write; I throw exceptions randomly at a 1/200 ratio (rand.nextFloat() <= .005 => throw IOException). You can see when a fault is injected in the log via: 2009-08-19 16:57:09,568 - INFO [Thread-74:readrequestfailsintermitten...@38] - READPACKET FORCED FAIL vs a read/write that didn't force fail: 2009-08-19 16:57:09,568 - INFO [Thread-74:readrequestfailsintermitten...@41] - READPACKET OK otherwise standard code/config (straight fle quorum with 5 members) also see the attached jstack trace. this is for one of the servers. Notice in particular that the number of sendworkers != the number of recv workers. 
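The fault-injection scheme described in ZOOKEEPER-512 above can be sketched in plain Java. This is an illustration of the probability check only, not the actual aspectj advice from the test; the class and method names are invented:

```java
import java.io.IOException;
import java.util.Random;

// Sketch: each intercepted socketchannel read/write fails with
// probability ~1/200 (0.005), mirroring the report's description.
public class FaultInjector {
    private final Random rand;
    private final float failRatio;

    public FaultInjector(long seed, float failRatio) {
        this.rand = new Random(seed);
        this.failRatio = failRatio;
    }

    /** Call before each read/write; throws when a fault is injected. */
    public void maybeFail(String op) throws IOException {
        if (rand.nextFloat() <= failRatio) {
            throw new IOException(op + " FORCED FAIL");
        }
    }

    public static void main(String[] args) {
        FaultInjector fi = new FaultInjector(42L, 0.005f);
        int failures = 0;
        for (int i = 0; i < 100_000; i++) {
            try {
                fi.maybeFail("READPACKET");
            } catch (IOException e) {
                failures++;
            }
        }
        // roughly 0.5% of calls should have been failed
        System.out.println(failures > 300 && failures < 700);
    }
}
```

Ben's follow-up suggests a refinement this sketch does not include: gating injection by time windows, so the ensemble gets periods of error-free operation in which elections can settle.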
[jira] Updated: (ZOOKEEPER-520) add static/readonly client resident serverless zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-520: Summary: add static/readonly client resident serverless zookeeper (was: add static/readonly client session type) add static/readonly client resident serverless zookeeper Key: ZOOKEEPER-520 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-520 Project: Zookeeper Issue Type: New Feature Components: c client, java client Reporter: Patrick Hunt Fix For: 3.3.0 Occasionally people (typically ops) have asked for the ability to start a ZK client with a hardcoded, local, non-cluster-based session. Meaning that you can bring up a particular client with a hardcoded/readonly view of the ZK namespace even if the zk cluster is not available. This seems useful for a few reasons:

1) unforeseen problems - a client might be brought up and partial application service restored even in the face of catastrophic cluster failure

2) testing - a client could be brought up with a hardcoded configuration for testing purposes. we might even be able to extend this idea over time to allow simulated changes, i.e. simulate other clients making changes in the namespace, perhaps simulate changes in the state of the cluster (testing state change is often hard for users of the client interface)

Seems like this shouldn't be too hard for us to add. The session could be established with a URI for a local/remote file rather than a URI of the cluster servers. The client would essentially read this file, which would be a simple representation of the znode namespace: /foo/bar abc /foo/bar2 def etc... In the pure client readonly case this is simple. We might also want to allow writes to the namespace (essentially back this with an in-memory hash) for things like group membership (so that the client continues to function). 
Obviously this wouldn't work in some cases, but it might work in many, and would allow further options for users wrt building a reliable/recoverable service on top of ZK. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
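The file format proposed in ZOOKEEPER-520 above (one znode per line, path followed by data) could be loaded into an in-memory map roughly as follows. This is a sketch of the idea only; the class name and parsing rules are assumptions, not part of the proposal:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch: parse lines like "/foo/bar abc" into a map that a serverless,
// readonly client could serve getData() calls from.
public class StaticNamespace {
    private final Map<String, String> znodes = new HashMap<>();

    public static StaticNamespace load(Path file) throws IOException {
        StaticNamespace ns = new StaticNamespace();
        for (String line : Files.readAllLines(file)) {
            line = line.trim();
            if (line.isEmpty()) continue;
            // split on the first space: "/foo/bar abc" -> ("/foo/bar", "abc")
            int sp = line.indexOf(' ');
            if (sp < 0) ns.znodes.put(line, "");
            else ns.znodes.put(line.substring(0, sp), line.substring(sp + 1).trim());
        }
        return ns;
    }

    public String getData(String path) {
        return znodes.get(path);
    }
}
```

Allowing writes, as the proposal suggests for group membership, would just mean mutating the backing map; the harder open questions (watches, simulated state changes) are outside this sketch.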
[jira] Updated: (ZOOKEEPER-542) c-client can spin when server unresponsive
[ https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-542: Fix Version/s: 3.3.0 Status: Patch Available (was: Open) c-client can spin when server unresponsive -- Key: ZOOKEEPER-542 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.0 Reporter: Christian Wiedmann Fix For: 3.3.0 Attachments: ZOOKEEPER-542.patch, ZOOKEEPER-542.patch Due to a mismatch between zookeeper_interest() and zookeeper_process(), when the zookeeper server is unresponsive the client can spin when reconnecting to the server. In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is data to be sent, but flush_send_queue() only writes the data if the state is ZOO_CONNECTED_STATE. When in ZOO_ASSOCIATING_STATE, this results in spinning. This probably doesn't affect production, but I had a runaway process in a development deployment that caused performance issues on the node. This is easy to reproduce in a single node environment by doing a kill -STOP on the server and waiting for the session timeout. Patch to be added. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
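The ZOOKEEPER-542 spin comes from a mismatch: zookeeper_interest() requests write-readiness whenever data is queued, but flush_send_queue() only writes in ZOO_CONNECTED_STATE, so in ZOO_ASSOCIATING_STATE select() keeps returning a writable socket that nothing drains. The invariant behind the fix can be illustrated in Java NIO terms (the actual fix is in the C client; the states and helper below are hypothetical, not the real client code):

```java
import java.nio.channels.SelectionKey;

// Sketch: only declare write interest in states where the flush path will
// actually drain the send queue; otherwise select() returns immediately,
// nothing is written, and the event loop spins.
public class InterestCalc {
    enum State { CONNECTING, ASSOCIATING, CONNECTED }

    static int interestOps(State state, boolean sendQueueNonEmpty) {
        int ops = SelectionKey.OP_READ;
        // OP_WRITE is requested only when the writer is willing to write
        if (state == State.CONNECTED && sendQueueNonEmpty) {
            ops |= SelectionKey.OP_WRITE;
        }
        return ops;
    }
}
```

With this rule, a client stuck in the handshake (the ASSOCIATING case) blocks in select() on read-readiness instead of busy-looping on a write it will never perform.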