[jira] Commented: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930008#action_12930008 ]

Henry Robinson commented on ZOOKEEPER-921:
------------------------------------------

Nicholas -

Good catch, thanks! Do you think you will be able to submit a patch fixing the args checking in check_is_acl()?

Thanks,
Henry

zkPython incorrectly checks for existence of required ACL elements
------------------------------------------------------------------

Key: ZOOKEEPER-921
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1, 3.4.0
Environment: Mac OS X 10.6.4, included Python 2.6.1
Reporter: Nicholas Knight
Assignee: Nicholas Knight
Fix For: 3.3.3, 3.4.0
Attachments: zktest.py

Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module.

Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call:

{noformat}
Traceback (most recent call last):
  File "zktest.py", line 21, in <module>
    logger.error("Boom?")
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error
    if self.isEnabledFor(ERROR):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor
    return level >= self.getEffectiveLevel()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel
    while logger:
TypeError: an integer is required
{noformat}

But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions.

I'll be attaching a test script that can be used to reproduce this behavior.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
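To make "checking for required ACL elements" concrete, here is a minimal pure-Python sketch of the kind of validation the issue title refers to. It is not the zkpython C code for check_is_acl(); the function names are hypothetical, and the key names ("perms", "scheme", "id") are assumed from the dictionaries zkpython callers usually pass. The point it illustrates is that a missing element should be reported right where the ACL is checked, rather than surfacing later in an unrelated call.

```python
# Illustrative model only; the real check lives in zkpython's C source.
# Assumed key names for one ACL entry (hypothetical constant).
REQUIRED_ACL_KEYS = ("perms", "scheme", "id")


def check_acl_entry(entry):
    """Raise immediately if a single ACL entry is malformed."""
    if not isinstance(entry, dict):
        raise TypeError("ACL entry must be a dict, got %s" % type(entry).__name__)
    missing = [key for key in REQUIRED_ACL_KEYS if key not in entry]
    if missing:
        raise ValueError("ACL entry is missing: %s" % ", ".join(missing))
    if not isinstance(entry["perms"], int):
        raise TypeError("ACL 'perms' must be an integer")


def check_acl(acl):
    """Validate a whole ACL (a list of entries); return True if it is well formed."""
    if not isinstance(acl, (list, tuple)):
        raise TypeError("ACL must be a list of dicts")
    for entry in acl:
        check_acl_entry(entry)
    return True
```

With a check of this shape, an entry such as {"perms": 0x1f, "scheme": "world"} fails at validation time with a message naming the missing "id" key.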
[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925927#action_12925927 ]

Henry Robinson commented on ZOOKEEPER-851:
------------------------------------------

Hi Vishal -

Sorry for the slow turnaround on this one. It doesn't surprise me that this is the behaviour, although it's slightly unexpected that the node becomes an observer, rather than a follower. What evidence do you have for that? (Your stat output shows "Mode: follower" - I haven't checked the code in a while, but I would have thought an observer would print "Mode: observer".)

Henry

ZK lets any node to become an observer
--------------------------------------

Key: ZOOKEEPER-851
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal K
Priority: Critical
Fix For: 3.4.0

I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as shown below:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888

I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I created another entry for that node and started the zk server. The zoo.cfg on the first 3 nodes was left unchanged. The fourth node was able to join the cluster even though the 3 nodes had no idea about the fourth node.

zoo.cfg on fourth node:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
server.3=10.17.117.71:2888:3888

It looks like 10.17.117.71 is becoming an observer in this case. I was expecting that the leader would reject 10.17.117.71.

# telnet 10.17.117.71 2181
Trying 10.17.117.71...
Connected to 10.17.117.71.
Escape character is '^]'.
stat
Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
Clients:
 /10.17.117.71:37297[1](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Outstanding: 0
Zxid: 0x20065
Mode: follower
Node count: 288
[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926016#action_12926016 ]

Henry Robinson commented on ZOOKEEPER-851:
------------------------------------------

I think what happens is that the leader happily lets the new follower connect, but that it won't be part of any voting procedure. It shouldn't become leader because no other nodes know about it to propose or support a vote for it.

To add a new node, you'll need to incrementally restart every node in your cluster with the new config.
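The membership check the reporter expected from the leader can be sketched as follows. This is only an illustration in Python, not ZooKeeper's server code: the function names and the two config constants are hypothetical, and the parser handles only the plain server.N=host:port:port form used in this issue.

```python
import re

# Matches lines such as "server.3=10.17.117.71:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")

# The two configurations quoted in the issue (hypothetical constant names).
THREE_NODE_CFG = """\
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
"""
FOURTH_NODE_CFG = THREE_NODE_CFG + "server.3=10.17.117.71:2888:3888\n"


def parse_servers(cfg_text):
    """Return {server id: (host, quorum port, election port)} from zoo.cfg text."""
    servers = {}
    for raw in cfg_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            sid, host, quorum_port, election_port = match.groups()
            servers[int(sid)] = (host, int(quorum_port), int(election_port))
    return servers


def is_known_peer(servers, sid):
    """True if this node's own configuration lists the given server id."""
    return sid in servers
```

Under the original three-node configuration, is_known_peer(parse_servers(THREE_NODE_CFG), 3) is False, which is the condition under which the reporter expected the connection to be rejected; only the fourth node's own file lists server.3.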
Re: Apache now has reviewboard
Yes!

On 25 October 2010 22:47, Patrick Hunt ph...@apache.org wrote:

And we're on it: https://reviews.apache.org/groups/zookeeper/

We should rework our howtocommit to incorporate this.

Patrick

On Mon, Oct 25, 2010 at 10:16 PM, Patrick Hunt ph...@apache.org wrote:

FYI: https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the

We should start using this, I've used it for other projects and it worked out quite well.

Patrick

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: [VOTE] ZooKeeper as TLP?
+1

On 22 October 2010 14:53, Mahadev Konar maha...@yahoo-inc.com wrote:

+1

On 10/22/10 2:42 PM, Patrick Hunt ph...@apache.org wrote:

Please vote as to whether you think ZooKeeper should become a top-level Apache project, as discussed previously on this list. I've included below a draft board resolution. Do folks support sending this request on to the Hadoop PMC?

Patrick

X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to distributed system coordination for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to distributed system coordination; and be it further

RESOLVED, that the office of Vice President, Apache ZooKeeper be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project:

* Patrick Hunt ph...@apache.org
* Flavio Junqueira f...@apache.org
* Mahadev Konar maha...@apache.org
* Benjamin Reed br...@apache.org
* Henry Robinson he...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Restarting discussion on ZooKeeper as a TLP
was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles.

Regards,
Patrick

*flavio* *junqueira*
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Restarting discussion on ZooKeeper as a TLP
Ha, I may just have excluded myself from eligibility due to my inability to read :)

On 21 October 2010 13:28, Patrick Hunt ph...@apache.org wrote:

Ack, I missed Henry in the list, sorry! In my defense I copied this: http://hadoop.apache.org/zookeeper/credits.html

one more try (same as before except for adding henry to the pmc):

X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to data serialization for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to data serialization; and be it further

RESOLVED, that the office of Vice President, Apache ZooKeeper be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project:

* Patrick Hunt ph...@apache.org
* Flavio Junqueira f...@apache.org
* Mahadev Konar maha...@apache.org
* Benjamin Reed br...@apache.org
* Henry Robinson he...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.

On Thu, Oct 21, 2010 at 10:44 AM, Henry Robinson he...@cloudera.com wrote:

Looks good, please do call a vote.

On 21 October 2010 09:29, Patrick Hunt ph...@apache.org wrote:

Here's a draft board resolution (not a vote, just discussion). It lists all current committers (except as noted in the next paragraph) as the initial members of the project management committee (PMC) and myself as the initial chair.

Notice that I have left Andrew off the PMC as he has not been active with the project for over two years. I believe we should continue to include him on the committer roles subsequent to moving to tlp, however as he has not been an active member of the community for such a long period we would not include him on the PMC at this time. If others feel differently let me know, I'm willing to include him if the people feel differently.

LMK if this looks good to you and I'll call for an official vote on this list (then we'll be ready to call a vote on the hadoop pmc).

Regards,
Patrick

X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to data serialization for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to data serialization
Re: Restarting discussion on ZooKeeper as a TLP
+1, thanks for following through with the protocol.

On 20 October 2010 11:02, Vishal K vishalm...@gmail.com wrote:

+1.

On Wed, Oct 20, 2010 at 1:50 PM, Patrick Hunt ph...@apache.org wrote:

It's been a few days, any thoughts? Acceptable? I'd like to keep moving the ball forward. Thanks.

Patrick

On Sun, Oct 17, 2010 at 8:43 PM, 明珠刘 redis...@gmail.com wrote:

+1

2010/10/14 Patrick Hunt ph...@apache.org

In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:

Original discussion http://markmail.org/thread/42cobkpzlgotcbin

I originally voted against this move, my primary concern being that we were not ready to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status.

A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles.

Regards,
Patrick

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
-------------------------------------

Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. Thanks both!

c-client / zkpython: Double free corruption on node watcher
-----------------------------------------------------------

Key: ZOOKEEPER-888
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
Project: Zookeeper
Issue Type: Bug
Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Assignee: Lukas
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, ZOOKEEPER-888.patch

The c-client / zkpython wrapper invokes an already freed watcher callback.

Steps to reproduce:
0. start a zookeeper server on your machine
1. run the attached python script
2. suspend the zookeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain`)
3. wait until the connection and the node observer have fired with a session event
4. resume the zookeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain`)

Result: the client tries to dispatch the node observer function again, but it was already freed, which leads to a double free corruption.
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
-------------------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)
[jira] Commented: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922209#action_12922209 ]

Henry Robinson commented on ZOOKEEPER-888:
------------------------------------------

The patch as it stands relies on ZOOKEEPER-853 (which it fixes) which is not in 3.3 as it is a small API change - it changes is_unrecoverable to return Python True or False, rather than ZINVALIDSTATE.

So I'm not certain about what to do here - we try not to change APIs between minor versions. However, this is a very minor change, and this patch fixes a significant bug.

I'm inclined to commit both 853 and this patch to 3.3 as well as trunk, and put a note in the release notes. Any objections?
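To show why the is_unrecoverable change counts as an API change, here is a small hedged sketch of a caller-side helper that tolerates both return styles. It does not import zkpython; the helper name is hypothetical, and the numeric value given to ZINVALIDSTATE (-9) is an assumption based on the C client header and should be checked against the installed version.

```python
# Hypothetical local constant; assumed to mirror ZINVALIDSTATE in the C client.
ZINVALIDSTATE = -9


def session_is_unrecoverable(raw_result):
    """Normalize the result of is_unrecoverable() to a plain bool.

    Newer bindings (per the comment above) return True/False; older ones
    are assumed to return an integer status code such as ZINVALIDSTATE.
    """
    if isinstance(raw_result, bool):
        return raw_result
    return raw_result == ZINVALIDSTATE
```

Client code that compared the result directly against ZINVALIDSTATE would silently stop matching once the binding returns True, which is presumably why a release-note entry was proposed.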
Re: Running a single unit test
You need to use -Dtestcase, not -Dtest, as per below:

ant test -Dtestcase=YourTestHere

HTH,
Henry

On 17 October 2010 17:34, Michi Mutsuzaki mic...@yahoo-inc.com wrote:

Hello,

How do I run a single unit test? I tried this:

$ ant test -Dtest=SessionTest

but it still runs all the tests.

Thanks!
--Michi

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: What's the QA strategy of ZooKeeper?
I broadly agree with Ben - all meaningful code changes carry a risk of destabilization (otherwise software development would be very easy) so we should guard against improving cleanliness only for its own sake. At the point where bad code gets in the way of fixing bugs or adding features, I think it's very worthwhile to 'lazily' clean code.

I did this with the observers patch - reworked some of the class hierarchies to improve encapsulation and make it easier to add new implementations.

The netty patch is a good test case for this approach. If we feel that reworking the structure of the existing server cnxn code will make it significantly easier to add a second implementation that adheres to the same interface, then I say that such a refactoring is worthwhile, but even then only if it's straightforward to make the changes while convincing ourselves that the behaviour of the new implementation is consistent with the old.

Thomas, do comment on the patch itself! That's the very best way to make sure your concerns get heard and addressed.

cheers,
Henry

On 15 October 2010 11:37, Benjamin Reed br...@yahoo-inc.com wrote:

i think we have a very different perspective on the quality issue:

I didn't want to say it that clear, but especially the new Netty code, both on client and server side is IMHO an example of new code in very bad shape. The client code patch even changes the FindBugs configuration to exclude the new code from the FindBugs checks.

great. fixing the code and refactoring before a patch goes in is the perfect time to do it! please give feedback and help make the patch better. there is a reason to exclude checks (which is why there is such excludes), but if we can avoid them we should. before a patch is applied is exactly the time to do cleanup

If your code is already in such a bad shape, that every change includes considerable risk to break something, then you already are in trouble. With every new feature (or bugfix!) you also risk to break something. If you don't have the attitude of permanent refactoring to improve the code quality, you will inevitably lower the maintainability of your code with every new feature. New features will build on the dirty concepts already in the code and therefore make it more expensive to ever clean things up.

cleaning up code to add a new feature is a great time to clean up the code.

Yes. Refactoring isn't easy, but necessary. Only over time you better understand your domain and find better structures. Over time you introduce features that let code grow so that it should better be split up in smaller units that the human brain can still handle.

it is the "but necessary" that i disagree with. there is plenty of code that could be cleaned up and made to look a lot nicer, but we shouldn't touch it, unless we are fixing something else or adding a new feature. it's pretty lame to explain to someone that the bug that was introduced by a code change was motivated by a desire to make the code cleaner. any code change runs the risk of breakage, thus changing code simply for cleanliness is not worth the risk.

ben

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Restarting discussion on ZooKeeper as a TLP
+1, I agree that we've addressed most outstanding concerns, we're ready for TLP.

Henry

On 14 October 2010 13:29, Mahadev Konar maha...@yahoo-inc.com wrote:

+1 for moving to TLP. Thanks for starting the vote Pat.

mahadev

On 10/13/10 2:10 PM, Patrick Hunt ph...@apache.org wrote:

In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:

Original discussion http://markmail.org/thread/42cobkpzlgotcbin

I originally voted against this move, my primary concern being that we were not ready to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status.

A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles.

Regards,
Patrick

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921103#action_12921103 ]

Henry Robinson commented on ZOOKEEPER-893:
------------------------------------------

Thanks for the patch Thijs! It looks pretty good to me - good catch.

Do you think you might be able to write a test case that verifies correct behaviour when you send malformed messages to the control port?

ZooKeeper high cpu usage when invalid requests
----------------------------------------------

Key: ZOOKEEPER-893
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1
Environment: Linux 2.6.16
4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
java version 1.6.0_17
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
Reporter: Thijs Terlouw
Assignee: Thijs Terlouw
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-893.patch
Original Estimate: 1h
Remaining Estimate: 1h

When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues.

from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java

the two affected parts:
===
int length = msgLength.getInt();
if(length <= 0) {
    throw new IOException("Invalid packet length:" + length);
}
===
===
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if(temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
===

how to replicate this bug: perform an nmap portscan against your zookeeper server:

nmap -sV -n your.ip.here -p4181

Wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore.
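The two guards quoted from QuorumCnxManager.java (reject a non-positive length, and stop on end-of-stream instead of spinning) can be sketched in Python as below. This is an illustration of the technique, not a port of the patch: the function names are hypothetical, the 4-byte big-endian length prefix is an assumption, and MAX_PACKET_LENGTH is an extra bound that does not appear in the issue.

```python
import struct

# Hypothetical upper bound, added for the sketch only (not from the issue).
MAX_PACKET_LENGTH = 512 * 1024


def read_exact(stream, count):
    """Read exactly `count` bytes, failing on EOF rather than looping forever."""
    buf = bytearray()
    while len(buf) < count:
        chunk = stream.read(count - len(buf))
        if not chunk:
            # Empty read means the peer closed the stream mid-message.
            raise IOError("Channel eof before end")
        buf.extend(chunk)
    return bytes(buf)


def read_message(stream):
    """Read one length-prefixed message (assumed 4-byte big-endian signed length)."""
    (length,) = struct.unpack(">i", read_exact(stream, 4))
    if length <= 0 or length > MAX_PACKET_LENGTH:
        raise IOError("Invalid packet length: %d" % length)
    return read_exact(stream, length)
```

A test along the lines Henry asks for could feed read_message an io.BytesIO holding a zero length, a negative length, or a truncated body, and assert that each raises promptly.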
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
-------------------------------------

The patch looks good to me - thanks! Could you add a test case that verifies the correct behaviour, if possible? (I appreciate it can be hard to fake unrecoverable session errors). We keep circling around the correct behaviour for this code block, and I'd like to capture it in a test suite.
[jira] Commented: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909376#action_12909376 ]

Henry Robinson commented on ZOOKEEPER-785:
------------------------------------------

This patch looks good - a couple of comments:

1. Can you expand the comment "// Not a quorum configuration so return immediately" to be clear that this isn't a problem, and that the server will default to standalone mode?

2. Can you actually move the 'bit out of place' test to somewhere more sensible? :) Let's make a QuorumConfigurationTest class if we have to.

Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
--------------------------------------------------------------------------

Key: ZOOKEEPER-785
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1
Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch

The following config causes an infinite loop

[zoo.cfg]
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=10
syncLimit=5
server.0=localhost:2888:3888

Output:

2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum peer
2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to port 0.0.0.0/0.0.0.0:2181
2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000
2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set to -1
2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set to -1
2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10
2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot /var/zookeeper/version-2/snapshot.c
2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My election bind port: 3888
2010-06-01 16:20:32,554 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-01 16:20:32,556 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,558 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 1, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,560 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
java.lang.NullPointerException
    at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 2, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,561 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
java.lang.NullPointerException
    at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 3, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,562 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
java.lang.NullPointerException

Things like HBase require that the zookeeper servers be listed in the zoo.cfg. This is a bug on their part, but zookeeper shouldn't null pointer in a loop though.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
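To make the situation in ZOOKEEPER-785 concrete, here is a minimal Python sketch of the kind of check the review comment describes: count the `server.N` lines in a `zoo.cfg`-style file and fall back to standalone mode when there are too few to form a quorum. The names `parse_zoo_cfg` and `choose_mode`, and the threshold of two servers, are assumptions made for illustration; this is not the code from the attached patch.

```python
def parse_zoo_cfg(text):
    """Split zoo.cfg-style text into plain settings and server.N entries."""
    settings, servers = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("server."):
            servers[int(key[len("server."):])] = value
        else:
            settings[key] = value
    return settings, servers


def choose_mode(servers):
    """Assumed rule: fewer than two server.N entries is not a quorum config."""
    return "quorum" if len(servers) >= 2 else "standalone"


# The configuration from the bug report.
CONFIG = """\
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=10
syncLimit=5
server.0=localhost:2888:3888
"""

settings, servers = parse_zoo_cfg(CONFIG)
print(choose_mode(servers))  # standalone
```

With a rule like this, the single `server.0` line from the report would select standalone mode instead of starting leader election.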
[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-785:
------------------------------------

Hadoop Flags: [Reviewed]

Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
--------------------------------------------------------------------------

Key: ZOOKEEPER-785
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1
Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, ZOOKEEPER-785_2_br33.patch
[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-785:
------------------------------------

+1, this looks good (although I'd remove the 'out of place in this class' comment now that you've moved it).

Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
--------------------------------------------------------------------------

Key: ZOOKEEPER-785
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1
Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, ZOOKEEPER-785_2_br33.patch
Re: Zoosh!
Hi Michi -

This sounds cool - but your link goes to what I think is a Yahoo-internal site, and I suspect that 'yinst' is a Yahoo-specific tool. Perhaps you either did not mean to send this mail to this list, or you are not aware that this is a public mailing list, open to all?

Either way, thanks for your interest in ZooKeeper, and if what you have written would be of interest to a general audience, please do consider contributing it back!

cheers,
Henry

On 31 August 2010 17:40, Michi Mutsuzaki mic...@yahoo-inc.com wrote:

I created a wrapper package for Java zookeeper shell. Unlike C version, it supports command history and tab completion.

$ yinst install zoosh -br test
$ zoosh localhost:2181

http://dist.corp.yahoo.com/by-package/zoosh/

--Michi

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-853: - Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this (to trunk) - thanks Andrei! Make zookeeper.is_unrecoverable return True or False and not an integer --- Key: ZOOKEEPER-853 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Reporter: Andrei Savu Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch This is a patch that fixes a TODO from the python zookeeper extension, it makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an integer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Putting copyright notices in ZK?
Hi Vishal -

I'm afraid we don't allow author or copyright information in source files. Putting one's own copyright notice in a file is against Apache policy (and we are guided by the rules of the ASF). The SVN logs will keep track of ownership details, but it's not at all clear what copyright notices even mean once you have granted license to the ASF by virtue of submitting your patch. To avoid any confusion, we just disallow author-specific information in the source.

I hope you can find some compromise with your legal department - I'm pretty sure I know of other contributions from VMware employees to open source projects that don't have this restriction, so I'm hopeful that you can resolve this issue.

Best,
Henry

On 26 August 2010 14:58, Vishal K vishalm...@gmail.com wrote:

Hi All,

I work for VMware. My company tells me that any contribution that I make to ZK needs to have a line saying "Copyright [year of creation - year of last modification] VMware, Inc. All Rights Reserved." If portions of a file are modified, then I could identify only those portions of the file, if needed. No change to license is required.

Needless to say, I am personally ok to make contributions without any such notices. What is ZK's policy on this? What would be a good solution in this case satisfying both the parties (ZK and my company's legal dept.)?

Thanks.
-Vishal

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-853: - Hadoop Flags: [Reviewed] +1 This looks good to me - thanks. Make zookeeper.is_unrecoverable return True or False and not an integer --- Key: ZOOKEEPER-853 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Reporter: Andrei Savu Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch This is a patch that fixes a TODO from the python zookeeper extension, it makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an integer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-792:
------------------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

I just committed this! Thanks Lei Zhang!

zkpython memory leak
--------------------

Key: ZOOKEEPER-792
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, ZOOKEEPER-792.patch

We recently upgraded zookeeper from 3.2.1 to 3.3.1, now we are seeing less client deadlock on session expiration, which is a definite plus! Unfortunately we are seeing memory leak that requires our zk clients to be restarted every half-day. Valgrind result:

==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670
==8804==    at 0x4021C42: calloc (vg_replace_malloc.c:418)
==8804==    by 0x5047B42: parse_acls (zookeeper.c:369)
==8804==    by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
==8804==    by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
==8804==    by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
==8804==    by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)
[jira] Commented: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900380#action_12900380 ]

Henry Robinson commented on ZOOKEEPER-792:
------------------------------------------

Aha - I think I have found the problem, and it was related to this patch.

     PyObject *ret = Py_BuildValue( "(s#,N)", buffer,buffer_len, stat_dict );
+    free_pywatcher(pw);
     free(buffer);

We shouldn't free the pywatcher_t object here because it may be called later. This was what was causing the segfault I was seeing.

I'll upload a new patch with this line removed; I hope it will still fix your memory consumption issues.

zkpython memory leak
--------------------

Key: ZOOKEEPER-792
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-792.patch

We recently upgraded zookeeper from 3.2.1 to 3.3.1, now we are seeing less client deadlock on session expiration, which is a definite plus! Unfortunately we are seeing memory leak that requires our zk clients to be restarted every half-day. Valgrind result:

==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670
==8804==    at 0x4021C42: calloc (vg_replace_malloc.c:418)
==8804==    by 0x5047B42: parse_acls (zookeeper.c:369)
==8804==    by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
==8804==    by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
==8804==    by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
==8804==    by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)
[jira] Updated: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-792:
------------------------------------

Attachment: ZOOKEEPER-792.patch

I forgot --no-prefix. Plus ca change, plus c'est la meme chose.

zkpython memory leak
--------------------

Key: ZOOKEEPER-792
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, ZOOKEEPER-792.patch
[jira] Commented: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899676#action_12899676 ]

Henry Robinson commented on ZOOKEEPER-792:
------------------------------------------

Just to update - I've found that zkpython tests are failing in trunk, and I don't want to commit a patch when the tests are broken. I'll be creating a JIRA shortly to address the problem once I've looked into it slightly further.

zkpython memory leak
--------------------

Key: ZOOKEEPER-792
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-792.patch
[jira] Commented: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899004#action_12899004 ]

Henry Robinson commented on ZOOKEEPER-792:
------------------------------------------

Hi - Sorry for the slow response! I just took a look over the patch - good catches. +1. I'll commit within the day.

Henry

zkpython memory leak
--------------------

Key: ZOOKEEPER-792
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-792.patch
[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897338#action_12897338 ] Henry Robinson commented on ZOOKEEPER-784: -- Spectacular job, Sergey. I've taken a look at the code and I'm pretty satisfied - you've done a great job covering little things like JMX support, and good code comments and documentation. I'm going to wait for one of the other committers to come by and also give this a +1 since this is a substantial change. We may also decide to run a long-lived test with this patch to satisfy ourselves of the stability. But this looks very, very solid indeed. server-side functionality for read-only mode Key: ZOOKEEPER-784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784 Project: Zookeeper Issue Type: Sub-task Components: server Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer which comes into play when peer is partitioned. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython
[ https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889940#action_12889940 ] Henry Robinson commented on ZOOKEEPER-821: -- Rich - This is a really useful contribution, thanks! The only thing I would change from your patch would be to use snprintf with a buffer length of 10 so as to avoid any potential string overflows if our version numbers ever get huge :) Otherwise +1; if you make this change I'll commit asap. Thanks! Henry Add ZooKeeper version information to zkpython - Key: ZOOKEEPER-821 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Affects Versions: 3.3.1 Reporter: Rich Schumacher Assignee: Rich Schumacher Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-821.patch Since installing and using ZooKeeper I've built and installed no less than four versions of the zkpython bindings. It would be really helpful if the module had a '__version__' attribute to easily tell which version is currently in use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881746#action_12881746 ]

Henry Robinson commented on ZOOKEEPER-784:
------------------------------------------

I like the idea of fake sessions fine, although I think that the upgrade process might be complex. Another possibility is to do away with sessions in read-only mode (because they're mainly used to maintain state about watches, which don't make sense on a read-only server).

Sergey - just looked over your patch. Nice job! Couple of questions:

1. In QuorumPeer.java, I can't quite follow the logic in this part of the patch:

{code}
             while (running) {
                 switch (getPeerState()) {
                 case LOOKING:
+                    LOG.info("LOOKING");
+                    ReadOnlyZooKeeperServer roZk = null;
                     try {
-                        LOG.info("LOOKING");
+                        roZk = new ReadOnlyZooKeeperServer(
+                            logFactory, this,
+                            new ZooKeeperServer.BasicDataTreeBuilder(),
+                            this.zkDb);
+                        roZk.startup();
+
{code}

- is it sensible to start a ROZKServer every time a server enters the 'LOOKING' state, or should there be some kind of delay before it decides it is partitioned? Otherwise when a leader is lost and the quorum is doing a re-election, r/w clients that try and connect would get (I think) 'can't be read-only' messages.

2. What are you doing about watches? It seems to me that setting a watch turns a read operation into a read / write operation, and the client should be told that watch registration failed. If you can do this you don't have to worry so much about session migration because there's very little session state maintained by a ROZKServer on behalf of the client.

3. This patch has got to the point where it might be good if you started adding some tests to validate any further development you do.

server-side functionality for read-only mode
--------------------------------------------

Key: ZOOKEEPER-784
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
Project: Zookeeper
Issue Type: Sub-task
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch

As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer which comes into play when peer is partitioned.
[jira] Commented: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877227#action_12877227 ]

Henry Robinson commented on ZOOKEEPER-740:
------------------------------------------

Mike -

Great catch, thanks for figuring this out. I'm correct in saying that this doesn't prevent watchers from eventually being correctly freed, right?

If so, then it would be great if you could submit this patch formally so that we can get it into trunk. See http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute for details.

Thanks,
Henry

zkpython leading to segfault on zookeeper
-----------------------------------------

Key: ZOOKEEPER-740
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Federico
Assignee: Henry Robinson
Priority: Critical
Fix For: 3.4.0

The program that we are implementing uses the python binding for zookeeper, but sometimes it crashes with a segfault; here is the bt from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
2488    ../Objects/abstract.c: No such file or directory.
        in ../Objects/abstract.c
(gdb) bt
#0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
#1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
#3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
#4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
#5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317
#6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
[jira] Assigned: (ZOOKEEPER-704) GSoC 2010: Read-Only Mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson reassigned ZOOKEEPER-704:
---------------------------------------

Assignee: Sergey Doroshenko

GSoC 2010: Read-Only Mode
-------------------------

Key: ZOOKEEPER-704
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-704
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Sergey Doroshenko

Read-only mode

Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java and TCP/IP networking

Description
When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests because it cannot guarantee that writes will get processed correctly. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but caused an error condition when a write request was attempted.

This project would implement a 'read-only' mode for ZooKeeper servers (maybe only for Observers) that allowed read requests to be served as long as the client can contact a server.

This is a great project for getting really hands-on with the internals of ZooKeeper - you must be comfortable with Java and networking otherwise you'll have a hard time coming up to speed.
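The behaviour described in the project outline (reads still served after quorum loss, writes rejected) can be illustrated with a toy model. This is only a sketch in Python, not ZooKeeper code: `ToyServer`, `ReadOnlyModeError`, and the `reachable_peers` field are hypothetical names invented for this example.

```python
class ReadOnlyModeError(Exception):
    """Raised when a write reaches a server that has lost its quorum."""


class ToyServer:
    def __init__(self, ensemble_size):
        self.ensemble_size = ensemble_size
        self.reachable_peers = ensemble_size - 1  # peers other than this server
        self.data = {}

    def has_quorum(self):
        # This server plus the peers it can reach must be a strict majority.
        return (self.reachable_peers + 1) * 2 > self.ensemble_size

    def read(self, path):
        # Reads are always served from local state, quorum or not.
        return self.data.get(path)

    def write(self, path, value):
        # Writes need a quorum; without one the server is read-only.
        if not self.has_quorum():
            raise ReadOnlyModeError("quorum lost; server is read-only")
        self.data[path] = value


server = ToyServer(ensemble_size=5)
server.write("/config", "v1")

server.reachable_peers = 1  # partitioned: 2 of 5 servers is not a majority
print(server.read("/config"))  # v1
try:
    server.write("/config", "v2")
except ReadOnlyModeError as err:
    print("write rejected:", err)
```

The sketch ignores sessions and watches, which the discussion on ZOOKEEPER-784 identifies as the harder parts of the design.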
[jira] Created: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
committedLog in ZKDatabase is not properly synchronized
-------------------------------------------------------

Key: ZOOKEEPER-783
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1
Reporter: Henry Robinson
Priority: Critical

ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller.

I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but it does appear in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()).

It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
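The two remedies proposed above, a defensive copy for readers and a shared lock around mutation, are general techniques. As a rough illustration only, here is the same pattern in Python; the class `CommittedLog` and its methods are hypothetical stand-ins, not the Java code in ZKDatabase or the attached patch.

```python
import threading


class CommittedLog:
    """Toy stand-in for a shared proposal list guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def append(self, proposal):
        with self._lock:
            self._entries.append(proposal)

    def snapshot(self):
        # Defensive copy: callers iterate over their own list, not the shared one.
        with self._lock:
            return list(self._entries)

    def clear(self):
        # Mutation takes the same lock that snapshot() and append() use.
        with self._lock:
            self._entries.clear()


log = CommittedLog()
for zxid in (10, 11, 12):
    log.append(zxid)

seen = log.snapshot()
log.clear()            # e.g. a shutdown that races with a reader
print(seen)            # [10, 11, 12]
print(log.snapshot())  # []
```

Because the reader holds its own copy, a concurrent `clear()` can no longer change the list underneath an iteration in progress.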
[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-783: - Attachment: ZOOKEEPER-783.patch Defensive copying added to getCommittedLog() and synchronization during clear(). No tests added; really not sure how best to test for this. It does fix my test case but it's very difficult to distill that into a test (plus it only fails once in about 100 runs). committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Priority: Critical Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
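The fix described for ZOOKEEPER-783 (copy the list defensively in getCommittedLog() and synchronize on it in clear()) can be illustrated with a minimal sketch. This is not the attached patch: the class name CommittedLogSketch and the addCommittedProposal helper are hypothetical, and String stands in for ZooKeeper's Proposal type so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for ZKDatabase; only the committed-log handling is sketched.
class CommittedLogSketch {
    // String replaces ZooKeeper's Proposal type (an assumption for self-containment).
    private final LinkedList<String> committedLog = new LinkedList<>();

    // Hypothetical helper so the sketch has something to store.
    void addCommittedProposal(String proposal) {
        synchronized (committedLog) {
            committedLog.add(proposal);
        }
    }

    // Return a snapshot taken under the lock, so callers never iterate the live list.
    List<String> getCommittedLog() {
        synchronized (committedLog) {
            return new ArrayList<>(committedLog);
        }
    }

    // Clear under the same lock that guards additions and snapshots.
    void clear() {
        synchronized (committedLog) {
            committedLog.clear();
        }
    }
}
```

With this shape, a caller that is still iterating an earlier snapshot is unaffected when another thread calls clear() during shutdown, at the cost of one list copy per call.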
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this - thanks Sergey! Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [PATCH] javaclient: validate sessionTimeout field at ZooKeeper init (JIRA ZOOKEEPER-776)
Hi Greg - Thanks very much for contributing! We've got some guidelines here: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute - let me know if they're not clear. The main thing for you to do is to attach your patch to the JIRA and click the 'Licensed for inclusion into Apache projects' button when you do. You can do this by clicking 'Attach patch' on the JIRA itself. Once you've done that, please click 'Submit patch' to kick off our automated QA procedures. Assuming all goes well, a committer will pick up the baton from there and get the patch into trunk (or let you know if they think changes are necessary). Thanks! Henry On 21 May 2010 12:22, Gregory Haskins gregory.hask...@gmail.com wrote: Hi All, First patch submission for me. If there are any patch submission guidelines I should follow, kindly point me at them and accept my apology if this approach violates any established procedures. I didn't find anything obvious on the site wiki, so I just used some practices learned on other projects. -Greg commit 840f56d388582e1df39f7513aa7f4d4ce0610718 Author: Gregory Haskins ghask...@novell.com Date: Fri May 21 14:58:14 2010 -0400 javaclient: validate sessionTimeout field at ZooKeeper init JIRA ZOOKEEPER-776 describes the following problem: passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to capture this configuration error at the source by throwing something like an IllegalArgument exception when the bogus sessionTimeout is specified, instead of later when it is utilized. This patch is a proposal to fix the problem referenced above. 
Applies to svn-id: 946074

Signed-off-by: Gregory Haskins ghask...@novell.com

diff --git a/src/java/main/org/apache/zookeeper/ClientCnxn.java b/src/java/main/
index 8eb227d..682811b 100644
--- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
+++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
@@ -353,6 +353,11 @@ public class ClientCnxn {
         this.sessionId = sessionId;
         this.sessionPasswd = sessionPasswd;
 
+        if (sessionTimeout <= 0) {
+            throw new IOException("sessionTimeout " + sessionTimeout
+                    + " is not valid");
+        }
+
         // parse out chroot, if any
         int off = hosts.indexOf('/');
         if (off >= 0) {

-- Henry Robinson Software Engineer Cloudera 415-994-6679
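The JIRA description for ZOOKEEPER-776 suggests failing fast with something like an IllegalArgumentException, whereas the proposed patch throws IOException. A minimal sketch of the fail-fast variant follows. The class SessionTimeoutCheck and the method requirePositiveTimeout are hypothetical names used for illustration only and are not part of the ZooKeeper client API.

```java
// Hypothetical helper, not part of the ZooKeeper client API.
class SessionTimeoutCheck {
    // Rejects zero or negative timeouts at construction time rather than
    // letting them cause errors in later operations.
    static int requirePositiveTimeout(int sessionTimeout) {
        if (sessionTimeout <= 0) {
            throw new IllegalArgumentException(
                    "sessionTimeout " + sessionTimeout + " is not valid");
        }
        return sessionTimeout;
    }
}
```

A constructor could call such a check before storing the value, so that a bogus timeout is reported at the point where it is supplied.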
[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument
[ https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870152#action_12870152 ] Henry Robinson commented on ZOOKEEPER-776: -- Thanks Greg - can you generate your patch from git with --no-prefix, to make it svn compatible? API should sanity check sessionTimeout argument --- Key: ZOOKEEPER-776 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776 Project: Zookeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.2.2, 3.3.0, 3.3.1 Environment: OSX 10.6.3, JVM 1.6.0-20 Reporter: Gregory Haskins Priority: Minor Fix For: 3.4.0 Attachments: zookeeper-776-fix.patch passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to capture this configuration error at the source by throwing something like an IllegalArgument exception when the bogus sessionTimeout is specified, instead of later when it is utilized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument
[ https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870164#action_12870164 ] Henry Robinson commented on ZOOKEEPER-776: -- Cancelling the patch is fine but there's no need to delete it - Hudson will always figure out what the latest patch is and it's good to see how a ticket evolved. Tests will also help :) API should sanity check sessionTimeout argument --- Key: ZOOKEEPER-776 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776 Project: Zookeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.2.2, 3.3.0, 3.3.1 Environment: OSX 10.6.3, JVM 1.6.0-20 Reporter: Gregory Haskins Priority: Minor Fix For: 3.4.0 Attachments: zookeeper-776-fix.patch passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to capture this configuration error at the source by throwing something like an IllegalArgument exception when the bogus sessionTimeout is specified, instead of later when it is utilized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument
[ https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870179#action_12870179 ] Henry Robinson commented on ZOOKEEPER-776: -- Greg - Don't worry - you should have seen the hash I made of my first patch! Hudson is misbehaving at the moment, so I'm not convinced that the test failures are as a result of your patch. You don't need to do anything right now - I'll take a look and update this ticket once I know what's going on. cheers, Henry API should sanity check sessionTimeout argument --- Key: ZOOKEEPER-776 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776 Project: Zookeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.2.2, 3.3.0, 3.3.1 Environment: OSX 10.6.3, JVM 1.6.0-20 Reporter: Gregory Haskins Priority: Minor Fix For: 3.4.0 Attachments: zookeeper-776-fix.patch passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to capture this configuration error at the source by throwing something like an IllegalArgument exception when the bogus sessionTimeout is specified, instead of later when it is utilized. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Status: Open (was: Patch Available) Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] hudson? hello? Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869822#action_12869822 ] Henry Robinson commented on ZOOKEEPER-769: -- Failures do not look related to this patch (although I could be mistaken). ZkDatabaseCorruptionTest is the most recent broken test - passes fine for me locally? Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868780#action_12868780 ] Henry Robinson commented on ZOOKEEPER-769: -- Sergey - sorry for the delay. It's on me to review this patch, and then I'll commit it. Thanks for your patience! Henry Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Attachment: ZOOKEEPER-769.patch I made a few small changes to your patch to make the logic a little easier to follow. Take a look and let me know if you think this is ok, otherwise I'll commit the patch tomorrow. Thanks! Henry Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Status: Open (was: Patch Available) Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-769: - Status: Patch Available (was: Open) Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson reassigned ZOOKEEPER-772: Assignee: Henry Robinson zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-772: - Attachment: ZOOKEEPER-772.patch Bug was simple when I got round to looking - was incorrectly reusing a watcher that was getting deallocated before getting called. zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-772: - Status: Patch Available (was: Open) zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-772: - Status: Open (was: Patch Available) zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch, ZOOKEEPER-772.patch When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-772: - Status: Patch Available (was: Open) zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch, ZOOKEEPER-772.patch When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-772: - Attachment: ZOOKEEPER-772.patch --no-prefix, predictably. zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch, ZOOKEEPER-772.patch When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [VOTE] Release ZooKeeper 3.3.1 (candidate 0)
+1, Java tests pass for me, as do Python ones. Henry On 11 May 2010 22:32, Patrick Hunt ph...@apache.org wrote: +1, tests pass for me, also verified that nc/zktop worked properly on a real cluster (4letter word fix). Patrick On 05/07/2010 11:25 AM, Patrick Hunt wrote: I've created a candidate build for ZooKeeper 3.3.1. This is a bug fix release addressing seventeen issues (one critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11am pacific time, Wednesday, May 12.*** http://people.apache.org/~phunt/zookeeper-3.3.1-candidate-0/ Should we release this? Patrick -- Henry Robinson Software Engineer Cloudera 415-994-6679
[jira] Commented: (ZOOKEEPER-679) Offers a node design for interacting with the Java Zookeeper client.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12865639#action_12865639 ] Henry Robinson commented on ZOOKEEPER-679: -- Hi Aaron - The great thing about open source, and the relatively permissive Apache license in particular, is that Chris is free to copy any and all of ZK into github and continue with a development process that he finds more agreeable. It is completely kosher to do this. As Chris says, you are welcome to contribute, fork or ignore it. As far as I am concerned, contrib is an excellent place to put projects that directly add more functionality to their parent project (the language bindings and this patch are good examples), but not a great place to store standalone projects that simply leverage the parent (an example might be a DNS server, written in ZooKeeper). This is a needfully vague distinction, and others will have different opinions. I do not know specifically to what Chris is referring when he talks about an 'onerous' patch process, but I speculate he might mean that the role of 'committer' - someone who is gating the submission of patches - makes it harder to get your patches available for others to use quickly. Of course there are also benefits of this approach, such as a ready collection of experienced users on hand to offer advice and the relatively high standard for patches to be accepted to trunk arguably improves code quality. What's great is the two development styles are not mutually exclusive, and can, ideally, benefit from each other. If you are having difficulties with, or are frustrated by, the patch submission process here, ask for help. The community here is very happy to help, and we'll do what we can to address pain points. As for this patch, I'm happy it's going into contrib - users sometimes find ZooKeeper difficult to program to, and examples and new abstractions are always welcome. 
Keeping this patch in the main repository means that newcomers to ZooKeeper will find it more easily. Thanks for the contribution! Henry Offers a node design for interacting with the Java Zookeeper client. Key: ZOOKEEPER-679 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-679 Project: Zookeeper Issue Type: New Feature Components: contrib, java client, tests Reporter: Aaron Crow Assignee: Aaron Crow Fix For: 3.4.0 Attachments: ZOOKEEPER-679.patch, ZOOKEEPER-679.patch, ZOOKEEPER-679.patch, ZOOKEEPER-679.patch Following up on my conversations with Patrick and Mahadev (http://n2.nabble.com/Might-I-contribute-a-Node-design-for-the-Java-API-td4567695.html#a4567695). This patch includes the implementation as well as unit tests. The first unit test gives a simple high level demo of using the node API. The current implementation is simple and is only what I need with the current project I am working on. However, I am very open to any and all suggestions for improvement. This is a proposal to support a simplified node (or File) like API into a Zookeeper tree, by wrapping the Zookeeper Java client. It is similar to Java's File API design. Although, I'm trying to make it easier in a few spots. For example, deleting a Node recursively is done by default. I also lean toward resolving Exceptions under the hood when it seems appropriate. For example, if you ask a Node if it exists, and its parent doesn't even exist, you just get a false back (rather than a nasty Exception). As for watches and ephemeral nodes, my current work does not need these things so I currently have no handling of them. But if potential users of the Node a.k.a. File design want these things, I'd be open to supporting them as reasonable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12865240#action_12865240 ] Henry Robinson commented on ZOOKEEPER-769: -- Sergey - Great, thanks for making this patch! ISTR there was some reason why we didn't infer peerType from the servers list, but I can't remember what it was... As for your patch, a few small comments: 1. Use --no-prefix and just attach the output of git-diff (no mail headers etc) - Hudson is rather picky about the patch formats it can apply 2. It would be great to include a test that reads a configuration and checks that the behaviour is correct 3. If the peerTypes don't match up, should we default to the server list (on the assumption that that will be consistent across all servers)? 4. Once you've added the patch, click 'submit patch' to start Hudson moving. cheers, Henry Leader can treat observers as quorum members Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg In short: it seems leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, expected result is that leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I'm shutting down 3rd one -- observer -- leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, remaining follower starts a new leader election, as it should be). -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
Re: Question on quorum behavior
Sergey - Sounds like a bug. Can you open a new JIRA and attach your log files to it?

Thanks, Henry

On 6 May 2010 07:50, Sergey Doroshenko dors...@gmail.com wrote:

In short: it seems the leader can treat observers as quorum members.

Steps to repro:
1. I have the following ensemble configuration:
   # servers list
   server.1=localhost:2881:3881
   server.2=localhost:2882:3882
   server.3=localhost:2883:3883:observer
   server.4=localhost:2884:3884
   server.5=localhost:2885:3885:observer
2. I bring up servers 1, 2 and 3, which is enough for a quorum (1 and 2).
3. I shut down the member of the quorum that is the follower.

As I understand it, the expected result is that the leader will start a new election round in order to regain quorum. What actually happens is that it just says goodbye to that follower and remains operable. (When I shut down the third one, the observer, the leader starts trying to regain a quorum.)

Is this a bug, or a feature?

-- Regards, Sergey

-- Henry Robinson Software Engineer Cloudera 415-994-6679
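The expectation in the report above can be made concrete with a small sketch. It assumes the default majority-quorum rule counted over voting members only, and it parses the `server.N=host:port:port[:observer]` lines from the message; the helper names `parse_servers` and `has_quorum` are hypothetical and are not part of ZooKeeper.

```python
# Sketch only: models the expected behaviour, not ZooKeeper's actual code.
CONFIG = """
# servers list
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
server.4=localhost:2884:3884
server.5=localhost:2885:3885:observer
"""


def parse_servers(lines):
    """Split server.N=... lines into sets of voter ids and observer ids."""
    voters, observers = set(), set()
    for line in lines:
        line = line.strip()
        if not line.startswith("server."):
            continue  # skip blanks and comments
        key, value = line.split("=", 1)
        sid = int(key[len("server."):])
        if value.split(":")[-1] == "observer":
            observers.add(sid)
        else:
            voters.add(sid)
    return voters, observers


def has_quorum(voters, live):
    """True if a strict majority of the voters is alive (assumed rule)."""
    return len(voters & set(live)) > len(voters) // 2


voters, observers = parse_servers(CONFIG.splitlines())
print(has_quorum(voters, {1, 2, 3}))  # servers 1, 2, 3 up
print(has_quorum(voters, {1, 3}))     # follower 2 shut down
```

Under these assumptions, with servers 1 and 3 left running only one of the three voters is alive, so the leader would be expected to lose quorum, which is the behaviour the reporter was looking for.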
[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864849#action_12864849 ]

Henry Robinson commented on ZOOKEEPER-768:

Thanks Kapil - I'll take a look. From the stack trace it looks as though a pending completion callback is null, which suggests that a completion dispatcher is being freed before it has finished being used. As per usual I can't reproduce on my machine, but this is enough information to dig into it.

zkpython segfault on close (assertion error in io thread)
Key: ZOOKEEPER-768
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.4.0
Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
Reporter: Kapil Thangavelu
Attachments: zkpython-segfault-client-log.txt, zkpython-segfault-stack-traces.txt, zkpython-segfault.py

While trying to create a test case showing slow average add_auth, I stumbled upon a test case that reliably segfaults for me, albeit after a variable number of iterations (anywhere from 0 to 20 typically). fwiw, I've got about 220 processes in my test environment (ubuntu lucid 10.04). The test case opens a connection, adds authentication to it, and closes the connection, in a loop. I'm including the sample program and the gdb stack traces from the core file. I can upload the core file if that's helpful.
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864878#action_12864878 ]

Henry Robinson commented on ZOOKEEPER-769:

Hi Sergey - Can you attach the logs from (at least) the leader node to this ticket? I'd like to figure this one out asap.

cheers, Henry

Leader can treat observers as quorum members
Key: ZOOKEEPER-769
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.3.0
Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Fix For: 3.3.0
Attachments: zoo1.cfg

In short: it seems the leader can treat observers as quorum members.

Steps to repro:
1. Server configuration: 3 voters, 2 observers (attached).
2. Bring up 2 voters and one observer. That's enough for a quorum.
3. Shut down the member of the quorum that is the follower.

As I understand it, the expected result is that the leader will start a new election round in order to regain quorum. What actually happens is that it just says goodbye to that follower and remains operable. (When I shut down the third one, the observer, the leader starts trying to regain a quorum.) (As expected, if in step 3 we shut down the leader rather than the follower, the remaining follower starts a new leader election, as it should.)
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864953#action_12864953 ]

Henry Robinson commented on ZOOKEEPER-769:

Sergey - In the cfg files for nodes 3 and 5, did you include the following line?

    peerType=observer

See http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html for details.

The observer log contains this line:

    2010-05-06 22:46:00,876 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:quorump...@642] - FOLLOWING

which is a big red flag, because observers should never adopt the FOLLOWING state. If I leave that line out of the cfg file I can reproduce your issue. If I add it, the observers work as expected. Can you check your cfg files?

cheers, Henry

Leader can treat observers as quorum members
Key: ZOOKEEPER-769
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.3.0
Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Fix For: 3.3.0
Attachments: follower.log, leader.log, observer.log, zoo1.cfg

In short: it seems the leader can treat observers as quorum members.

Steps to repro:
1. Server configuration: 3 voters, 2 observers (attached).
2. Bring up 2 voters and one observer. That's enough for a quorum.
3. Shut down the member of the quorum that is the follower.

As I understand it, the expected result is that the leader will start a new election round in order to regain quorum. What actually happens is that it just says goodbye to that follower and remains operable. (When I shut down the third one, the observer, the leader starts trying to regain a quorum.) (As expected, if in step 3 we shut down the leader rather than the follower, the remaining follower starts a new leader election, as it should.)
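The misconfiguration diagnosed above (a server listed with the `:observer` suffix whose own cfg file lacks `peerType=observer`) is the kind of thing a small pre-flight check could catch. The following is a minimal sketch, not ZooKeeper code: the function names are hypothetical, and it assumes that a missing `peerType` line means the server acts as a voting participant.

```python
# Sketch only: a hypothetical lint for a single server's cfg file.
BASE_CFG = """
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
"""


def declared_role(lines, my_id):
    """Role of server my_id according to the shared server list."""
    for line in lines:
        key, _, value = line.partition("=")
        if key == "server.%d" % my_id:
            return "observer" if value.endswith(":observer") else "participant"
    raise KeyError("server.%d not found in the server list" % my_id)


def check_peer_type(config_text, my_id):
    """Return None if the cfg is consistent, else a warning string."""
    lines = [l.strip() for l in config_text.splitlines() if l.strip()]
    peer_type = "participant"  # assumed default when peerType is absent
    for line in lines:
        if line.startswith("peerType="):
            peer_type = line.split("=", 1)[1]
    role = declared_role(lines, my_id)
    if role != peer_type:
        return "server.%d is listed as %s but peerType is %s" % (
            my_id, role, peer_type)
    return None


print(check_peer_type(BASE_CFG, 3))                         # mismatch
print(check_peer_type(BASE_CFG + "peerType=observer\n", 3)) # consistent
```

A check like this only compares the two places where the role is stated; it does not decide which one should win when they disagree.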
[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864429#action_12864429 ]

Henry Robinson commented on ZOOKEEPER-763:

Hi Kapil - As seems to be the norm for me this week, I'm struggling to reproduce :) It does seem like your python script explicitly waits for a completion to be called before closing a handle. Is this enough to leave an outstanding completion on the queue?

Can you capture the stacktrace for the completion thread? I think it must be getting stuck in process_completions, but it would be very valuable to know where - if it's stuck on the callback into zkpython, then the deadlock is in the python bindings and not solely in C-land.

cheers, Henry

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: c client, contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
Fix For: 3.4.0
Attachments: deadlock.py, stack-trace-deadlock.txt

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
Re: Demo Code: Shared/Exclusive Lock
Sam - This is great - the more contributed code the better! Did you attach the code to your mail? The mailing lists strip out attachments.

If you wouldn't mind creating a JIRA (see https://issues.apache.org/jira/browse/ZOOKEEPER), formatting your code as a patch and clicking the button that says you're happy for the ASF to use your code, that would be awesome - doing so makes it easier for us to add your code into Apache-hosted source repositories.

Thanks again for your contribution - really pleased to see it.

cheers, Henry

On 5 May 2010 13:06, Sam Baskinger sam.baskin...@networkedinsights.com wrote:

All, It was suggested that more demo code would be welcome. I've gotten the OK to release a shared/exclusive Lock.java implementation we have in our test labs at Networked Insights. If the community would find it useful, please do use it! :)

All the best, and thanks for the excellent tool,

Sam Baskinger
Software Engineer
Networked Insights
http://www.networkedinsights.com

-- Henry Robinson Software Engineer Cloudera 415-994-6679
[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864488#action_12864488 ]

Henry Robinson commented on ZOOKEEPER-763:

Kapil - Thanks! Adding that sleep helped me understand what was going on.

pyzoo_close has the GIL but blocks inside zookeeper_close, waiting for the completion thread to finish. However, if a completion is still inside Python but has been pre-empted by the main thread, which then calls pyzoo_close, the completion can't get the GIL back to finish executing, and so the completions_thread blocks forever.

The fix is simple - relinquish the GIL during the zookeeper_close call, and then reacquire it straight after. There are even handy macros to do this:

    Py_BEGIN_ALLOW_THREADS
    ret = zookeeper_close(zhandles[zkhid]);
    Py_END_ALLOW_THREADS

This same issue will affect any part of zkpython where a call to the C client is blocked on some work being completed in another Python thread - in practice, I think this means from callbacks. I'll audit the code to see if any other API calls are affected.

Patch to fix this issue is following shortly - Kapil, I'd be very grateful if you could help us by testing it.

cheers, Henry

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: c client, contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
Fix For: 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
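The shape of this deadlock can be imitated in pure Python. The sketch below is an analogy only: an ordinary `threading.Lock` stands in for the GIL, a worker thread stands in for the completion thread, and `close_handle` is a hypothetical name. It does not touch zkpython or the real interpreter lock, and it uses a join timeout so that the "deadlocked" case returns instead of hanging.

```python
import threading

# Stand-in for the interpreter lock; NOT the real GIL.
fake_gil = threading.Lock()


def completion_callback(done):
    # The completion thread needs the lock before it can finish its callback.
    with fake_gil:
        done.append("callback finished")


def close_handle(release_lock_while_joining, timeout=0.5):
    """Return True if the completion thread could be joined in time."""
    done = []
    fake_gil.acquire()                 # the closing thread holds the lock
    worker = threading.Thread(target=completion_callback, args=(done,))
    worker.daemon = True
    worker.start()
    if release_lock_while_joining:
        fake_gil.release()             # plays the role of Py_BEGIN_ALLOW_THREADS
        worker.join(timeout)
        fake_gil.acquire()             # plays the role of Py_END_ALLOW_THREADS
    else:
        worker.join(timeout)           # worker is stuck waiting for our lock
    joined = not worker.is_alive()
    fake_gil.release()
    if not joined:
        worker.join()                  # clean up: the worker can now finish
    return joined


print(close_handle(release_lock_while_joining=False))
print(close_handle(release_lock_while_joining=True))
```

Holding the lock across the join makes the join time out, while releasing it around the join lets the worker finish, which mirrors the reasoning behind wrapping `zookeeper_close` in the two macros.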
[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-763:
Assignee: Henry Robinson (was: Mahadev konar)
Fix Version/s: 3.3.1
Component/s: (was: c client)

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-763:
Status: Patch Available (was: Open)

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-763:
Attachment: ZOOKEEPER-763.patch

Forgot --no-prefix again :/

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch, ZOOKEEPER-763.patch

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-763:
Status: Patch Available (was: Open)

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch, ZOOKEEPER-763.patch

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-763:
Status: Open (was: Patch Available)

Deadlock on close w/ zkpython / c client
Key: ZOOKEEPER-763
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0
Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch, ZOOKEEPER-763.patch

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached.
[jira] Updated: (ZOOKEEPER-764) Observer elected leader due to inconsistent voting view
[ https://issues.apache.org/jira/browse/ZOOKEEPER-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-764:
Attachment: ZOOKEEPER-764_3_3_1.patch

Patch to apply against 3_3_1

Observer elected leader due to inconsistent voting view
Key: ZOOKEEPER-764
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-764
Project: Zookeeper
Issue Type: Bug
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
Fix For: 3.3.1, 3.4.0
Attachments: ZOOKEEPER-690.patch, ZOOKEEPER-764_3_3_1.patch

In ZOOKEEPER-690, we noticed that an observer was being elected, and Henry proposed a patch to fix the issue. However, it seems that the patch does not solve the issue one user (Alan Cabrera) has observed. Given that we would like to fix this issue, and to work separately with Alan to determine the problem with his setup, I'm creating this jira and re-posting Henry's patch.
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863902#action_12863902 ]

Henry Robinson commented on ZOOKEEPER-690:

Hi Alan - Looking at this attachment: nohup-AsyncHammerTest-201004301209.txt - the tests appear to be run twice. The first testObserversHammer completes successfully, the second fails. Were you running the tests until you experienced the failure?

Henry

AsyncTestHammer test fails on hudson.
Key: ZOOKEEPER-690
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
Project: Zookeeper
Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
Fix For: 3.3.1, 3.4.0
Attachments: jstack-201004201053.txt, jstack-201004291409.txt, jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, nohup-AsyncHammerTest-201004301209.txt, nohup-QuorumPeerMainTest-201004301209.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch

The hudson test failed on http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/. There is a huge set of CancelledKeyExceptions in the logs. Still going through the logs to find out the reason for failure.
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863915#action_12863915 ]

Henry Robinson commented on ZOOKEEPER-690:

Weird - it looks like the test is shutting down correctly:

    [junit] 2010-04-30 11:41:52,896 - INFO [main:clientb...@222] - connecting to 127.0.0.1 11233
    [junit] 2010-04-30 11:41:52,896 - INFO [main:quorumb...@277] - 127.0.0.1:11233 is no longer accepting client connections
    [junit] 2010-04-30 11:41:52,896 - INFO [main:clientb...@222] - connecting to 127.0.0.1 11234
    [junit] 2010-04-30 11:41:52,897 - INFO [main:quorumb...@277] - 127.0.0.1:11234 is no longer accepting client connections
    [junit] 2010-04-30 11:41:52,897 - INFO [main:clientb...@222] - connecting to 127.0.0.1 11235
    [junit] 2010-04-30 11:41:52,897 - INFO [main:quorumb...@277] - 127.0.0.1:11235 is no longer accepting client connections
    [junit] 2010-04-30 11:41:52,897 - INFO [main:clientb...@222] - connecting to 127.0.0.1 11236
    [junit] 2010-04-30 11:41:52,898 - INFO [main:quorumb...@277] - 127.0.0.1:11236 is no longer accepting client connections
    [junit] 2010-04-30 11:41:52,898 - INFO [main:clientb...@222] - connecting to 127.0.0.1 11237
    [junit] 2010-04-30 11:41:52,898 - INFO [main:quorumb...@277] - 127.0.0.1:11237 is no longer accepting client connections
    [junit] 2010-04-30 11:41:52,901 - INFO [main:junit4zktestrunner$loggedinvokemet...@56] - FINISHED TEST METHOD testObserversHammer
    [junit] 2010-04-30 11:41:52,901 - INFO [main:zktestcas...@59] - SUCCEEDED testObserversHammer
    [junit] 2010-04-30 11:41:52,901 - INFO [main:zktestcas...@54] - FINISHED testObserversHammer

and then it goes on to the C tests, which fail for an unrelated reason - does it lock up at this point, or does it actually fail out to the CLI? If it locks up, is the jstack output you attached from that run?

AsyncTestHammer test fails on hudson.
Key: ZOOKEEPER-690
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
Project: Zookeeper
Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
Fix For: 3.3.1, 3.4.0
Attachments: jstack-201004201053.txt, jstack-201004291409.txt, jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, nohup-AsyncHammerTest-201004301209.txt, nohup-QuorumPeerMainTest-201004301209.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch

The hudson test failed on http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/. There is a huge set of CancelledKeyExceptions in the logs. Still going through the logs to find out the reason for failure.
Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Vishal - Great that you're interested in contributing! This would be a really neat feature to get into ZK.

The documentation that exists is essentially all on the JIRA. I had a patch that 'worked' but was nowhere near commit-ready. I'm trying to dig it up, but it appears it may have gone to the great bit-bucket in the sky. Trunk has moved sufficiently that a new patch would be required anyhow.

There were two main difficulties with this issue.

The first is changing the voting protocol to cope with changes in views. Since proposals are pipelined, the leader needs to keep track of what the view was that should vote for a proposal. IIRC, the other subtlety is making sure that when a view change is proposed, a quorum of votes is received from both the outgoing view and the incoming one. Otherwise it's possible to transition to a 'dead' view in which no progress can be made.

The second is to figure out the metadata management - how do we 'find' ZooKeeper servers if the ensemble may have moved onto a completely separate set of machines? That is, if the original ensemble was on A, B, C and the current ensemble is D, E, F - where do we look to find where the ensemble is located?

The first is a solved issue, the second is more a matter of taste than designing distributed protocols.

Really happy to help with this issue - I'd love to see it get resurrected.

cheers, Henry

On 3 May 2010 07:25, Vishal K vishalm...@gmail.com wrote:

Hi Henry, I just commented on the Jira. I would be happy to contribute. Please advise on the current status and next steps. Thanks. Regards, -Vishal

-- Henry Robinson Software Engineer Cloudera 415-994-6679
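The requirement that a view change gather a quorum from both the outgoing and the incoming view can be illustrated with a few lines of code. This is a sketch under stated assumptions rather than anything from a ZOOKEEPER-107 patch: it assumes simple majority quorums, and the names `majority` and `view_change_committable` are hypothetical.

```python
# Sketch only: joint-quorum rule for committing a membership change.

def majority(view, acks):
    """True if a strict majority of the servers in view have acked."""
    return len(set(view) & set(acks)) > len(view) // 2


def view_change_committable(old_view, new_view, acks):
    """A view change needs a quorum from the old view AND the new view."""
    return majority(old_view, acks) and majority(new_view, acks)


old_view = {"A", "B", "C"}
new_view = {"D", "E", "F"}

print(view_change_committable(old_view, new_view, {"A", "B"}))
print(view_change_committable(old_view, new_view, {"A", "B", "D", "E"}))
```

With acks only from A and B the old view agrees but nothing is known about D, E and F, so committing could move the ensemble into a view that cannot make progress; requiring both majorities rules that out.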
Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Vishal - That's right - design, not implementation! I'd encourage you to share a design document once you feel you understand exactly what's required. This is probably going to be a complex patch and reviewers will need a study guide :)

cheers, Henry

On 3 May 2010 10:26, Vishal Kher vishalm...@gmail.com wrote:

Hi Henry, Thanks for the info. I will spend some more time to understand the issues before starting with the implementation. I will let you know if I have any questions (which I am sure I will). Just to clarify, by solved issue you mean from design perspective and not from implementation right? Regards, -Vishal

-- Henry Robinson Software Engineer Cloudera 415-994-6679
Re: Dynamic adding/removing ZK servers on client
On 3 May 2010 16:40, Dave Wright wrig...@gmail.com wrote:

Should this be a znode in the privileged namespace?

I think having a znode for the current cluster members is part of the ZOOKEEPER-107 proposal, with the idea being that you could get/set the membership just by writing to that node. On the client side, you could watch that znode and update your server list when it changes.

This is tricky: what happens if the server your client is connected to is decommissioned by a view change, and you are unable to locate another server to connect to because other view changes committed while you are reconnecting have removed all the servers you knew about? We'd need to make sure that watches on this znode were fired before a view change, but it's hard to know how to avoid having to wait for a session timeout before a client that might just be migrating servers reappears in order to make sure it sees the view change.

Even then, the problem of 'locating' the cluster still exists in the case that there are no clients connected to tell anyone about it.

Henry

-- Henry Robinson Software Engineer Cloudera 415-994-6679
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-758:
Attachment: ZOOKEEPER-758.patch

Kapil - Thanks for the patch! Unfortunately it didn't apply cleanly against trunk because I think you had added 'test_acl_validity' to acl_test.py, which was not included in the diff. I'm attaching a patch that applies cleanly to trunk - no code changes from your patch.

Thanks, Henry

zkpython segfaults on invalid acl with missing key
Key: ZOOKEEPER-758
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch

Currently when setting an acl, there is a minimal parse to ensure that it's a list of dicts; however, if one of the dicts is missing a required key, the subsequent usage doesn't check for it and will segfault. For example, using an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if used, because the scheme key is missing (it's been purposefully typo'd to schema in the example). I've expanded the check_acl macro to include verifying that all keys are present and added some unit tests against trunk in the attachments.
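The kind of validation described in this issue (every ACL entry must be a dict carrying all required keys) can be sketched on the Python side. This is an illustration only and not the check_acl macro from the patch, which lives in the C extension. The function name `missing_acl_keys`, the default key set ("scheme", "id", "perms") and the numeric permission value are assumptions made for the example; only the "scheme" key is named as required in the report.

```python
# Sketch only: a Python-level pre-check, not the actual C macro.

def missing_acl_keys(acl, required=("scheme", "id", "perms")):
    """Return (index, key) pairs for every required key that is absent.

    The default key names are an assumption for illustration.
    """
    if not isinstance(acl, list):
        raise TypeError("acl must be a list of dicts")
    missing = []
    for i, entry in enumerate(acl):
        if not isinstance(entry, dict):
            raise TypeError("acl entry %d is not a dict" % i)
        for key in required:
            if key not in entry:
                missing.append((i, key))
    return missing


PERM_ALL = 0x1f  # hypothetical stand-in for the bindings' constant

# "scheme" deliberately misspelled as "schema", as in the report.
bad_acl = [{"schema": "id", "id": "world", "perms": PERM_ALL}]
good_acl = [{"scheme": "world", "id": "anyone", "perms": PERM_ALL}]

print(missing_acl_keys(bad_acl))
print(missing_acl_keys(good_acl))
```

Reporting the missing keys before the ACL reaches the C client turns what would have been a crash into an ordinary error the caller can handle.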
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-758: - Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] I have reviewed this, and it looks good. Thanks Kapil! zkpython segfaults on invalid acl with missing key -- Key: ZOOKEEPER-758 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.0, 3.4.0 Environment: ubuntu lucid (10.04) Reporter: Kapil Thangavelu Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-758: - Attachment: ZOOKEEPER-758.patch forgot --no-prefix. zkpython segfaults on invalid acl with missing key -- Key: ZOOKEEPER-758 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.0, 3.4.0 Environment: ubuntu lucid (10.04) Reporter: Kapil Thangavelu Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, ZOOKEEPER-758.patch
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-758: - Status: Patch Available (was: Open) zkpython segfaults on invalid acl with missing key -- Key: ZOOKEEPER-758 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.0, 3.4.0 Environment: ubuntu lucid (10.04) Reporter: Kapil Thangavelu Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, ZOOKEEPER-758.patch
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-758: - Status: Open (was: Patch Available) zkpython segfaults on invalid acl with missing key -- Key: ZOOKEEPER-758 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.0, 3.4.0 Environment: ubuntu lucid (10.04) Reporter: Kapil Thangavelu Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, ZOOKEEPER-758.patch
[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key
[ https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-758: - Status: Resolved (was: Patch Available) Fix Version/s: 3.3.1 3.4.0 Resolution: Fixed I just committed this. Thanks Kapil! zkpython segfaults on invalid acl with missing key -- Key: ZOOKEEPER-758 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.0, 3.4.0 Environment: ubuntu lucid (10.04) Reporter: Kapil Thangavelu Fix For: 3.3.1, 3.4.0 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, ZOOKEEPER-758.patch
[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-690: - Attachment: ZOOKEEPER-690.patch I have found what I hope is the problem. Because QuorumPeers duplicate their 'LearnerType' in two places, there's the possibility that they may get out of sync. This is what was happening here - it was a test bug. Although the Observers knew that they were Observers, the other nodes did not. This affected the leader election protocol, as the other nodes did not know to reject an Observer. I feel like we should refactor the QuorumPeer.QuorumServer code so as not to duplicate information, but for the time being I think this patch will work. I have also taken the opportunity to standardise the naming of 'learnertype' throughout the code (in some places it was called 'peertype', adding to the confusion). Tests pass on my machine, but I can't guarantee that the problem is fixed as I could never recreate the error. Thanks to Flavio for catching the broken invariant! AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch The hudson test failed on http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/. There is a huge set of CancelledKeyExceptions in the logs. Still going through the logs to find out the reason for failure.
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862351#action_12862351 ] Henry Robinson commented on ZOOKEEPER-690: -- Alan - can you try this patch to see if it fixes things? Thanks, Henry AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch
[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-690: - Status: Patch Available (was: Open) AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862424#action_12862424 ] Henry Robinson commented on ZOOKEEPER-690: -- This map is, I think, shared between the quorumpeers for the purposes of the test (and in general there aren't two quorumpeers sharing this datastructure when running normally). But! The error here is that I'm dumb (and that Java's type-checking leaves a little to be desired). I've written quorumPeers.containsValue up there, but actually it should be quorumPeers.containsKey. New patch on the way, let's see if that fixes it. AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, nohup-201004201053.txt, nohup-201004291409.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch
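The containsValue / containsKey mix-up is easy to reproduce in miniature. The snippet below is a Python analogue, not the Java code from the patch; it assumes the map is keyed by server id, and the map contents and function names are made up for illustration.

```python
# Hypothetical stand-in for the peer map: server id -> learner type.
quorum_peers = {1: "PARTICIPANT", 2: "PARTICIPANT", 3: "OBSERVER"}


def knows_server_by_value(peers, server_id):
    # Analogue of the mistaken containsValue call: the id is compared
    # against the map's values, so it never matches.
    return server_id in peers.values()


def knows_server_by_key(peers, server_id):
    # Analogue of the intended containsKey call.
    return server_id in peers
```

In this sketch, asking for server 3 by value gives False while asking by key gives True. Neither form is rejected up front, which is in the spirit of the type-checking remark above.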
[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-690: - Attachment: ZOOKEEPER-690.patch Alan - would you mind trying this new patch? Thanks for your patience. I suspect that something might still be a bit flaky with these tests (not the code, but the tests), but I hope this will fix this particular problem. AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, nohup-201004201053.txt, nohup-201004291409.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862482#action_12862482 ] Henry Robinson commented on ZOOKEEPER-690: -- Ben - Agreed. I see this as the same as setMyid(...) - it sets an immutable value and should only be called once. I'd prefer if these parameters were 'final' in QuorumPeer and set in the constructor, but that's not the way that runFromConfig (the only place outside of tests that these methods are called) is written. Then we could get rid of setLearnerType, for sure. The real error here, I think, is duplicating the learnertype between QuorumPeer and QuorumServer. If we are going to have the list of QuorumServers, then getLearnerType should look up the learner type in the peer map. Same for the serverid, perhaps, and we should just save a reference to the QuorumServer that represents our QuorumPeer. AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
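The refactoring suggested in this comment (store the learner type once and look it up through the peer map) might look roughly like the following. This is a Python sketch of the idea only, not the Java classes in ZooKeeper; the class and method names echo the comment, and everything else is assumed.

```python
class QuorumServer(object):
    """Hypothetical record for one server in the ensemble."""

    def __init__(self, server_id, learner_type):
        self.server_id = server_id
        self.learner_type = learner_type


class QuorumPeer(object):
    """Keeps no private copy of its id or learner type."""

    def __init__(self, my_id, quorum_servers):
        # The peer map is the single source of truth for every server.
        self._servers = dict((s.server_id, s) for s in quorum_servers)
        # Save a reference to the entry that represents this peer.
        self._me = self._servers[my_id]

    def get_id(self):
        return self._me.server_id

    def get_learner_type(self):
        # Looked up through the shared entry, so it cannot drift from
        # what the other peers see in the same map.
        return self._me.learner_type
```

Because both values are fixed at construction and read through one object, there is no setter that could leave the peer and the map disagreeing.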
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861865#action_12861865 ] Henry Robinson commented on ZOOKEEPER-690: -- Progress update - possibly to do with a bug in FLE allowing an Observer to be elected. We're looking into this now. AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Henry Robinson Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log
[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar
[ https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-749: - Hadoop Flags: [Reviewed] +1, patch looks good to me. Tests failing was a quirk of Hudson, as this patch doesn't test code. ant bin-jar works correctly. OSGi metadata not included in binary only jar - Key: ZOOKEEPER-749 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.1, 3.4.0 Attachments: ZOOKEEPER-749.patch See this JIRA/comment for background: https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697 basically the issue is that OSGi metadata is included in the legacy jar (zookeeper-version.jar) but not in the binary only jar (zookeeper-version-bin.jar) which is eventually deployed to the maven repo.
[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar
[ https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-749: - Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this. Thanks Patrick! OSGi metadata not included in binary only jar - Key: ZOOKEEPER-749 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.1, 3.4.0 Attachments: ZOOKEEPER-749.patch
[jira] Resolved: (ZOOKEEPER-750) move maven artifacts into dist-maven subdir of the release (package target)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson resolved ZOOKEEPER-750. -- Resolution: Fixed I just committed ZOOKEEPER-749 (which addresses this as well). Thanks Patrick! move maven artifacts into dist-maven subdir of the release (package target) - Key: ZOOKEEPER-750 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-750 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.3.1, 3.4.0 The maven artifacts are currently (3.3.0) put into the top level of the release. This causes confusion amongst new users (i.e. which jar do I use?). Also, the naming of the bin jar is wrong for maven (to put it onto the maven repo it must be named without the -bin), which adds extra burden for the release manager. Putting them into a subdir fixes this and makes it explicit what's being deployed to the maven repo.
ZooKeeper gets three Google Summer of Code students
Hi - Just wanted to announce to the community that we are lucky to have three talented students working on Google's Summer of Code projects directly related to ZooKeeper. Andrei Savu will be working with Patrick Hunt on a Web-based Administrative Interface, extending and improving Patrick's Django-based front end. Abmar Barros will be working with Flavio Junqueira on improving ZooKeeper's failure detector module - making the code cleaner and easier to try out new implementations, as well as implementing a few failure detection algorithms himself! Finally, Sergey Doroshenko will be working with me on a Read-Only Mode for ZooKeeper, which will help bolster ZK's availability in certain circumstances when a network partition is detected, as well as potentially optimising the read-path. (The full list of 450 GSoC students is here: http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010) Congratulations to all three - we look forward to seeing what you produce over the summer. Thanks to everyone who applied, suggested projects and offered to mentor students; this program will have a big effect on ZooKeeper's visibility and community, as well as hopefully producing some great code! cheers, Henry -- Henry Robinson Software Engineer Cloudera 415-994-6679
[jira] Reopened: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson reopened ZOOKEEPER-740: -- Ok, thanks for the update. Can you share the code that you are running to give the segfault? That will make it much easier for me to diagnose. zkpython leading to segfault on zookeeper - Key: ZOOKEEPER-740 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Reporter: Federico Assignee: Henry Robinson Priority: Critical Fix For: 3.3.1, 3.4.0 The program that we are implementing uses the python binding for zookeeper, but sometimes it crashes with a segfault; here is the bt from gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
2488 ../Objects/abstract.c: No such file or directory.
in ../Objects/abstract.c
(gdb) bt
#0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
#1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
#3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 , context=0x8588660) at src/c/zookeeper.c:314
#4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 , list=0xa5354140) at src/zk_hashtable.c:275
#5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 , list=0xa5354140) at src/zk_hashtable.c:317
#6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
[jira] Updated: (ZOOKEEPER-746) learner outputs session id to log in dec (should be hex)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-746: - Hadoop Flags: [Reviewed] +1, patch looks good to me. No tests required, pre-empting Hudsonbot. learner outputs session id to log in dec (should be hex) Key: ZOOKEEPER-746 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-746 Project: Zookeeper Issue Type: Bug Components: quorum, server Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Minor Fix For: 3.3.1, 3.4.0 Attachments: ZOOKEEPER-746.patch usability issue, should be in hex: 2010-04-21 11:31:13,827 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:lear...@95] - Revalidating client: 83353578391797760
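To see what the hex form of the id in that log line looks like, here is a small Python illustration. The helper name format_session_id is made up for this sketch, and the exact format used by the patch is not shown in this message, so the 0x prefix is an assumption.

```python
def format_session_id(session_id):
    # Render a numeric session id as lowercase hex with a 0x prefix.
    return "0x%x" % session_id


# The decimal id from the log line quoted above.
print(format_session_id(83353578391797760))  # prints 0x12821a34ecd0000
```

The hex form is shorter and easier to match against other log lines that already print session ids in hex.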
[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858665#action_12858665 ] Henry Robinson commented on ZOOKEEPER-690: -- Alan - that would be great. If you can take a jstack dump of the process when it hangs we can do some forensics. AsyncTestHammer test fails on hudson. - Key: ZOOKEEPER-690 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.1
[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up
[ https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-631: - Status: Patch Available (was: Open) zkpython's C code could do with a style clean-up Key: ZOOKEEPER-631 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZOOKEEPER-631.patch Inconsistent formatting / use of parentheses / some error checking - all need fixing. Also, the documentation in the header file could do with a reformat.
[jira] Commented: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up
[ https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858377#action_12858377 ] Henry Robinson commented on ZOOKEEPER-631: -- The existing tests are the ones that validate this patch. To test the Py_None and memory allocation issues is hard because in the first case the GC behaviour is hard to force and in the second we would have to stub out calloc(..) somehow! zkpython's C code could do with a style clean-up Key: ZOOKEEPER-631 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZOOKEEPER-631.patch
[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858064#action_12858064 ] Henry Robinson commented on ZOOKEEPER-742: -- Patch to ZOOKEEPER-631 should fix this issue - when that is committed, we can close out this ticket. Deallocatng None on writes -- Key: ZOOKEEPER-742 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742 Project: Zookeeper Issue Type: Bug Components: c client, contrib, contrib-bindings Affects Versions: 3.2.2, 3.3.0 Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1) Reporter: Josh Fraser Assignee: Henry Robinson Attachments: commands.py, foo.p, ZOOKEEPER-742.patch, ZOOKEEPER-742.patch
On write operations, getting:
Fatal Python error: deallocating None
Aborted
This error happens on write operations only. Here's the backtrace:
Fatal Python error: deallocating None
Program received signal SIGABRT, Aborted.
0x00383fc30215 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00383fc30215 in raise () from /lib64/libc.so.6
#1 0x00383fc31cc0 in abort () from /lib64/libc.so.6
#2 0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
#3 0x2adbd0bc7493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#4 0x2adbd0bcab66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#5 0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#6 0x2adbd0bcc032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
#7 0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
#8 0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
#9 0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
#10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
#11 0x00400629 in _start ()
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (ZOOKEEPER-729) Recursively delete a znode - zkCli.sh rmr /node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-729: - Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this (had to move the test file into java/, but otherwise committed as submitted) - thanks Kay! Recursively delete a znode - zkCli.sh rmr /node Key: ZOOKEEPER-729 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-729 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Kay Kay Assignee: Kay Kay Fix For: 3.4.0 Attachments: ZOOKEEPER-729.patch, ZOOKEEPER-729.patch, ZOOKEEPER-729.patch, ZOOKEEPER-729.patch, ZOOKEEPER-729.patch Recursively delete a given znode in zookeeper, from the command-line. New operation rmr added to zkclient. $ ./zkCli.sh rmr /node
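For anyone curious how a recursive delete has to be ordered, the idea can be sketched in Python as a post-order walk: a znode with children cannot be deleted, so children go first. This is not the zkCli implementation. The get_children and delete callables are placeholders for whatever client calls you use, and the in-memory tree below exists only so the sketch runs without a server.

```python
def delete_recursive(path, get_children, delete):
    """Delete path and everything under it, children before parents."""
    for child in get_children(path):
        # rstrip keeps the join correct when path is the root "/".
        child_path = path.rstrip("/") + "/" + child
        delete_recursive(child_path, get_children, delete)
    delete(path)


# Hypothetical in-memory stand-in for a znode tree: path -> child names.
tree = {
    "/node": ["a", "b"],
    "/node/a": ["x"],
    "/node/a/x": [],
    "/node/b": [],
}
deleted = []


def fake_get_children(path):
    return list(tree[path])


def fake_delete(path):
    del tree[path]
    deleted.append(path)


delete_recursive("/node", fake_get_children, fake_delete)
print(deleted)  # prints ['/node/a/x', '/node/a', '/node/b', '/node']
```

Note that a walk like this is not atomic: children created by another client while it runs could make a later delete of the parent fail.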
[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857628#action_12857628 ] Henry Robinson commented on ZOOKEEPER-742: -- Thanks Josh - can you share the portion of your script that is causing the problem? Deallocatng None on writes -- Key: ZOOKEEPER-742 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742 Project: Zookeeper Issue Type: Bug Components: c client, contrib, contrib-bindings Affects Versions: 3.2.2, 3.3.0 Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1) Reporter: Josh Fraser
[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857638#action_12857638 ]

Henry Robinson commented on ZOOKEEPER-742:
--

Thanks very much for this - any chance you can share Commands as well, so that I can see the actual zookeeper API calls that are being made? Let me know if you're not comfortable posting it publicly.

Deallocatng None on writes
--

Key: ZOOKEEPER-742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
Project: Zookeeper
Issue Type: Bug
Components: c client, contrib, contrib-bindings
Affects Versions: 3.2.2, 3.3.0
Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1)
Reporter: Josh Fraser
Attachments: foo.p

On write operations, getting:

Fatal Python error: deallocating None
Aborted

This error happens on write operations only. Here's the backtrace:

Fatal Python error: deallocating None
Program received signal SIGABRT, Aborted.
0x00383fc30215 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00383fc30215 in raise () from /lib64/libc.so.6
#1 0x00383fc31cc0 in abort () from /lib64/libc.so.6
#2 0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
#3 0x2adbd0bc7493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#4 0x2adbd0bcab66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#5 0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#6 0x2adbd0bcc032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
#7 0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
#8 0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
#9 0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
#10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
#11 0x00400629 in _start ()

--
This message is automatically generated by JIRA.