Re: New strategy for Netty (ZOOKEEPER-823) was: What's the QA strategy of ZooKeeper?
On Sat, Oct 16, 2010 at 1:56 AM, Thomas Koch tho...@koch.ro wrote:

> Benjamin Reed:
> > actually, the other way of doing the netty patch (since i'm scared of merges) would be to do a refactor cleanup patch with an eye toward netty, and then another patch to actually add netty. [...]

Ben you really need to give git a try and stop fearing the branch/merge. ;-)

Seriously though, having a branch is not a big deal. In the end you can create one or more patches if you like and apply them, but this is essentially just a merge. My main concern personally is that a branch not go on for too long or get too big, i.e. incorporate too many unfocused changes. I believe that's not the case here though. Thomas would focus on 1) refactoring the client code to enable netty integration, 2) integrating the netty changes. He'd also be adding 3) significant tests (potentially refactoring some code to better allow design for test) to ensure that the code changes (incl. refactoring) don't break anything.

For the record I'll add that this is pretty much what I did when creating this patch in the first place. Because it was not done on an svn branch, and it's just a big patch ball, you can't see that. Also my goals were a bit different from Thomas's (which I'm fine with in principle).

> I've had exactly the same thought last evening. Instead of trying to find the bug(s) in the current patch, I'd like to start it over again and do small incremental changes from the current trunk towards the current ZOOKEEPER-823 patch. Maybe I could do this in ZOOKEEPER-823 patch, this would mean to revert the already applied ZOOKEEPER-823 patch.

Thomas, did you mean to say do this in ZOOKEEPER-823 *branch*?

> Then I want to test each incremental step at least 5 times to find the step(s) that breaks ZK. This approach should take me another two weeks, I believe, mostly because each Test run takes ~15-25 minutes.

This sounds like a reasonable plan to me if you want to try your hand at it. I also appreciate you stepping up on this effort.

Unfortunately only committers can commit to apache SVN, which means that one of us (ben/f/m/h/myself) will have to apply your change to the branch. You'll have to bug one of us when you're ready to apply a new patch to the branch. If you can create a new patch (rather than changing the original) that would be a good idea (easier for us to apply). Shouldn't be much of an issue I assume if you're using git personally.

Notice that I've already set up a hudson job that pulls from the branch:
https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper_branch_823/
https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper_branch_823/43/

Regards,
Patrick
[jira] Commented: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921997#action_12921997 ]

Flavio Junqueira commented on ZOOKEEPER-901:

It is a good point, Pat. It crossed my mind, but I thought it would be overkill to use netty. However, if it is simpler to have it for compatibility and uniformity purposes, then we should consider it.

Redesign of QuorumCnxManager

Key: ZOOKEEPER-901
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901
Project: Zookeeper
Issue Type: Improvement
Components: leaderElection
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Fix For: 3.4.0

QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following:
# Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers;
# Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
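The two goals listed in ZOOKEEPER-901 (non-blocking connection establishment, and one thread with a selector instead of a thread pair per connection) can be illustrated with a small sketch. This is not QuorumCnxManager code and not the proposed redesign; it is a generic Python illustration using the standard `selectors` module, and the function name `connect_all` and its `timeout` parameter are made up for the example. It assumes a Unix-like platform, where a finished non-blocking connect is reported as the socket becoming writable.

```python
import errno
import selectors
import socket
import time


def connect_all(addresses, timeout=2.0):
    """Open TCP connections to all addresses without blocking on any one.

    Returns a dict mapping each address to a connected socket, or to None
    if that connection failed or did not complete within `timeout` seconds.
    """
    selector = selectors.DefaultSelector()
    results = {}

    for address in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        # connect_ex returns at once; EINPROGRESS means "still connecting".
        code = sock.connect_ex(address)
        if code in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            # The socket becomes writable once the connect attempt finishes.
            selector.register(sock, selectors.EVENT_WRITE, data=address)
        else:
            sock.close()
            results[address] = None

    deadline = time.monotonic() + timeout
    while selector.get_map() and time.monotonic() < deadline:
        remaining = max(0.0, deadline - time.monotonic())
        for key, _events in selector.select(timeout=remaining):
            sock, address = key.fileobj, key.data
            selector.unregister(sock)
            # SO_ERROR is 0 if the pending connect succeeded.
            if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                results[address] = sock
            else:
                sock.close()
                results[address] = None

    # Anything still registered has timed out.
    for key in list(selector.get_map().values()):
        selector.unregister(key.fileobj)
        key.fileobj.close()
        results[key.data] = None
    selector.close()
    return results
```

All connection attempts are started before any of them is waited on, so a slow or unreachable peer only costs its own entry in the result and does not delay the others. The returned sockets are still in non-blocking mode.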
RE: Fix release 3.3.2 planning, status.
Hi guys,

Any updates on the 3.3.2 release schedule? Trying to plan a release myself and wondering if I'll have to go to production with patched 3.3.1 or have time to QA with the 3.3.2 release.

Thanks,
Camille

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, September 23, 2010 12:45 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Fix release 3.3.2 planning, status.

Looking at the JIRA queue for 3.3.2 I see that there are two blockers, one is currently PA (Patch Available) and the other is pretty close (it has a patch that should go in soon). There are a few JIRAs that already went into the branch that are important to get out there ASAP, esp ZOOKEEPER-846 (fix close issue found by hbase).

One issue that's been slowing us down is hudson. The trunk was not passing its hudson validation, which was causing a slowdown in patch review. Mahadev and I fixed this. However with recent changes to the hudson hw/security environment the patch testing process (automated) is broken. Giri is working on this. In the meantime we'll have to test ourselves. Committers -- be sure to verify RAT, Findbugs, etc... in addition to verifying via test.

I've set up an additional Hudson environment inside Cloudera that also verifies the trunk/branch. If issues are found I will report them (unfortunately I can't provide access to cloudera's hudson env to non-cloudera employees at this time).

I'd like to clear out the PAs asap and get a release candidate built. Anyone see a problem with shooting for an RC mid next week?

Patrick
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-794:

Status: Open (was: Patch Available)

Looks like some recent changes on branch 33 have broken this patch, I'll update the patch and resubmit.

Callbacks are not invoked when the client is closed

Key: ZOOKEEPER-794
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
Priority: Blocker
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch, ZOOKEEPER-794_4.patch.txt, ZOOKEEPER-794_5.patch.txt

I noticed that ZooKeeper behaves differently when synchronous or asynchronous actions are called on a closed ZooKeeper client. A synchronous call will throw a session expired exception, while an asynchronous call will do nothing: no exception, no callback invocation.

Even if the EventThread receives the Packet with the session expired error code, the packet is never processed, since the thread has already been killed by the eventOfDeath. So the callback is not invoked.
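The asymmetry described in ZOOKEEPER-794 is easier to see in a toy model. The sketch below is not the ZooKeeper Java client and does not reproduce the attached patch; `ToyClient`, `get_sync`, `get_async` and the `SESSION_EXPIRED` label are hypothetical names used only to show one reading of the expected behaviour: on a closed client, the synchronous path raises and the asynchronous path still invokes the callback with an error code.

```python
OK = "OK"
SESSION_EXPIRED = "SESSION_EXPIRED"  # placeholder label, not a real error code


class SessionExpired(Exception):
    """Raised by the synchronous path when the client is already closed."""


class ToyClient:
    """Hypothetical stand-in for a client with sync and async calls."""

    def __init__(self):
        self._closed = False

    def close(self):
        self._closed = True

    def get_sync(self, path):
        if self._closed:
            raise SessionExpired(path)
        return b"data"

    def get_async(self, path, callback):
        if self._closed:
            # Always tell the caller what happened instead of dropping the
            # request silently.
            callback(SESSION_EXPIRED, path, None)
            return
        callback(OK, path, b"data")
```

With this shape, code that waits on a callback (for example by counting down a latch) cannot hang forever just because the client was closed first.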
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-794:

Attachment: ZOOKEEPER-794_5_br33.patch

Attaching ZOOKEEPER-794_5_br33 which fixes patching this issue against current branch. Michi please give this another try. Ben/Flavio/Henry/Mahadev please review for commit asap. This is blocking 3.3.2 release. Thanks all!
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-794:

Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-881:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Ben forgot to close this issue.

ZooKeeperServer.loadData loads database twice

Key: ZOOKEEPER-881
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881
Project: Zookeeper
Issue Type: Bug
Components: server
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Trivial
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-881.patch

zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but it is unnecessary. A patch should be trivial.
[jira] Reopened: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reopened ZOOKEEPER-881:

Reopening - this was only committed to trunk, slated for 3.3.2 and trunk.
[jira] Commented: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922139#action_12922139 ]

Patrick Hunt commented on ZOOKEEPER-901:

It would be great if you supported both netty/nio as part of this change, but it's really up to you. Going fwd (long term) we should convert everything over from nio to netty.

It's not a matter of nio vs netty really - it's a matter of security. Netty provides encryption/auth out of the box (sorta), while we'd have to do a bunch of work to add this on top of our current nio. So really we're moving to netty primarily to get security for client server connectivity. An added bonus is that netty should simplify our code (although in the short term that's def not the case).

Thanks!
[jira] Created: (ZOOKEEPER-902) Fix findbug issue in trunk Malicious code vulnerability
Fix findbug issue in trunk Malicious code vulnerability

Key: ZOOKEEPER-902
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-902
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Priority: Minor
Fix For: 3.4.0

https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability Warnings

Code  Warning
MS    org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
[jira] Created: (ZOOKEEPER-903) Create a testing jar with useful classes from ZK test source
Create a testing jar with useful classes from ZK test source

Key: ZOOKEEPER-903
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-903
Project: Zookeeper
Issue Type: Improvement
Components: tests
Reporter: Camille Fournier

From mailing list:

-----Original Message-----
From: Benjamin Reed
Sent: Monday, October 18, 2010 11:12 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Testing zookeeper outside the source distribution?

we should be exposing those classes and releasing them as a testing jar. do you want to open up a jira to track this issue?

ben

On 10/18/2010 05:17 AM, Anthony Urso wrote:
> Anyone have any pointers on how to test against ZK outside of the source distribution? All the fun classes (e.g. ClientBase) do not make it into the ZK release jar.
>
> Right now I am manually running a ZK node for the unit tests to connect to prior to running my test, but I would rather have something that ant could reliably automate starting and stopping for CI.
>
> Thanks,
> Anthony
[jira] Resolved: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira resolved ZOOKEEPER-881.

Resolution: Fixed

Committed to the 3.3 branch (Committed revision 1023935.)
[jira] Commented: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922209#action_12922209 ]

Henry Robinson commented on ZOOKEEPER-888:

The patch as it stands relies on ZOOKEEPER-853 (which it fixes), which is not in 3.3 as it is a small API change - it changes is_unrecoverable to return Python True or False, rather than ZINVALIDSTATE.

So I'm not certain about what to do here - we try not to change APIs between minor versions. However, this is a very minor change, and this patch fixes a significant bug. I'm inclined to commit both 853 and this patch to 3.3 as well as trunk, and put a note in the release notes. Any objections?

c-client / zkpython: Double free corruption on node watcher

Key: ZOOKEEPER-888
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
Project: Zookeeper
Issue Type: Bug
Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Assignee: Lukas
Priority: Critical
Fix For: 3.3.2, 3.4.0
Attachments: resume-segfault.py, ZOOKEEPER-888.patch

The c-client / zkpython wrapper invokes an already freed watcher callback.

Steps to reproduce:
0. start a zookeeper server on your machine
1. run the attached python script
2. suspend the zookeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
3. wait until the connection and the node observer fired with a session event
4. resume the zookeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )

- the client tries to dispatch the node observer function again, but it was already freed - double free corruption
[jira] Updated: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-786:

Priority: Minor (was: Major)
Fix Version/s: (was: 3.3.2)

Exception in ZooKeeper.toString

Key: ZOOKEEPER-786
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-786
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Environment: Mac OS X, x86
Reporter: Stephen Green
Priority: Minor
Fix For: 3.4.0

When trying to call ZooKeeper.toString during client disconnections, an exception can be generated:

[04/06/10 15:39:57.744] ERROR Error while calling watcher
java.lang.Error: java.net.SocketException: Socket operation on non-socket
    at sun.nio.ch.Net.localAddress(Net.java:128)
    at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430)
    at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147)
    at java.net.Socket.getLocalSocketAddress(Socket.java:717)
    at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227)
    at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486)
    at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794)
    at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677)
    at java.util.Formatter.format(Formatter.java:2433)
    at java.util.Formatter.format(Formatter.java:2367)
    at java.lang.String.format(String.java:2769)
    at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: java.net.SocketException: Socket operation on non-socket
    at sun.nio.ch.Net.localInetAddress(Native Method)
    at sun.nio.ch.Net.localAddress(Net.java:125)
    ... 15 more
[jira] Commented: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922216#action_12922216 ]

Flavio Junqueira commented on ZOOKEEPER-786:

Since this seems to be a minor issue and to avoid further delays with 3.3.2, I propose we move it to 3.4.0.
[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922238#action_12922238 ]

Flavio Junqueira commented on ZOOKEEPER-855:

+1, I'll commit this in a minute.

clientPortBindAddress should be clientPortAddress

Key: ZOOKEEPER-855
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855
Project: Zookeeper
Issue Type: Bug
Components: documentation
Affects Versions: 3.3.0, 3.3.1
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Trivial
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-855.patch

The server documentation states that the configuration parameter for binding to a specific ip address is clientPortBindAddress. The code expects the parameter to be named clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635.
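To make the practical effect of the ZOOKEEPER-855 naming mismatch concrete, here is a small sketch of how a key that the reader does not recognise is silently ignored. It is not ZooKeeper's configuration parser; `client_bind_address` is a hypothetical helper, and the fallback to `0.0.0.0` (all interfaces) when no address is given is an assumption of this example. Only the key names `clientPort`, `clientPortAddress` and `clientPortBindAddress` come from the issue and the server configuration.

```python
def client_bind_address(config_text):
    """Return the (host, port) the client port would be bound to.

    Parses simple key=value lines; unknown keys are kept but never used,
    which is why a misspelled or misdocumented key has no effect.
    """
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()

    # Assumption for this sketch: bind to all interfaces when unset.
    host = settings.get("clientPortAddress", "0.0.0.0")
    return host, int(settings["clientPort"])
```

Called with `clientPort=2181` and `clientPortAddress=10.0.0.5`, the helper returns the specific address. With the name from the old documentation, `clientPortBindAddress=10.0.0.5`, the setting is ignored and the fallback address is used instead.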
[jira] Updated: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jared Cantwell updated ZOOKEEPER-897:

Attachment: ZOOKEEPER-897.patch

Updated patch format and spelling.

C Client seg faults during close

Key: ZOOKEEPER-897
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897
Project: Zookeeper
Issue Type: Bug
Components: c client
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch

We observed a crash while closing our c client. It happened in the do_io() thread, which was still processing during the close() call.

#0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969
#1 0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687
#2 0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971
#3 0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311
#4 0x77bc59ca in start_thread () from /lib/libpthread.so.0
#5 0x76f706fd in clone () from /lib/libc.so.6
#6 0x in ?? ()

We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it:

1. do_io() calls check_events()
2. if (events & ZOOKEEPER_READ) branch executes
3. if (rc > 0) branch executes
4. if (zh->input_buffer != &zh->primer_buffer) branch executes
... in the meantime ...
5. zookeeper_close() called
6. if (inc_ref_counter(zh,0)!=0) branch executes
7. cleanup_bufs() is called
8. input_buffer is freed at the end
... back to check_events() ...
9. queue_buffer() is called on a NULL buffer.

I believe the fix is to only call free_completions() in zookeeper_close() and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change.
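The race in ZOOKEEPER-897 comes down to a buffer being released by a thread that does not own it. As a rough illustration of the general remedy (one owner per buffer, with close() only signalling), here is a toy sketch. It is not the C client and not the attached patch; `ToyHandle`, its fields and the 1 ms polling interval are invented for the example.

```python
import threading


class ToyHandle:
    """Toy stand-in for a client handle; not the real zhandle_t."""

    def __init__(self):
        self.input_buffer = bytearray()
        self.processed = 0
        self._close_requested = threading.Event()
        self._io_thread = threading.Thread(target=self._do_io)
        self._io_thread.start()

    def _do_io(self):
        # The I/O thread is the only code that touches input_buffer.
        while not self._close_requested.is_set():
            self.input_buffer.extend(b"x")            # pretend to receive data
            self.processed += len(self.input_buffer)  # pretend to queue it
            self.input_buffer.clear()
            self._close_requested.wait(0.001)
        # Released here, by its owner, once no more I/O can happen.
        self.input_buffer = None

    def close(self):
        # close() only signals and waits; it never frees the buffer itself.
        self._close_requested.set()
        self._io_thread.join()
```

Because close() joins the I/O thread before returning, no code can observe the buffer in a half-released state, which is the property the sequence of events in the report violates.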
[jira] Updated: (ZOOKEEPER-898) C Client might not cleanup correctly during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jared Cantwell updated ZOOKEEPER-898:

Attachment: ZOOKEEPER-898.patch

Updated patch format and spelling.

C Client might not cleanup correctly during close

Key: ZOOKEEPER-898
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-898
Project: Zookeeper
Issue Type: Bug
Components: c client
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Trivial
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEEPER-898.diff, ZOOKEEPER-898.patch

I was looking through the c-client code and noticed a situation where a counter can be incorrectly incremented and a small memory leak can occur.

In zookeeper.c : add_completion(), if close_requested is true, then the completion will not be queued. But at the end, outstanding_sync is still incremented and free() never called on the newly allocated completion_list_t.

I will submit for review a diff that I believe corrects this issue.
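The bookkeeping problem in ZOOKEEPER-898 can be shown with a toy model of the intended invariant: the counter only moves when an entry was really queued. This is not the code from zookeeper.c or the attached diff; `CompletionQueue` and its `outstanding` field are hypothetical, and Python has no explicit free(), so the leak half of the issue appears only as a comment.

```python
class CompletionQueue:
    """Toy model: count an entry only if it was actually queued."""

    def __init__(self):
        self.close_requested = False
        self.pending = []
        self.outstanding = 0

    def add_completion(self, completion):
        if self.close_requested:
            # Nothing is queued, so the counter must not change. In C, the
            # freshly allocated list entry would also have to be freed here.
            return False
        self.pending.append(completion)
        self.outstanding += 1
        return True
```

Keeping the increment next to the enqueue, rather than at the end of the function, makes it harder for an early-exit path to leave the counter and the queue out of step.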
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-855:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks, Jared, I have just committed this:
Branch 3.3: Committed revision 1024022.
Trunk: Committed revision 1024029.
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-855:

Attachment: ZOOKEEPER-855.patch

I'm uploading the patch I committed. The original patch was modifying the html instead of the xml source.
[jira] Commented: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922355#action_12922355 ]

Patrick Hunt commented on ZOOKEEPER-888:

Can we backport the 853 change into 3.3, but not change the API? Big overhead or straightforward/simple?
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922382#action_12922382 ]

Michi Mutsuzaki commented on ZOOKEEPER-794:

ZOOKEEPER-794_5_br33.patch worked.

--Michi
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:

Attachment: ZOOKEEPER-888-3.3.patch

Patch based on the 3.3 branch attached (ZOOKEEPER-888-3.3.patch). Verified that unit tests pass with the changes, including the new watcher_test.