RE: closing session on socket close vs waiting for timeout
I'm reopening this question for the group. I have attached some sample code (3.3 branch) to a JIRA ticket that seems to do what I propose, namely, lower the session timeout in the case of an error causing the socket to close. https://issues.apache.org/jira/browse/ZOOKEEPER-922

I am very interested in any feedback about what might fail here. I have this running in a dev ensemble and it seems to work, but I haven't done any sort of extensive testing or considered the effects of this on observers, etc. Even if the community doesn't want the change in ZK for reasons of false positives, I may need to use it internally and could use any insights the experts have on unintended side effects.

Thanks,
Camille

-----Original Message-----
From: Benjamin Reed [mailto:br...@yahoo-inc.com]
Sent: Friday, September 10, 2010 4:11 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

ah dang, i should have said generate a close request for the session and push that through the system.

ben

On 09/10/2010 01:01 PM, Benjamin Reed wrote:

the problem is that followers don't track session timeouts. they track when they last heard from the sessions that are connected to them and they periodically propagate this information to the leader. the leader is the one that expires the session. your technique only works when the client is connected to the leader.

one thing you can do is generate a close request for the socket and push that through the system. that will cause it to get propagated through the followers and processed at the leader. it would also allow you to get your functionality without touching the processing pipeline.

the thing that worries me about this functionality in general is that network anomalies can cause a whole raft of sessions to get expired in this way.
For example: you have 3 servers with load spread well; there is a networking glitch that causes clients to abandon a server; suddenly 1/3 of your clients will get expired sessions.

ben

On 09/10/2010 12:17 PM, Fournier, Camille F. [Tech] wrote:

Ben, could you explain a bit more why you think this won't work? I'm trying to decide if I should put in the work to take the POC I wrote and complete it, but I don't really want to waste my time if there's a fundamental reason it's a bad idea.

Thanks,
Camille

-----Original Message-----
From: Benjamin Reed [mailto:br...@yahoo-inc.com]
Sent: Wednesday, September 08, 2010 4:03 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

unfortunately, that only works on the standalone server.

ben

On 09/08/2010 12:52 PM, Fournier, Camille F. [Tech] wrote:

This would be the ideal solution to this problem, I think. Poking around the (3.3) code to figure out how hard it would be to implement, I figure one way to do it would be to modify the session timeout to the min session timeout and touch the connection before calling close when you get certain exceptions in NIOServerCnxn.doIO. I did this (removing the code in touchSession that returns if the tickTime is greater than the expire time) and it worked (in the standalone server, anyway). Interesting solution, or total hack that will not work beyond the most basic test case?

C (forgive lack of actual code in this email)

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Tuesday, September 07, 2010 1:11 PM
To: zookeeper-u...@hadoop.apache.org
Cc: Benjamin Reed
Subject: Re: closing session on socket close vs waiting for timeout

This really is, just as Ben says, a problem of false positives and false negatives in detecting session expiration. On the other hand, the current algorithm isn't really using all the information available. The current algorithm is using time since last client-initiated heartbeat.
The new proposal is somewhat worse in that it proposes to use just the boolean has-TCP-disconnect-happened. Perhaps it would be better to use multiple features in order to decrease both false positives and false negatives. For instance, I could imagine that we use the following features:

- time since last client heartbeat or disconnect or reconnect
- what was the last event? (a heartbeat or a disconnect or a reconnect)

Then the expiration algorithm could use a relatively long time since last heartbeat and a relatively short time since last disconnect to mark a session as disconnected. Wouldn't this avoid expiration during GC and cluster partition and cause expiration quickly after a client disconnect?

On Mon, Sep 6, 2010 at 11:26 PM, Patrick Hunt ph...@apache.org wrote:

That's a good point, however with suitable documentation, warnings and such it seems like a reasonable feature to provide for those users who require it. Used in moderation it seems fine to me. Perhaps we also make it configurable at the server level for those
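Ted's two-feature expiration rule could be sketched roughly as follows. This is a minimal illustration with hypothetical names (not ZooKeeper's actual SessionTracker): expire quickly after an observed TCP disconnect, but allow the full timeout when the last event was a heartbeat or reconnect, so a GC pause or a partition that never dropped the socket is forgiven.

```java
// Hypothetical sketch of the two-feature expiry rule described above.
enum LastEvent { HEARTBEAT, DISCONNECT, RECONNECT }

class SessionExpiryPolicy {
    private final long longTimeoutMs;   // applied after a heartbeat/reconnect
    private final long shortTimeoutMs;  // applied after a TCP disconnect

    SessionExpiryPolicy(long longTimeoutMs, long shortTimeoutMs) {
        this.longTimeoutMs = longTimeoutMs;
        this.shortTimeoutMs = shortTimeoutMs;
    }

    // Expired if the time since the last event exceeds the limit chosen
    // by what that last event was.
    boolean isExpired(LastEvent lastEvent, long msSinceLastEvent) {
        long limit = (lastEvent == LastEvent.DISCONNECT)
                ? shortTimeoutMs : longTimeoutMs;
        return msSinceLastEvent > limit;
    }

    public static void main(String[] args) {
        SessionExpiryPolicy policy = new SessionExpiryPolicy(30000, 3000);
        // Disconnected 5s ago: expired under the short limit.
        System.out.println(policy.isExpired(LastEvent.DISCONNECT, 5000));
        // Heartbeat 5s ago: still within the long limit.
        System.out.println(policy.isExpired(LastEvent.HEARTBEAT, 5000));
    }
}
```

This directly addresses Ben's worry above: a network glitch that drops a third of the clients still gives them `shortTimeoutMs` to reconnect before expiry, while a client that stays silent without disconnecting keeps the full timeout.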
Windows port of ZK C api
Hi everyone,

We have a requirement for a native Windows-compatible version of the ZK C API. We're currently working on various ways to do this port, but would very much like to submit this back to you all when we are finished so that we don't have to maintain the code ourselves through future releases. Is there interest in having this? What would you need with this patch (build scripts, etc.) to accept it?

Thanks,
Camille
RE: Windows port of ZK C api
Thanks Mahadev. We are using those C# bindings but also need native Windows C/C++. Every language all the time!

C

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Wednesday, November 03, 2010 11:06 AM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Windows port of ZK C api

Hi Camille,

I think definitely there is. I think a build script with a set of requirements and a nice set of docs on how to start using it would be great. BTW, there is a C# binding which someone wrote earlier: http://wiki.apache.org/hadoop/ZooKeeper/ZKClientBindings You can take a look at that and see if you want to extend that or write your own.

Thanks
mahadev

On 11/3/10 7:18 AM, Fournier, Camille F. [Tech] camille.fourn...@gs.com wrote:

Hi everyone,

We have a requirement for a native Windows-compatible version of the ZK C API. We're currently working on various ways to do this port, but would very much like to submit this back to you all when we are finished so that we don't have to maintain the code ourselves through future releases. Is there interest in having this? What would you need with this patch (build scripts, etc.) to accept it?

Thanks,
Camille
RE: implications of netty on client connections
Yes, that's correct.

C

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Friday, October 22, 2010 1:39 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: implications of netty on client connections

Hi Camille,

I am a little curious here. Does this mean you tried a single ZooKeeper server with 16K clients?

Thanks
mahadev

On 10/20/10 1:07 PM, Fournier, Camille F. [Tech] camille.fourn...@gs.com wrote:

Thanks Patrick, I'll look and see if I can figure out a clean change for this. It was the kernel limit on the max number of open fds for the process where the problem shows up (not a ZK limit). FWIW, we tested with a process fd limit of 16K, and ZK performed reasonably well until the fd limit was reached, at which point it choked. There was a throughput degradation, but mostly going from 0 to 4000 connections; 4000 to 16000 was mostly flat until the sharp drop. For our use case it is fine to have a bit of performance loss with huge numbers of connections, so long as we can handle the choke, which for initial rollout I'm planning on just monitoring for.

C

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Wednesday, October 20, 2010 2:06 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: implications of netty on client connections

It may just be the case that we haven't tested sufficiently for this case (running out of fds) and we need to handle this better even in NIO, probably by cutting off OP_CONNECT in the selector. We should be able to do similar in Netty.

Btw, on unix one can access the open/max fd count using this:
http://download.oracle.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html

Secondly, are you running into a kernel limit or a zk limit?
Take a look at this post describing 1 million concurrent connections to a box: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

Specifically:

--
During various tests with lots of connections, I ended up making some additional changes to my sysctl.conf. This was part trial-and-error; I don't really know enough about the internals to make especially informed decisions about which values to change. My policy was to wait for things to break, check /var/log/kern.log and see what mysterious error was reported, then increase stuff that sounded sensible after a spot of googling. Here are the settings in place during the above test:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 36
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
--

I'm guessing that even with this, at some point you'll run into a limit in our server implementation. In particular I suspect that we may start to respond more slowly to pings, eventually getting so bad it would time out. We'd have to debug that and address (optimize).

Patrick

On Tue, Oct 19, 2010 at 7:16 AM, Fournier, Camille F. [Tech] camille.fourn...@gs.com wrote:

Hi everyone,

I'm curious what the implications of using netty are going to be for the case where a server gets close to its max available file descriptors. Right now our somewhat limited testing has shown that a ZK server performs fine up to the point when it runs out of available fds, at which point performance degrades sharply and new connections get into a somewhat bad state. Is netty going to enable the server to handle this situation more gracefully (or is there a way to do this already that I haven't found)?
Limiting connections from the same client is not enough since we can potentially have far more clients wanting to connect than available fds for certain use cases we might consider.

Thanks,
Camille
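Patrick's pointer to the `UnixOperatingSystemMXBean` Javadoc can be turned into a small fd-monitoring check, which is what Camille plans to rely on for the initial rollout. A minimal sketch (the class and method names around the bean are my own; only the MXBean calls come from the linked API):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Reads the process's open and maximum file descriptor counts via JMX.
// Only works on Unix-like JVMs; elsewhere the instanceof check fails
// and we return null to signal the counts are unavailable.
public class FdMonitor {
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix JVM
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts != null) {
            System.out.println("open fds: " + counts[0] + " / max: " + counts[1]);
        }
    }
}
```

Polling this pair and alerting when open approaches max would catch the choke point described above before new connections start landing in a bad state.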
RE: Fix release 3.3.2 planning, status.
Hi guys,

Any updates on the 3.3.2 release schedule? Trying to plan a release myself and wondering if I'll have to go to production with a patched 3.3.1 or have time to QA with the 3.3.2 release.

Thanks,
Camille

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, September 23, 2010 12:45 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Fix release 3.3.2 planning, status.

Looking at the JIRA queue for 3.3.2 I see that there are two blockers; one is currently PA (Patch Available) and the other is pretty close (it has a patch that should go in soon). There are a few JIRAs that already went into the branch that are important to get out there ASAP, esp ZOOKEEPER-846 (fix close issue found by HBase).

One issue that's been slowing us down is Hudson. The trunk was not passing its Hudson validation, which was causing a slowdown in patch review. Mahadev and I fixed this. However, with recent changes to the Hudson hw/security environment the automated patch testing process is broken. Giri is working on this. In the meantime we'll have to test ourselves. Committers -- be sure to verify RAT, Findbugs, etc., in addition to verifying via test.

I've set up an additional Hudson environment inside Cloudera that also verifies the trunk/branch. If issues are found I will report them (unfortunately I can't provide access to Cloudera's Hudson env to non-Cloudera employees at this time).

I'd like to clear out the PAs asap and get a release candidate built. Anyone see a problem with shooting for an RC mid next week?

Patrick
RE: [jira] Updated: (ZOOKEEPER-844) handle auth failure in java client
Hi everyone,

Can someone explain what I should do for this? I have a patch for both 3.4 and 3.3, and I think the 3.3 patch caused issues in the automated patch applier. What do I need to do to submit both of these patches to the different branches?

Thanks,
Camille

-----Original Message-----
From: Camille Fournier (JIRA) [mailto:j...@apache.org]
Sent: Thursday, September 16, 2010 2:25 PM
To: zookeeper-dev@hadoop.apache.org
Subject: [jira] Updated: (ZOOKEEPER-844) handle auth failure in java client

[ https://issues.apache.org/jira/browse/ZOOKEEPER-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-844:
---------------------------------------
    Attachment: (was: ZOOKEEPER332-844)

handle auth failure in java client
----------------------------------
                 Key: ZOOKEEPER-844
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-844
             Project: Zookeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.3.1
            Reporter: Camille Fournier
            Assignee: Camille Fournier
             Fix For: 3.3.2, 3.4.0
         Attachments: ZOOKEEPER-844.patch

ClientCnxn.java currently has the following code:

    if (replyHdr.getXid() == -4) {
        // -4 is the xid for AuthPacket
        // TODO: process AuthPacket here
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId));
        }
        return;
    }

Auth failures appear to cause the server to disconnect, but the client never gets a proper state change or notification that auth has failed, which makes handling this scenario very difficult as it causes the client to go into a loop of sending bad auth, getting disconnected, trying to reconnect, sending bad auth again, over and over.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Re: (ZOOKEEPER-844) handle auth failure in java client
Hi all,

I would like to submit this patch into the 3.3 branch as well, since we are probably going to go into production with 3.3 and I'd rather not do a production release with a patched version of ZK if possible. I added a patch for this fix against the 3.3 branch to this ticket. Any idea of the odds of getting this into the 3.3.2 release?

Thanks,
Camille

-----Original Message-----
From: Giridharan Kesavan (JIRA) [mailto:j...@apache.org]
Sent: Tuesday, August 31, 2010 7:25 PM
To: Fournier, Camille F. [Tech]
Subject: [jira] Updated: (ZOOKEEPER-844) handle auth failure in java client

[ https://issues.apache.org/jira/browse/ZOOKEEPER-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan updated ZOOKEEPER-844:
-----------------------------------------
    Status: Patch Available (was: Open)

handle auth failure in java client
----------------------------------
                 Key: ZOOKEEPER-844
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-844
             Project: Zookeeper
          Issue Type: Improvement
          Components: java client
    Affects Versions: 3.3.1
            Reporter: Camille Fournier
            Assignee: Camille Fournier
             Fix For: 3.4.0
         Attachments: ZOOKEEPER-844.patch

ClientCnxn.java currently has the following code:

    if (replyHdr.getXid() == -4) {
        // -4 is the xid for AuthPacket
        // TODO: process AuthPacket here
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId));
        }
        return;
    }

Auth failures appear to cause the server to disconnect, but the client never gets a proper state change or notification that auth has failed, which makes handling this scenario very difficult as it causes the client to go into a loop of sending bad auth, getting disconnected, trying to reconnect, sending bad auth again, over and over.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: windows port of C API
I would be very interested to see any work already done and provide feedback; we need such a port and were planning on writing one ourselves.

C

-----Original Message-----
From: Ben Collins [mailto:ben.coll...@foundationdb.com]
Sent: Monday, August 30, 2010 5:01 PM
To: zookeeper-dev@hadoop.apache.org
Subject: windows port of C API

I have a working win32 port of the C API, not depending on Cygwin, that supports the single-threaded model of network interaction. It compiles in Visual Studio 2010 and works on 64-bit Windows 7. There are known issues, and it is in its initial stages, but it has been successfully used against the Java server. I am happy to provide patches, but would like any pointers to efforts already undertaken in this area, or folks to communicate with about this.

Thanks,
-- Ben
handling auth failure in java client
Hi all,

I filed this ticket last week: https://issues.apache.org/jira/browse/ZOOKEEPER-844

Currently, the Java client ignores auth failures, which is extremely problematic for the deployment I am preparing. I have written a patch to correct the problem by adding an AuthFailed KeeperState and checking the auth responses for the AUTHFAILED error code (the patch is now attached to the ticket). I checked the flow vs the C client and it seems to basically match. Is there anything I should be aware of beyond this simple fix? All the testing I've done seems fine.

Thanks,
Camille
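The shape of the fix described above can be sketched as follows. This is a hypothetical illustration, not the actual ClientCnxn patch: the real reply handler receives auth packets with xid -4, and the idea is that when the reply carries the AUTHFAILED error code (-115 in the Java client's KeeperException.Code) the client should move to an AuthFailed state and notify its watcher, rather than silently returning and re-entering the bad-auth/disconnect/reconnect loop.

```java
// Hypothetical sketch of surfacing auth failures; names are illustrative.
enum KeeperState { SYNC_CONNECTED, AUTH_FAILED }

class AuthReplyHandler {
    static final int AUTH_XID = -4;        // xid marking auth packets
    static final int AUTHFAILED_ERR = -115; // KeeperException.Code.AUTHFAILED

    interface Watcher { void process(KeeperState state); }

    private KeeperState state = KeeperState.SYNC_CONNECTED;

    // On an auth reply carrying AUTHFAILED, record the state change and
    // notify the watcher so callers can stop retrying instead of looping.
    KeeperState handleReply(int xid, int err, Watcher watcher) {
        if (xid == AUTH_XID && err == AUTHFAILED_ERR) {
            state = KeeperState.AUTH_FAILED;
            watcher.process(state);
        }
        return state;
    }
}
```

The key design point is that the failure becomes an observable event on the watcher path, matching how the C client already reports auth failures.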