[jira] Updated: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-906: --- Fix Version/s: 3.4.0 > Improve C client connection reliability by making it sleep between reconnect > attempts as in Java Client > --- > > Key: ZOOKEEPER-906 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906 > Project: Zookeeper > Issue Type: Improvement > Components: c client >Affects Versions: 3.3.1 >Reporter: Radu Marin >Assignee: Radu Marin > Fix For: 3.4.0 > > Attachments: ZOOKEEPER.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, when a C client get disconnected, it retries a couple of hosts > (not all) with no delay between attempts and then if it doesn't succeed it > sleeps for 1/3 session expiration timeout period before trying again. > In the worst case the disconnect event can occur after 2/3 of session > expiration timeout has past, and sleeping for even more 1/3 session timeout > will cause a session loss in most of the times. > A better approach is to check all hosts but with random delay between > reconnect attempts. Also the delay must be independent of session timeout so > if we increase the session timeout we also increase the number of available > attempts. > This improvement covers the case when the C client experiences network > problems for a short period of time and is not able to reach any zookeeper > hosts. > Java client already uses this logic and works very good. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-906: -- Assignee: Radu Marin > Improve C client connection reliability by making it sleep between reconnect > attempts as in Java Client > --- > > Key: ZOOKEEPER-906 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906 > Project: Zookeeper > Issue Type: Improvement > Components: c client >Affects Versions: 3.3.1 >Reporter: Radu Marin >Assignee: Radu Marin > Attachments: ZOOKEEPER.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, when a C client get disconnected, it retries a couple of hosts > (not all) with no delay between attempts and then if it doesn't succeed it > sleeps for 1/3 session expiration timeout period before trying again. > In the worst case the disconnect event can occur after 2/3 of session > expiration timeout has past, and sleeping for even more 1/3 session timeout > will cause a session loss in most of the times. > A better approach is to check all hosts but with random delay between > reconnect attempts. Also the delay must be independent of session timeout so > if we increase the session timeout we also increase the number of available > attempts. > This improvement covers the case when the C client experiences network > problems for a short period of time and is not able to reach any zookeeper > hosts. > Java client already uses this logic and works very good. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922823#action_12922823 ]

Radu Marin commented on ZOOKEEPER-906:
--------------------------------------

The C client will now sleep for a random period (0 - 1000 ms) between consecutive reconnect attempts. It will also check all hosts, regardless of the index of the server it is currently connected to. The random delay is independent of the session expiration timeout (previously it was 1/3 of the session expiration timeout), so increasing the timeout gives the client more attempts to reconnect after a connection loss before the session expires.

> Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
> ---
>
>                 Key: ZOOKEEPER-906
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: c client
>    Affects Versions: 3.3.1
>            Reporter: Radu Marin
>         Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and if it doesn't succeed it sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for a further 1/3 of the session timeout will cause a session loss most of the time.
> A better approach is to check all hosts, with a random delay between reconnect attempts. The delay must also be independent of the session timeout, so that increasing the session timeout also increases the number of available attempts.
> This improvement covers the case where the C client experiences network problems for a short period of time and is not able to reach any ZooKeeper hosts.
> The Java client already uses this logic and works very well.
[jira] Updated: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radu Marin updated ZOOKEEPER-906:
--------------------------------

    Attachment: ZOOKEEPER.patch

Attached a patch to fix this task. The C client will now sleep for a random period (0 - 1000 ms) between consecutive reconnect attempts. It will also check all hosts, regardless of the index of the server it is currently connected to. The random delay is independent of the session expiration timeout (previously it was 1/3 of the session expiration timeout), so increasing the timeout gives more attempts to reconnect after a connection loss before the session expires.

> Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
> ---
>
>                 Key: ZOOKEEPER-906
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: c client
>    Affects Versions: 3.3.1
>            Reporter: Radu Marin
>         Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and if it doesn't succeed it sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for a further 1/3 of the session timeout will cause a session loss most of the time.
> A better approach is to check all hosts, with a random delay between reconnect attempts. The delay must also be independent of the session timeout, so that increasing the session timeout also increases the number of available attempts.
> This improvement covers the case where the C client experiences network problems for a short period of time and is not able to reach any ZooKeeper hosts.
> The Java client already uses this logic and works very well.
[jira] Created: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
---

                 Key: ZOOKEEPER-906
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
             Project: Zookeeper
          Issue Type: Improvement
          Components: c client
    Affects Versions: 3.3.1
            Reporter: Radu Marin

Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and if it doesn't succeed it sleeps for 1/3 of the session expiration timeout before trying again.
In the worst case the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for a further 1/3 of the session timeout will cause a session loss most of the time.
A better approach is to check all hosts, with a random delay between reconnect attempts. The delay must also be independent of the session timeout, so that increasing the session timeout also increases the number of available attempts.
This improvement covers the case where the C client experiences network problems for a short period of time and is not able to reach any ZooKeeper hosts.
The Java client already uses this logic and works very well.
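The retry policy proposed in this issue (cycle through every host, pause a random 0 - 1000 ms before each attempt, keep the pause independent of the session timeout) can be sketched roughly as below. This is only an illustration, not the attached patch and not the real C or Java client code: the class name, the tryConnect predicate, the injected sleeper, and the idea of measuring the deadline by accumulated pauses are all assumptions made to keep the example small and testable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.LongConsumer;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from the ZooKeeper clients.
class ReconnectSketch {
    // Upper bound (exclusive) of the random pause; the thread mentions 0 - 1000 ms.
    static final int MAX_DELAY_MS = 1000;

    /**
     * Cycles through all hosts, starting after lastIndex and wrapping around,
     * pausing a random time before each attempt. Gives up once the accumulated
     * pauses reach sessionTimeoutMs. Returns the index of the host that
     * accepted the connection, or -1 if none did in time.
     */
    static int reconnect(List<String> hosts, int lastIndex, long sessionTimeoutMs,
                         Predicate<String> tryConnect, Random rng, LongConsumer sleeper) {
        long waitedMs = 0;
        int i = lastIndex;
        while (waitedMs < sessionTimeoutMs) {
            i = (i + 1) % hosts.size();
            int delayMs = rng.nextInt(MAX_DELAY_MS);
            sleeper.accept(delayMs);
            // Count at least 1 ms per attempt so the loop always terminates.
            waitedMs += Math.max(1, delayMs);
            if (tryConnect.test(hosts.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");
        LongConsumer realSleep = ms -> {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Pretend only the third host is reachable.
        int idx = reconnect(hosts, 0, 30_000, h -> h.startsWith("zk3"), new Random(), realSleep);
        System.out.println("connected to index " + idx);
    }
}
```

Because each pause is below 1000 ms regardless of the session timeout, a longer session timeout directly translates into more attempts before the session can expire, which is the property the issue asks for.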
[jira] Commented: (ZOOKEEPER-835) Refactoring Zookeeper Client Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922813#action_12922813 ]

Benjamin Reed commented on ZOOKEEPER-835:
-----------------------------------------

How do you see any of these things as related to ZOOKEEPER-22?

> Refactoring Zookeeper Client Code
> ---
>
>                 Key: ZOOKEEPER-835
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-835
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: java client
>    Affects Versions: 3.3.1
>            Reporter: Patrick Datko
>            Assignee: Thomas Koch
>
> Thomas Koch asked me to file individual issues for the points raised in his mail to zookeeper-dev:
> [Mail of Thomas Koch|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3c20100845.17507.tho...@koch.ro%3e]
> He raised several issues that are present in the current ZooKeeper client; refactoring the code would make things easier for other developers working with ZooKeeper.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abmar Barros updated ZOOKEEPER-702: --- Status: Patch Available (was: Open) > GSoC 2010: Failure Detector Model > - > > Key: ZOOKEEPER-702 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 > Project: Zookeeper > Issue Type: Wish >Reporter: Henry Robinson >Assignee: Abmar Barros > Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, > chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, > ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, > ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, > ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, > ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, > ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, > ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch > > > Failure Detector Module > Possible Mentor > Henry Robinson (henry at apache dot org) > Requirements > Java, some distributed systems knowledge, comfort implementing distributed > systems protocols > Description > ZooKeeper servers detects the failure of other servers and clients by > counting the number of 'ticks' for which it doesn't get a heartbeat from > other machines. This is the 'timeout' method of failure detection and works > very well; however it is possible that it is too aggressive and not easily > tuned for some more unusual ZooKeeper installations (such as in a wide-area > network, or even in a mobile ad-hoc network). > This project would abstract the notion of failure detection to a dedicated > Java module, and implement several failure detectors to compare and contrast > their appropriateness for ZooKeeper. For example, Apache Cassandra uses a > phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which > is much more tunable and has some very interesting properties. 
This is a > great project if you are interested in distributed algorithms, or want to > help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abmar Barros updated ZOOKEEPER-702:
----------------------------------

    Attachment: ZOOKEEPER-702.patch

After running some more experiments with the Phi Accrual detector, I noticed that the exponential distribution fits the ping inter-arrival sampling window better. I have therefore added a new PhiAccrual option called 'dist', which selects the distribution used to model the inter-arrival times. The two possible values are 'norm' and 'exp'; the default is 'exp'. When PhiAccrual is set to use the exponential distribution, it works similarly to Cassandra's failure detector.

> GSoC 2010: Failure Detector Model
> ---
>
>                 Key: ZOOKEEPER-702
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
>             Project: Zookeeper
>          Issue Type: Wish
>            Reporter: Henry Robinson
>            Assignee: Abmar Barros
>         Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt,
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt,
> ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code.
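For readers unfamiliar with the 'exp' option discussed in this update: the attached patch is not reproduced in the thread, so the following is only a minimal sketch of the usual phi computation under an exponential inter-arrival model. With mean inter-arrival time m, the probability that the next heartbeat arrives later than t is exp(-t/m), so phi = -log10(exp(-t/m)) = t / (m * ln 10). The class and method names are hypothetical and are not taken from ZOOKEEPER-702.patch.

```java
// Hypothetical sketch; not the code from ZOOKEEPER-702.patch.
class PhiExpSketch {

    /** Mean of the sampled heartbeat inter-arrival times, in milliseconds. */
    static double mean(long[] intervalsMs) {
        if (intervalsMs.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        double sum = 0;
        for (long v : intervalsMs) {
            sum += v;
        }
        return sum / intervalsMs.length;
    }

    /**
     * phi for an exponential model: -log10(P(T > t)) with P(T > t) = exp(-t / mean).
     * The closed form t / (mean * ln 10) avoids underflow of exp() for large t.
     */
    static double phi(double elapsedMs, double meanIntervalMs) {
        if (meanIntervalMs <= 0) {
            throw new IllegalArgumentException("mean must be positive");
        }
        return elapsedMs / (meanIntervalMs * Math.log(10.0));
    }

    public static void main(String[] args) {
        double m = mean(new long[] {90, 100, 110});
        // The suspicion level grows linearly with the time since the last heartbeat.
        for (long t : new long[] {0, 100, 500, 1000}) {
            System.out.println("t=" + t + " ms, phi=" + phi(t, m));
        }
    }
}
```

A detector built this way would declare a peer suspect once phi crosses a configurable threshold; the threshold value itself is a tuning choice that this sketch does not address.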
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-893:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

+1 Great work, thanks!

> ZooKeeper high cpu usage when invalid requests
> ---
>
>                 Key: ZOOKEEPER-893
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Thijs Terlouw
>            Assignee: Thijs Terlouw
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if(length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if(temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore
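The two excerpts quoted in this issue guard a length-prefixed read: reject a non-positive length, and stop when the channel reports end of stream instead of spinning on a read that keeps returning -1. A self-contained sketch of that pattern follows. It is not the QuorumCnxManager code; the class name, the tryRead helper, and the MAX_LENGTH cap are assumptions added for illustration (the quoted excerpt only checks length <= 0).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical illustration; not the actual QuorumCnxManager implementation.
class FramedReadSketch {
    // Assumed upper bound on a message; the quoted code has no such cap.
    static final int MAX_LENGTH = 512 * 1024;

    /** Reads one message framed as a 4-byte big-endian length followed by the payload. */
    static byte[] readMessage(ReadableByteChannel channel) throws IOException {
        ByteBuffer msgLength = ByteBuffer.allocate(4);
        readFully(channel, msgLength);
        msgLength.flip();
        int length = msgLength.getInt();
        if (length <= 0 || length > MAX_LENGTH) {
            throw new IOException("Invalid packet length:" + length);
        }
        ByteBuffer message = ByteBuffer.allocate(length);
        readFully(channel, message);
        return message.array();
    }

    /** Fills the buffer, failing on end of stream rather than looping forever. */
    static void readFully(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                throw new IOException("Channel eof before end");
            }
        }
    }

    /** Convenience wrapper for the examples: returns "ok:<length>" or "error:<message>". */
    static String tryRead(byte[] wire) {
        try {
            return "ok:" + readMessage(Channels.newChannel(new ByteArrayInputStream(wire))).length;
        } catch (IOException e) {
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] good = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3}).array();
        byte[] negative = ByteBuffer.allocate(4).putInt(-1).array();
        System.out.println(tryRead(good));
        System.out.println(tryRead(negative));
    }
}
```

A port scanner that connects, sends a few arbitrary bytes and disconnects would hit either the length check or the end-of-stream check in this sketch, so the reading thread exits with an exception instead of consuming CPU.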
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Harteau updated ZOOKEEPER-905: --- Affects Version/s: (was: 3.3.1) Release Note: hm, is it easier to attach a patch here? Status: Patch Available (was: Open) patch against zkserver...@r1024408 > enhance zkServer.sh for easier zookeeper automation-izing > - > > Key: ZOOKEEPER-905 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 > Project: Zookeeper > Issue Type: Improvement > Components: scripts >Reporter: Nicholas Harteau >Priority: Minor > Attachments: zkServer.sh.diff > > > zkServer.sh is good at starting zookeeper and figuring out the right options > to pass along. > unfortunately if you want to wrap zookeeper startup/shutdown in any > significant way, you have to reimplement a bunch of the logic there. > the attached patch addresses a couple simple issues: > 1. add a 'start-foreground' option to zkServer.sh - this allows things that > expect to manage a foregrounded process (daemontools, launchd, etc) to use > zkServer.sh instead of rolling their own to launch zookeeper > 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper > from the script, just give me the command you'd normally use to exec > zookeeper. I found this useful when writing automation to start/stop > zookeeper as part of smoke testing zookeeper-based applications > 3. Deal more gracefully with supplying alternate configuration files to > zookeeper - currently the script assumes all config files reside in > $ZOOCFGDIR - also useful for smoke testing > 4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather > than STDOUT (necessary for #2) > 5. fixes an issue on macos where readlink doesn't have the '-f' option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Harteau updated ZOOKEEPER-905: --- Attachment: zkServer.sh.diff patch to bin/zkserver...@r1024408 > enhance zkServer.sh for easier zookeeper automation-izing > - > > Key: ZOOKEEPER-905 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 > Project: Zookeeper > Issue Type: Improvement > Components: scripts >Affects Versions: 3.3.1 >Reporter: Nicholas Harteau >Priority: Minor > Attachments: zkServer.sh.diff > > > zkServer.sh is good at starting zookeeper and figuring out the right options > to pass along. > unfortunately if you want to wrap zookeeper startup/shutdown in any > significant way, you have to reimplement a bunch of the logic there. > the attached patch addresses a couple simple issues: > 1. add a 'start-foreground' option to zkServer.sh - this allows things that > expect to manage a foregrounded process (daemontools, launchd, etc) to use > zkServer.sh instead of rolling their own to launch zookeeper > 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper > from the script, just give me the command you'd normally use to exec > zookeeper. I found this useful when writing automation to start/stop > zookeeper as part of smoke testing zookeeper-based applications > 3. Deal more gracefully with supplying alternate configuration files to > zookeeper - currently the script assumes all config files reside in > $ZOOCFGDIR - also useful for smoke testing > 4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather > than STDOUT (necessary for #2) > 5. fixes an issue on macos where readlink doesn't have the '-f' option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
enhance zkServer.sh for easier zookeeper automation-izing
---

                 Key: ZOOKEEPER-905
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
             Project: Zookeeper
          Issue Type: Improvement
          Components: scripts
    Affects Versions: 3.3.1
            Reporter: Nicholas Harteau
            Priority: Minor
         Attachments: zkServer.sh.diff

zkServer.sh is good at starting zookeeper and figuring out the right options to pass along.
unfortunately if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there.
the attached patch addresses a couple simple issues:
1. add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc) to use zkServer.sh instead of rolling their own to launch zookeeper
2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications
3. Deal more gracefully with supplying alternate configuration files to zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing
4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather than STDOUT (necessary for #2)
5. fixes an issue on macos where readlink doesn't have the '-f' option.
[jira] Created: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
super digest is not actually acting as a full superuser
---

                 Key: ZOOKEEPER-904
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
             Project: Zookeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.1
            Reporter: Camille Fournier

The documentation states:
New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a "super" user. In particular no ACL checking occurs for a user authenticated as super.
However, if a super user does something like:
zk.setACL("/", Ids.READ_ACL_UNSAFE, -1);
the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the "super" authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop.
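The proposed fix can be pictured with a simplified model of an ACL check in which the "super" test runs before the per-ACL loop. The sketch below is not PrepRequestProcessor.checkACL: the Id and Acl classes, the permission constants and the matching rules are pared-down assumptions, and the real method consults authentication providers that are omitted here.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model; not the real PrepRequestProcessor.checkACL.
class SuperAclSketch {
    static final int READ = 1;
    static final int WRITE = 2;

    static final class Id {
        final String scheme;
        final String id;
        Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }

    static final class Acl {
        final int perms;
        final Id id;
        Acl(int perms, Id id) { this.perms = perms; this.id = id; }
    }

    /** Returns true if a client with the given auth ids may perform perm. */
    static boolean checkACL(List<Acl> acl, int perm, List<Id> authIds) {
        // Super check first: it must not depend on any ACL entry granting perm.
        for (Id authId : authIds) {
            if ("super".equals(authId.scheme)) {
                return true;
            }
        }
        for (Acl a : acl) {
            if ((a.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if ("world".equals(a.id.scheme) && "anyone".equals(a.id.id)) {
                return true;
            }
            for (Id authId : authIds) {
                if (authId.scheme.equals(a.id.scheme) && authId.id.equals(a.id.id)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Equivalent in spirit to Ids.READ_ACL_UNSAFE: world:anyone may only read.
        List<Acl> readOnly = Arrays.asList(new Acl(READ, new Id("world", "anyone")));
        List<Id> superUser = Arrays.asList(new Id("super", ""));
        System.out.println("super may write: " + checkACL(readOnly, WRITE, superUser));
    }
}
```

If the super test sat inside the loop, behind the permission-bit check, a read-only ACL would never reach it for a write request, which would match the behaviour reported above.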
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> c-client / zkpython: Double free corruption on node watcher
> ---
>
>                 Key: ZOOKEEPER-888
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.1
>            Reporter: Lukas
>            Assignee: Lukas
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, ZOOKEEPER-888.patch
>
>
> The c-client / zkpython wrapper invokes an already freed watcher callback.
> steps to reproduce:
>   0. start a ZooKeeper server on your machine
>   1. run the attached python script
>   2. suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session event
>   4. resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was already freed -> double free corruption
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
------------------------------------

    Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. Thanks both!

> c-client / zkpython: Double free corruption on node watcher
> ---
>
>                 Key: ZOOKEEPER-888
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.1
>            Reporter: Lukas
>            Assignee: Lukas
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, ZOOKEEPER-888.patch
>
>
> The c-client / zkpython wrapper invokes an already freed watcher callback.
> steps to reproduce:
>   0. start a ZooKeeper server on your machine
>   1. run the attached python script
>   2. suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session event
>   4. resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was already freed -> double free corruption
implications of netty on client connections
Hi everyone, I'm curious what the implications of using netty are going to be for the case where a server gets close to its max available file descriptors. Right now our somewhat limited testing has shown that a ZK server performs fine up to the point when it runs out of available fds, at which point performance degrades sharply and new connections get into a somewhat bad state. Is netty going to enable the server to handle this situation more gracefully (or is there a way to do this already that I haven't found)? Limiting connections from the same client is not enough since we can potentially have far more clients wanting to connect than available fds for certain use cases we might consider. Thanks, Camille
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Status: Patch Available (was: Open) > ZooKeeper high cpu usage when invalid requests > -- > > Key: ZOOKEEPER-893 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 > Project: Zookeeper > Issue Type: Bug > Components: server >Affects Versions: 3.3.1 > Environment: Linux 2.6.16 > 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz > java version "1.6.0_17" > Java(TM) SE Runtime Environment (build 1.6.0_17-b04) > Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) >Reporter: Thijs Terlouw >Assignee: Thijs Terlouw >Priority: Critical > Fix For: 3.3.2, 3.4.0 > > Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, > ZOOKEEPER-893.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When ZooKeeper receives certain illegally formed messages on the internal > communication port (:4181 by default), it's possible for ZooKeeper to enter > an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, > but that patch does not resolve all issues. > from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java > the two affected parts: > === > int length = msgLength.getInt(); > > if(length <= 0) { > > throw new IOException("Invalid packet length:" + length); > > } > === > === > while (message.hasRemaining()) { > > temp_numbytes = channel.read(message); > > if(temp_numbytes < 0) { > > throw new IOException("Channel eof before end"); > > } > > numbytes += temp_numbytes; > > } > === > how to replicate this bug: > perform an nmap portscan against your zookeeper server: "nmap -sV -n > your.ip.here -p4181" > wait for a while untill you see some messages in the logfile and then you > will see 100% cpu usage. It does not recover from this situation. With my > patch, it does not occur anymore -- This message is automatically generated by JIRA. 
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
---

    Attachment: ZOOKEEPER-893-3.3.patch

Thanks, Thijs. Adding 3.3 patch.

> ZooKeeper high cpu usage when invalid requests
> --
>
>                 Key: ZOOKEEPER-893
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Thijs Terlouw
>            Assignee: Thijs Terlouw
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal
> communication port (:4181 by default), it's possible for ZooKeeper to enter
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427,
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if(length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if(temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you
> will see 100% cpu usage. It does not recover from this situation. With my
> patch, it does not occur anymore.
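For illustration, the two checks quoted in the issue description can be combined into a bounded, length-prefixed read. This is a minimal sketch and not the attached patch: the class name FrameReader, the helper methods, and the MAX_FRAME_SIZE limit are hypothetical, and it assumes a blocking channel (a non-blocking channel may return 0 from read() and would need a selector rather than a loop).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not part of ZooKeeper.
final class FrameReader {
    // Assumed upper bound for one message; the value is illustrative only.
    static final int MAX_FRAME_SIZE = 512 * 1024;

    // Fills buf completely, or fails if the peer closes the channel first.
    static void readFully(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            int n = channel.read(buf);
            if (n < 0) {
                throw new IOException("Channel eof before end");
            }
        }
    }

    // Reads a 4-byte length header followed by that many payload bytes.
    static ByteBuffer readFrame(ReadableByteChannel channel) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        readFully(channel, header);
        header.flip();
        int length = header.getInt();
        // Reject non-positive and oversized lengths before allocating anything.
        if (length <= 0 || length > MAX_FRAME_SIZE) {
            throw new IOException("Invalid packet length:" + length);
        }
        ByteBuffer message = ByteBuffer.allocate(length);
        readFully(channel, message);
        message.flip();
        return message;
    }
}
```

With a sketch like this, arbitrary bytes from a port scan would most likely either fail the length check or hit end-of-stream, and in both cases the caller gets an IOException and can close the connection instead of spinning.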
[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922539#action_12922539 ]

Thijs Terlouw commented on ZOOKEEPER-893:
-

Thanks Flavio! I have been too busy to add a test case and yours looks great!

> ZooKeeper high cpu usage when invalid requests
> --
>
>                 Key: ZOOKEEPER-893
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Thijs Terlouw
>            Assignee: Thijs Terlouw
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal
> communication port (:4181 by default), it's possible for ZooKeeper to enter
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427,
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if(length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if(temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you
> will see 100% cpu usage. It does not recover from this situation. With my
> patch, it does not occur anymore.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
---

    Attachment: ZOOKEEPER-893.patch

Adding a test, and removing from RecvWorker.run() an if statement that this patch made unnecessary. I'll be adding a patch for the 3.3 branch shortly.

> ZooKeeper high cpu usage when invalid requests
> --
>
>                 Key: ZOOKEEPER-893
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Thijs Terlouw
>            Assignee: Thijs Terlouw
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal
> communication port (:4181 by default), it's possible for ZooKeeper to enter
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427,
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if(length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if(temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you
> will see 100% cpu usage. It does not recover from this situation. With my
> patch, it does not occur anymore.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
---

    Status: Open  (was: Patch Available)

Missing a test.

> ZooKeeper high cpu usage when invalid requests
> --
>
>                 Key: ZOOKEEPER-893
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Thijs Terlouw
>            Assignee: Thijs Terlouw
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal
> communication port (:4181 by default), it's possible for ZooKeeper to enter
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427,
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if(length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if(temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you
> will see 100% cpu usage. It does not recover from this situation. With my
> patch, it does not occur anymore.
[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922512#action_12922512 ]

Hudson commented on ZOOKEEPER-855:
--

Integrated in ZooKeeper-trunk #971 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/971/])
    ZOOKEEPER-855. clientPortBindAddress should be clientPortAddress (Jared Cantwell via fpj)

> clientPortBindAddress should be clientPortAddress
> -
>
>                 Key: ZOOKEEPER-855
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: Jared Cantwell
>            Assignee: Jared Cantwell
>            Priority: Trivial
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch
>
>
> The server documentation states that the configuration parameter for binding
> to a specific IP address is clientPortBindAddress. The code expects the
> parameter to be clientPortAddress. The documentation for 3.3.X versions needs
> to be changed to reflect the correct parameter. This parameter was added in
> ZOOKEEPER-635.