[jira] Updated: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

2010-10-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-906:
---

Fix Version/s: 3.4.0

> Improve C client connection reliability by making it sleep between reconnect 
> attempts as in Java Client
> ---
>
> Key: ZOOKEEPER-906
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.3.1
>Reporter: Radu Marin
>Assignee: Radu Marin
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts 
> (not all) with no delay between attempts and then, if it doesn't succeed, it 
> sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session 
> expiration timeout has passed, and sleeping for a further 1/3 of the session 
> timeout will cause a session loss most of the time.
> A better approach is to check all hosts, but with a random delay between 
> reconnect attempts. Also, the delay must be independent of the session 
> timeout, so that increasing the session timeout also increases the number of 
> available attempts.
> This improvement covers the case when the C client experiences network 
> problems for a short period of time and is not able to reach any zookeeper 
> hosts.
> The Java client already uses this logic and works very well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

2010-10-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-906:
--

Assignee: Radu Marin

> Improve C client connection reliability by making it sleep between reconnect 
> attempts as in Java Client
> ---
>
> Key: ZOOKEEPER-906
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.3.1
>Reporter: Radu Marin
>Assignee: Radu Marin
> Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts 
> (not all) with no delay between attempts and then, if it doesn't succeed, it 
> sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session 
> expiration timeout has passed, and sleeping for a further 1/3 of the session 
> timeout will cause a session loss most of the time.
> A better approach is to check all hosts, but with a random delay between 
> reconnect attempts. Also, the delay must be independent of the session 
> timeout, so that increasing the session timeout also increases the number of 
> available attempts.
> This improvement covers the case when the C client experiences network 
> problems for a short period of time and is not able to reach any zookeeper 
> hosts.
> The Java client already uses this logic and works very well.




[jira] Commented: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

2010-10-19 Thread Radu Marin (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922823#action_12922823
 ] 

Radu Marin commented on ZOOKEEPER-906:
--

The C client will NOW sleep for a random period (0 - 1000ms) between 
consecutive reconnect attempts.
It will also check all hosts, regardless of the index of the server it is 
currently connected to.
The random delay is independent of the session expiration timeout (previously 
it was 1/3 of the session expiration timeout), so increasing the timeout gives 
the client more attempts to reconnect after a connection loss before the 
session expires.
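
To make the behaviour described above concrete, here is a minimal sketch in Java. It is not the code from ZOOKEEPER.patch and not the actual C or Java client. The class name, the Connector hook and the maxDelayMs parameter are assumptions made for illustration. It walks every host once, starting after the one last connected to, and pauses a random 0..maxDelayMs after each failed attempt. The helper expectedAttempts ignores the time spent inside each connection attempt, so it is only a rough estimate.

```java
import java.util.List;
import java.util.Random;

/** Illustrative reconnect policy; not the actual C or Java client code. */
final class ReconnectSketch {

    /** Hypothetical hook standing in for a real connection attempt. */
    interface Connector {
        boolean tryConnect(String host);
    }

    /**
     * Tries every host once, starting with the one after lastIndex,
     * sleeping a random 0..maxDelayMs after each failed attempt.
     * Returns the index of the host that accepted, or -1 if none did
     * (or if the thread was interrupted while sleeping).
     */
    static int reconnect(List<String> hosts, int lastIndex, Connector c,
                         Random rnd, int maxDelayMs) {
        for (int i = 1; i <= hosts.size(); i++) {
            int idx = (lastIndex + i) % hosts.size();
            if (c.tryConnect(hosts.get(idx))) {
                return idx;
            }
            try {
                // The delay does not depend on the session timeout.
                Thread.sleep(rnd.nextInt(maxDelayMs + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    /**
     * Rough number of attempts that fit into one session timeout when the
     * mean delay is maxDelayMs / 2 (requires maxDelayMs > 0).
     */
    static long expectedAttempts(long sessionTimeoutMs, int maxDelayMs) {
        return 2 * sessionTimeoutMs / maxDelayMs;
    }
}
```

Under these assumptions a 30 s session timeout with a 1000 ms maximum delay leaves room for roughly 60 attempts, and doubling the timeout doubles that number, which is the property the comment is after.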

> Improve C client connection reliability by making it sleep between reconnect 
> attempts as in Java Client
> ---
>
> Key: ZOOKEEPER-906
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.3.1
>Reporter: Radu Marin
> Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts 
> (not all) with no delay between attempts and then, if it doesn't succeed, it 
> sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session 
> expiration timeout has passed, and sleeping for a further 1/3 of the session 
> timeout will cause a session loss most of the time.
> A better approach is to check all hosts, but with a random delay between 
> reconnect attempts. Also, the delay must be independent of the session 
> timeout, so that increasing the session timeout also increases the number of 
> available attempts.
> This improvement covers the case when the C client experiences network 
> problems for a short period of time and is not able to reach any zookeeper 
> hosts.
> The Java client already uses this logic and works very well.




[jira] Updated: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

2010-10-19 Thread Radu Marin (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radu Marin updated ZOOKEEPER-906:
-

Attachment: ZOOKEEPER.patch

Attached a patch for this task.
The C client will now sleep for a random period (0 - 1000ms) between 
consecutive reconnect attempts.
It will also check all hosts, regardless of the index of the server it is 
currently connected to.
The random delay is independent of the session expiration timeout (previously 
it was 1/3 of that timeout), so increasing the timeout gives more attempts to 
reconnect after a connection loss before the session expires.

> Improve C client connection reliability by making it sleep between reconnect 
> attempts as in Java Client
> ---
>
> Key: ZOOKEEPER-906
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.3.1
>Reporter: Radu Marin
> Attachments: ZOOKEEPER.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, when a C client gets disconnected, it retries a couple of hosts 
> (not all) with no delay between attempts and then, if it doesn't succeed, it 
> sleeps for 1/3 of the session expiration timeout before trying again.
> In the worst case the disconnect event can occur after 2/3 of the session 
> expiration timeout has passed, and sleeping for a further 1/3 of the session 
> timeout will cause a session loss most of the time.
> A better approach is to check all hosts, but with a random delay between 
> reconnect attempts. Also, the delay must be independent of the session 
> timeout, so that increasing the session timeout also increases the number of 
> available attempts.
> This improvement covers the case when the C client experiences network 
> problems for a short period of time and is not able to reach any zookeeper 
> hosts.
> The Java client already uses this logic and works very well.




[jira] Created: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

2010-10-19 Thread Radu Marin (JIRA)
Improve C client connection reliability by making it sleep between reconnect 
attempts as in Java Client
---

 Key: ZOOKEEPER-906
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Radu Marin


Currently, when a C client gets disconnected, it retries a couple of hosts (not 
all) with no delay between attempts and then, if it doesn't succeed, it sleeps 
for 1/3 of the session expiration timeout before trying again.
In the worst case the disconnect event can occur after 2/3 of the session 
expiration timeout has passed, and sleeping for a further 1/3 of the session 
timeout will cause a session loss most of the time.

A better approach is to check all hosts, but with a random delay between 
reconnect attempts. Also, the delay must be independent of the session timeout, 
so that increasing the session timeout also increases the number of available 
attempts.

This improvement covers the case when the C client experiences network problems 
for a short period of time and is not able to reach any zookeeper hosts.
The Java client already uses this logic and works very well.





[jira] Commented: (ZOOKEEPER-835) Refactoring Zookeeper Client Code

2010-10-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922813#action_12922813
 ] 

Benjamin Reed commented on ZOOKEEPER-835:
-

how do you see any of these things as related to ZOOKEEPER-22?

> Refactoring Zookeeper Client Code
> -
>
> Key: ZOOKEEPER-835
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-835
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
>
> Thomas Koch asked me to file individual issues for the points raised in his 
> mail to zookeeper-dev:
> [Mail of Thomas Koch| 
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3c20100845.17507.tho...@koch.ro%3e
>  ]
> He described several issues that are present in the current zookeeper 
> client, so refactoring the code would make things easier for other 
> developers working with zookeeper.




[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: Zookeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Abmar Barros
> Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
> ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.




[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

After running some more experiments with the Phi Accrual detector, I noticed 
that the exponential distribution fits the ping inter-arrival sampling window 
better. 
I have therefore added a new option for the PhiAccrual called 'dist', which 
is the distribution used to model the inter-arrival times. 
The two possible values for this parameter are 'norm' and 'exp', and the 
default is 'exp'. When the PhiAccrual is set to use the exponential 
distribution, it works similarly to Cassandra's failure detector.
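
For readers unfamiliar with the 'exp' case, here is a minimal sketch of how phi can be computed from a sliding window of inter-arrival times. It is not the code in ZOOKEEPER-702.patch. The class and method names are assumptions, and only the exponential model is shown. With an exponential distribution whose mean is the sampled mean, the probability that the next heartbeat arrives later than t is exp(-t / mean), so phi = -log10(exp(-t / mean)) = t / (mean * ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative phi-accrual estimator (exponential model only); not the patch's code. */
final class PhiAccrualSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastArrivalMs = -1;

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Records a heartbeat and updates the sliding window of inter-arrival times. */
    void heartbeat(long nowMs) {
        if (lastArrivalMs >= 0) {
            if (intervals.size() == windowSize) {
                intervals.removeFirst(); // drop the oldest sample
            }
            intervals.addLast(nowMs - lastArrivalMs);
        }
        lastArrivalMs = nowMs;
    }

    /** Mean of the sampled inter-arrival times, or 0 if there are no samples yet. */
    double meanIntervalMs() {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (long v : intervals) {
            sum += v;
        }
        return (double) sum / intervals.size();
    }

    /** phi = t / (mean * ln 10), where t is the time since the last heartbeat. */
    double phi(long nowMs) {
        double mean = meanIntervalMs();
        if (mean <= 0.0) {
            return 0.0; // not enough data to suspect anything
        }
        double t = nowMs - lastArrivalMs;
        return t / (mean * Math.log(10.0));
    }
}
```

A caller would compare phi against a threshold of its choosing. In this model phi grows linearly with the silence since the last heartbeat, scaled by the observed mean interval.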


> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: Zookeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Abmar Barros
> Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
> ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.




[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-893:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 Great work, thanks!

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, 
> ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal 
> communication port (:4181 by default), it's possible for ZooKeeper to enter 
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, 
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if (length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if (temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n 
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you 
> will see 100% cpu usage. It does not recover from this situation. With my 
> patch, it does not occur anymore
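
The report does not say what the attached patch changes, so the following is only a sketch of one plausible hardening of the two quoted fragments, not the committed fix. It assumes an upper bound on the declared length (MAX_PACKET_SIZE, a made-up constant) and a cap on consecutive zero-byte reads (MAX_EMPTY_READS, also made up), because a read loop like the second fragment can spin if the channel is non-blocking and read() keeps returning 0. The class name and both constants are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/** Illustrative defensive packet reader; not the committed ZOOKEEPER-893 patch. */
final class BoundedPacketReader {
    /** Hypothetical upper bound on a single message. */
    static final int MAX_PACKET_SIZE = 512 * 1024;
    /** Hypothetical cap on consecutive zero-byte reads before giving up. */
    static final int MAX_EMPTY_READS = 100;

    /** Reads a 4-byte length prefix followed by that many payload bytes. */
    static ByteBuffer readPacket(ReadableByteChannel channel) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        fill(channel, header);
        header.flip();
        int length = header.getInt();
        // Reject non-positive and implausibly large lengths before allocating.
        if (length <= 0 || length > MAX_PACKET_SIZE) {
            throw new IOException("Invalid packet length:" + length);
        }
        ByteBuffer message = ByteBuffer.allocate(length);
        fill(channel, message);
        message.flip();
        return message;
    }

    /** Fills buf completely, failing on EOF or when no progress is made. */
    private static void fill(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
        int emptyReads = 0;
        while (buf.hasRemaining()) {
            int n = channel.read(buf);
            if (n < 0) {
                throw new IOException("Channel eof before end");
            }
            if (n == 0) {
                if (++emptyReads > MAX_EMPTY_READS) {
                    throw new IOException("No progress reading packet");
                }
            } else {
                emptyReads = 0;
            }
        }
    }
}
```

Whether bounds like these match the real cause of the busy loop depends on details the report leaves out. The sketch only shows the general shape of validating a length prefix and refusing to loop without progress.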




[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-10-19 Thread Nicholas Harteau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Harteau updated ZOOKEEPER-905:
---

Affects Version/s: (was: 3.3.1)
 Release Note: hm, is it easier to attach a patch here?
   Status: Patch Available  (was: Open)

patch against zkserver...@r1024408

> enhance zkServer.sh for easier zookeeper automation-izing
> -
>
> Key: ZOOKEEPER-905
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Nicholas Harteau
>Priority: Minor
> Attachments: zkServer.sh.diff
>
>
> zkServer.sh is good at starting zookeeper and figuring out the right options 
> to pass along.
> unfortunately if you want to wrap zookeeper startup/shutdown in any 
> significant way, you have to reimplement a bunch of the logic there.
> the attached patch addresses a couple simple issues:
> 1. add a 'start-foreground' option to zkServer.sh - this allows things that 
> expect to manage a foregrounded process (daemontools, launchd, etc) to use 
> zkServer.sh instead of rolling their own to launch zookeeper
> 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper 
> from the script, just give me the command you'd normally use to exec 
> zookeeper.  I found this useful when writing automation to start/stop 
> zookeeper as part of smoke testing zookeeper-based applications
> 3. Deal more gracefully with supplying alternate configuration files to 
> zookeeper - currently the script assumes all config files reside in 
> $ZOOCFGDIR - also useful for smoke testing
> 4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather 
> than STDOUT (necessary for #2)
> 5. fixes an issue on macos where readlink doesn't have the '-f' option.




[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-10-19 Thread Nicholas Harteau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Harteau updated ZOOKEEPER-905:
---

Attachment: zkServer.sh.diff

patch to bin/zkserver...@r1024408

> enhance zkServer.sh for easier zookeeper automation-izing
> -
>
> Key: ZOOKEEPER-905
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.3.1
>Reporter: Nicholas Harteau
>Priority: Minor
> Attachments: zkServer.sh.diff
>
>
> zkServer.sh is good at starting zookeeper and figuring out the right options 
> to pass along.
> unfortunately if you want to wrap zookeeper startup/shutdown in any 
> significant way, you have to reimplement a bunch of the logic there.
> the attached patch addresses a couple simple issues:
> 1. add a 'start-foreground' option to zkServer.sh - this allows things that 
> expect to manage a foregrounded process (daemontools, launchd, etc) to use 
> zkServer.sh instead of rolling their own to launch zookeeper
> 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper 
> from the script, just give me the command you'd normally use to exec 
> zookeeper.  I found this useful when writing automation to start/stop 
> zookeeper as part of smoke testing zookeeper-based applications
> 3. Deal more gracefully with supplying alternate configuration files to 
> zookeeper - currently the script assumes all config files reside in 
> $ZOOCFGDIR - also useful for smoke testing
> 4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather 
> than STDOUT (necessary for #2)
> 5. fixes an issue on macos where readlink doesn't have the '-f' option.




[jira] Created: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-10-19 Thread Nicholas Harteau (JIRA)
enhance zkServer.sh for easier zookeeper automation-izing
-

 Key: ZOOKEEPER-905
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
 Project: Zookeeper
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.3.1
Reporter: Nicholas Harteau
Priority: Minor
 Attachments: zkServer.sh.diff

zkServer.sh is good at starting zookeeper and figuring out the right options to 
pass along.

unfortunately if you want to wrap zookeeper startup/shutdown in any significant 
way, you have to reimplement a bunch of the logic there.

the attached patch addresses a couple simple issues:
1. add a 'start-foreground' option to zkServer.sh - this allows things that 
expect to manage a foregrounded process (daemontools, launchd, etc) to use 
zkServer.sh instead of rolling their own to launch zookeeper

2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper 
from the script, just give me the command you'd normally use to exec zookeeper. 
 I found this useful when writing automation to start/stop zookeeper as part of 
smoke testing zookeeper-based applications

3. Deal more gracefully with supplying alternate configuration files to 
zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR 
- also useful for smoke testing

4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather than 
STDOUT (necessary for #2)

5. fixes an issue on macos where readlink doesn't have the '-f' option.






[jira] Created: (ZOOKEEPER-904) super digest is not actually acting as a full superuser

2010-10-19 Thread Camille Fournier (JIRA)
super digest is not actually acting as a full superuser
---

 Key: ZOOKEEPER-904
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Camille Fournier


The documentation states:
New in 3.2:  Enables a ZooKeeper ensemble administrator to access the znode 
hierarchy as a "super" user. In particular no ACL checking occurs for a user 
authenticated as super.

However, if a super user does something like:
zk.setACL("/", Ids.READ_ACL_UNSAFE, -1);

the super user is now bound by the read-only ACL. This is not what I would 
expect to see given the documentation. It can be fixed by moving the check for 
the "super" authId in PrepRequestProcessor.checkACL to before the 
for(ACL a : acl) loop.
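
The proposed reordering can be sketched with a simplified, self-contained model of an ACL check. This is not the PrepRequestProcessor source. The Id and Acl classes, the allowed method and the permission constants are stand-ins invented for illustration, and scheme matching is reduced to plain string equality. The point is only that the "super" test runs before the loop, so it no longer depends on which permission bits the node's ACL grants.

```java
import java.util.List;

/** Simplified model of an ACL check; types and names are illustrative, not ZooKeeper's. */
final class AclCheckSketch {
    static final int READ = 1;
    static final int WRITE = 2;

    static final class Id {
        final String scheme;
        final String id;
        Id(String scheme, String id) {
            this.scheme = scheme;
            this.id = id;
        }
    }

    static final class Acl {
        final int perms;
        final Id id;
        Acl(int perms, Id id) {
            this.perms = perms;
            this.id = id;
        }
    }

    /** Returns true if a caller holding authIds may perform perm on a node guarded by acl. */
    static boolean allowed(List<Acl> acl, int perm, List<Id> authIds) {
        // Super check first, outside the per-ACL loop.
        for (Id authId : authIds) {
            if (authId.scheme.equals("super")) {
                return true;
            }
        }
        for (Acl a : acl) {
            if ((a.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if (a.id.scheme.equals("world") && a.id.id.equals("anyone")) {
                return true;
            }
            for (Id authId : authIds) {
                if (authId.scheme.equals(a.id.scheme) && authId.id.equals(a.id.id)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a world:anyone read-only entry, the model analogue of Ids.READ_ACL_UNSAFE, a "super" caller is still allowed to write in this sketch, while an ordinary caller can only read.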




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Assignee: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
> ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes already freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. 

Thanks both!

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Assignee: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
> ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes already freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption




implications of netty on client connections

2010-10-19 Thread Fournier, Camille F. [Tech]
Hi everyone,

I'm curious what the implications of using netty are going to be for the case 
where a server gets close to its max available file descriptors. Right now our 
somewhat limited testing has shown that a ZK server performs fine up to the 
point when it runs out of available fds, at which point performance degrades 
sharply and new connections get into a somewhat bad state. Is netty going to 
enable the server to handle this situation more gracefully (or is there a way 
to do this already that I haven't found)? Limiting connections from the same 
client is not enough since we can potentially have far more clients wanting to 
connect than available fds for certain use cases we might consider.

Thanks,
Camille



[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-893:
---

Status: Patch Available  (was: Open)

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, 
> ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain illegally formed messages on the internal 
> communication port (:4181 by default), it's possible for ZooKeeper to enter 
> an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, 
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 
> the two affected parts:
> ===
> int length = msgLength.getInt();
> if (length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> ===
> ===
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if (temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> ===
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n 
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile and then you 
> will see 100% cpu usage. It does not recover from this situation. With my 
> patch, it does not occur anymore




[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-893:
---

Attachment: ZOOKEEPER-893-3.3.patch

Thanks, Thijs. Adding 3.3 patch. 

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, 
> ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h




[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Thijs Terlouw (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922539#action_12922539
 ] 

Thijs Terlouw commented on ZOOKEEPER-893:
-

Thanks Flavio! I have been too busy to add a testcase and yours looks great!

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h




[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-893:
---

Attachment: ZOOKEEPER-893.patch

Adding a test and removing an if statement that became unnecessary with this 
patch from RecvWorker.run(). I'll be adding a patch for the 3.3 branch shortly.

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h




[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-19 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-893:
---

Status: Open  (was: Patch Available)

Missing a test.

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h




[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress

2010-10-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922512#action_12922512
 ] 

Hudson commented on ZOOKEEPER-855:
--

Integrated in ZooKeeper-trunk #971 (See 
[https://hudson.apache.org/hudson/job/ZooKeeper-trunk/971/])
ZOOKEEPER-855. clientPortBindAddress should be clientPortAddress
  (Jared Cantwell via fpj)


> clientPortBindAddress should be clientPortAddress
> -
>
> Key: ZOOKEEPER-855
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855
> Project: Zookeeper
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Jared Cantwell
>Assignee: Jared Cantwell
>Priority: Trivial
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch
>
>
> The server documentation states that the configuration parameter for binding 
> to a specific IP address is clientPortBindAddress. The code expects the 
> parameter to be clientPortAddress. The documentation for 3.3.X versions needs 
> to be changed to reflect the correct parameter. This parameter was added in 
> ZOOKEEPER-635.
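
To illustrate why the parameter name matters, here is a minimal sketch of a
config reader that only recognizes clientPortAddress. It is not ZooKeeper's
own parsing code; the class ClientAddressConfig and its method are
hypothetical. In this sketch, a config that uses the documented but
unrecognized name clientPortBindAddress is read without error and ends up
bound to the wildcard address.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.util.Properties;

final class ClientAddressConfig {
    // Builds the client listen address from properties-style config text.
    // Only "clientPort" and "clientPortAddress" are recognized in this sketch.
    static InetSocketAddress clientAddress(String cfgText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(cfgText));

        String portText = props.getProperty("clientPort");
        if (portText == null) {
            throw new IllegalArgumentException("clientPort is not set");
        }
        int port = Integer.parseInt(portText.trim());

        String host = props.getProperty("clientPortAddress");
        if (host == null || host.trim().isEmpty()) {
            // No recognized address: fall back to the wildcard address.
            return new InetSocketAddress(port);
        }
        return new InetSocketAddress(host.trim(), port);
    }

    public static void main(String[] args) throws IOException {
        // Recognized name: binds to the given address.
        System.out.println(clientAddress("clientPort=2181\nclientPortAddress=127.0.0.1\n"));
        // Unrecognized name: ignored by this sketch, so the wildcard is used.
        System.out.println(clientAddress("clientPort=2181\nclientPortBindAddress=127.0.0.1\n"));
    }
}
```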
