[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923779#action_12923779
 ] 

Patrick Hunt commented on ZOOKEEPER-909:


I ran the tests on my own test harness and they passed. I've started to review; 
here's what I've noticed so far:

Might want to fix these:
Warnings:
[javac] /home/phunt/dev/workspace/gitzk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java:37: warning: [deprecation] org.jboss.netty.channel.ChannelPipelineCoverage in org.jboss.netty.channel has been deprecated
[javac] import org.jboss.netty.channel.ChannelPipelineCoverage;
[javac]   ^
[javac] /home/phunt/dev/workspace/gitzk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java:64: warning: [deprecation] org.jboss.netty.channel.ChannelPipelineCoverage in org.jboss.netty.channel has been deprecated
[javac] @ChannelPipelineCoverage("all")
[javac]  ^

The newly added files are missing a license header.



 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extracts all NIO-specific 
 code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket.
 I've now redone this patch from current trunk step by step and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so tonight.
 It would be nice if we could apply this patch to trunk as soon as possible. 
 This would allow us to continue to work on the Netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should only be a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you should 
 encounter to thomas at koch point ro. Thx!
 Update: So far, I've collected 8 successful builds in a row!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Assignee: Patrick Hunt  (was: Thomas Koch)
  Status: Patch Available  (was: Open)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Attachment: ZOOKEEPER-909.patch

add copyright blocks, replace deprecated ChannelPipelineCoverage annotation

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Status: Open  (was: Patch Available)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923895#action_12923895
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

sync doesn't cause any additional traffic over the atomic broadcast. it just 
makes sure that all of the in-process transactions have been sent to the 
follower. when that error happens, the error will be sent back to the follower 
ordered after all of the completed transactions. so rather than being able to 
see the result of all requests initiated before the sync, the follower will see 
all requests completed before the sync. that is why i referred to it as a 
partial sync.

i'm really having problems trying to reproduce this error. can you describe 
in more detail how it happened? i would like to have an end-to-end test rather 
than a test of a particular implementation, so that this error doesn't pop up 
again if the implementation changes. looking at the code, it seems like it 
should happen every time the sync request is sent to a follower, but that 
doesn't seem to be the case.

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923905#action_12923905
 ] 

Benjamin Reed commented on ZOOKEEPER-909:
-

this is looking really nice. i'm not done reviewing, but i did want to note 
that you need to add the zookeeper.clientCnxnSocket property to the docs. you 
should also javadoc that variable. 
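
For readers who have not opened the patch, here is a minimal sketch of how a system property like this could select the socket implementation. The classes below are hypothetical stand-ins (the real ClientCnxnSocket API is not shown in this thread), and the property spelling is an assumption to be checked against the patch.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins for the classes named in the patch; the real
// ClientCnxnSocket API is not shown in this thread.
abstract class SketchCnxnSocket {
    abstract String transportName();
}

final class SketchCnxnSocketNIO extends SketchCnxnSocket {
    @Override
    String transportName() {
        return "nio";
    }
}

final class CnxnSocketLoader {
    // Assumed property name; check the patch for the exact spelling.
    static final String SOCKET_PROPERTY = "zookeeper.clientCnxnSocket";

    private CnxnSocketLoader() {
    }

    /** Instantiates the class named by the property, or the NIO default. */
    static SketchCnxnSocket load() {
        String className = System.getProperty(
                SOCKET_PROPERTY, SketchCnxnSocketNIO.class.getName());
        try {
            Constructor<?> ctor = Class.forName(className).getDeclaredConstructor();
            ctor.setAccessible(true);
            return (SketchCnxnSocket) ctor.newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Fail loudly so a misspelled class name is not silently ignored.
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("transport: " + load().transportName());
    }
}
```

With this shape, a later Netty-based class would only need to extend the abstract class and be named in the property, e.g. -Dzookeeper.clientCnxnSocket=<class name>.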

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-903) Create a testing jar with useful classes from ZK test source

2010-10-22 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-903:


Fix Version/s: 3.4.0

 Create a testing jar with useful classes from ZK test source
 

 Key: ZOOKEEPER-903
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-903
 Project: Zookeeper
  Issue Type: Improvement
  Components: tests
Reporter: Camille Fournier
 Fix For: 3.4.0


 From mailing list:
 -Original Message-
 From: Benjamin Reed 
 Sent: Monday, October 18, 2010 11:12 AM
 To: zookeeper-u...@hadoop.apache.org
 Subject: Re: Testing zookeeper outside the source distribution?
   we should be exposing those classes and releasing them as a testing 
 jar. do you want to open up a jira to track this issue?
 ben
 On 10/18/2010 05:17 AM, Anthony Urso wrote:
  Anyone have any pointers on how to test against ZK outside of the
  source distribution? All the fun classes (e.g. ClientBase) do not make
  it into the ZK release jar.
 
  Right now I am manually running a ZK node for the unit tests to
  connect to prior to running my test, but I would rather have something
  that ant could reliably
  automate starting and stopping for CI.
 
  Thanks,
  Anthony




[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser

2010-10-22 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-904:


Fix Version/s: 3.4.0

 super digest is not actually acting as a full superuser
 ---

 Key: ZOOKEEPER-904
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0


 The documentation states:
 New in 3.2:  Enables a ZooKeeper ensemble administrator to access the znode 
 hierarchy as a super user. In particular no ACL checking occurs for a user 
 authenticated as super.
 However, if a super user does something like:
 zk.setACL("/", Ids.READ_ACL_UNSAFE, -1);
 the super user is now bound by the read-only ACL. This is not what I would 
 expect to see given the documentation. It can be fixed by moving the check for 
 the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : 
 acl) loop.
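
 The proposed ordering can be illustrated with a simplified sketch. This is 
 not the real PrepRequestProcessor code (which is not shown here): the Entry 
 type, the permission constants and the boolean return value are all 
 assumptions made for illustration. The point is only that the super check 
 runs before any ACL entry is consulted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of an ACL check; not the real
// PrepRequestProcessor code, which is not shown in this thread.
final class SuperAclSketch {
    static final int PERM_READ = 1;
    static final int PERM_WRITE = 2;

    /** An ACL entry or an authenticated identity: scheme, id, permission bits. */
    static final class Entry {
        final String scheme;
        final String id;
        final int perms;

        Entry(String scheme, String id, int perms) {
            this.scheme = scheme;
            this.id = id;
            this.perms = perms;
        }
    }

    /** Returns true if any identity in authInfo may perform perm under acl. */
    static boolean checkACL(List<Entry> acl, int perm, List<Entry> authInfo) {
        // Superuser check first, before any ACL entry is consulted, so a
        // restrictive ACL cannot lock the superuser out.
        for (Entry auth : authInfo) {
            if ("super".equals(auth.scheme)) {
                return true;
            }
        }
        for (Entry a : acl) {
            if ((a.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if ("world".equals(a.scheme) && "anyone".equals(a.id)) {
                return true;
            }
            for (Entry auth : authInfo) {
                if (auth.scheme.equals(a.scheme) && auth.id.equals(a.id)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Entry> readOnly = Arrays.asList(new Entry("world", "anyone", PERM_READ));
        List<Entry> superUser = Arrays.asList(new Entry("super", "", 0));
        List<Entry> ordinary = Collections.emptyList();
        System.out.println("super may write: " + checkACL(readOnly, PERM_WRITE, superUser));
        System.out.println("ordinary may write: " + checkACL(readOnly, PERM_WRITE, ordinary));
    }
}
```

 In this sketch a world-readable, read-only ACL still lets the super identity 
 write, while an unauthenticated client is limited to reads.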




Re: implications of netty on client connections

2010-10-22 Thread Mahadev Konar
Hi Camille,
   I am a little curious here. Does this mean you tried a single zookeeper
server with 16K clients?

Thanks
mahadev

On 10/20/10 1:07 PM, Fournier, Camille F. [Tech] camille.fourn...@gs.com
wrote:

 Thanks Patrick, I'll look and see if I can figure out a clean change for this.
 It was the kernel limit for max number of open fds for the process that was
 where the problem shows up (not zk limit). FWIW, we tested with a process fd
 limit of 16K, and ZK performed reasonably well until the fd limit was reached,
 at which point it choked. There was a throughput degradation, but mostly going
 from 0 to 4000 connections. 4000 to 16000 was mostly flat until the sharp
 drop. For our use case it is fine to have a bit of performance loss with huge
 numbers of connections, so long as we can handle the choke, which for initial
 rollout I'm planning on just monitoring for.
 
 C
 
 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Wednesday, October 20, 2010 2:06 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: implications of netty on client connections
 
 It may just be the case that we haven't tested sufficiently for this case
 (running out of fds) and we need to handle this better even in nio. Probably
 by cutting off op_connect in the selector. We should be able to do similar
 in netty.
 
 Btw, on unix one can access the open/max fd count using this:
 http://download.oracle.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html
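
 A minimal sketch of reading those counts through the MXBean linked above. 
 The FdHeadroom class, the nearLimit helper and the 90% threshold are 
 assumptions for illustration, not part of ZooKeeper; on non-Unix JVMs the 
 bean is not available and the sketch returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper; only the MXBean calls come from the JDK.
final class FdHeadroom {
    private FdHeadroom() {
    }

    /** Returns {open, max} file descriptor counts, or null if unavailable. */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-like JVM, counts are not exposed
    }

    /** True when open descriptors have reached the given fraction of max. */
    static boolean nearLimit(long open, long max, double threshold) {
        return max > 0 && open >= (long) (max * threshold);
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("fd counts not available on this platform");
            return;
        }
        System.out.println("open=" + counts[0] + " max=" + counts[1]
                + " nearLimit=" + nearLimit(counts[0], counts[1], 0.9));
    }
}
```

 A server could poll something like this and stop accepting new connections 
 once nearLimit returns true, which is one way to read the suggestion above.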
 
 
 Secondly, are you running into a kernel limit or a zk limit? Take a look at
 this post describing 1million concurrent connections to a box:
 http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
 
 specifically:
 --
 
 During various test with lots of connections, I ended up making some
 additional changes to my sysctl.conf. This was part trial-and-error, I don't
 really know enough about the internals to make especially informed decisions
 about which values to change. My policy was to wait for things to break,
 check /var/log/kern.log and see what mysterious error was reported, then
 increase stuff that sounded sensible after a spot of googling. Here are the
 settings in place during the above test:
 
 net.core.rmem_max = 33554432
 net.core.wmem_max = 33554432
 net.ipv4.tcp_rmem = 4096 16384 33554432
 net.ipv4.tcp_wmem = 4096 16384 33554432
 net.ipv4.tcp_mem = 786432 1048576 26777216
 net.ipv4.tcp_max_tw_buckets = 36
 net.core.netdev_max_backlog = 2500
 vm.min_free_kbytes = 65536
 vm.swappiness = 0
 net.ipv4.ip_local_port_range = 1024 65535
 
 --
 
 
 I'm guessing that even with this, at some point you'll run into a limit in
 our server implementation. In particular I suspect that we may start to
 respond more slowly to pings, eventually getting so bad it would time out.
 We'd have to debug that and address (optimize).
 
 http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
 Patrick
 
 On Tue, Oct 19, 2010 at 7:16 AM, Fournier, Camille F. [Tech] 
 camille.fourn...@gs.com wrote:
 
 Hi everyone,
 
 I'm curious what the implications of using netty are going to be for the
 case where a server gets close to its max available file descriptors. Right
 now our somewhat limited testing has shown that a ZK server performs fine up
 to the point when it runs out of available fds, at which point performance
 degrades sharply and new connections get into a somewhat bad state. Is netty
 going to enable the server to handle this situation more gracefully (or is
 there a way to do this already that I haven't found)? Limiting connections
 from the same client is not enough since we can potentially have far more
 clients wanting to connect than available fds for certain use cases we might
 consider.
 
 Thanks,
 Camille
 
 
 



Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-22 Thread Mahadev Konar
Hi Thomas,
  Could you verify this by just testing the trunk without your patch? You
might very well be right that those tests are a little flaky.

As for the hudson builds, Nigel is working on getting the patch builds for
zookeeper running. As soon as that gets fixed, these flaky tests will show up
more often. 

Thanks
mahadev


On 10/20/10 11:48 PM, Thomas Koch tho...@koch.ro wrote:

 Hi,
 
 last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One
 of these builds failed:
 
 junit.framework.AssertionFailedError: Leader hasn't joined: 5
 at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
 
 I did this many builds of trunk, because in my quest to redo the client netty
 integration step by step I made one step which resulted in 2 failed builds out
 of 8. The two failures were both:
 
 junit.framework.AssertionFailedError: Threads didn't join
 at org.apache.zookeeper.test.FLERestartTest.testLERestart(FLERestartTest.java:198)
 
 I can't find any relationship between the above test and my changes. The test
 does not use the ZooKeeper client code at all. So I begin to believe that
 there are some Heisenbugs, Bohrbugs or Mandelbugs[1] in ZooKeeper that just
 happen to show up from time to time without any relationship to the current
 changes.
 
 I'll try to investigate the cause further; maybe there is some relationship
 I've not yet found. But if my assumption holds, then these kinds of bugs
 would be a strong argument in favor of refactoring. These bugs are best found
 by cleaning the code, most importantly by implementing strict separation of
 concerns.
 
 Wouldn't you like to set up Hudson to build ZooKeeper trunk every half hour?
 
 [1] http://en.wikipedia.org/wiki/Unusual_software_bug
 
 Best regards,
 
 Thomas Koch, http://www.koch.ro
 



Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-22 Thread Thomas Koch
Mahadev Konar:
 Hi Thomas,
   Could you verify this by just testing the trunk without your patch? You
 might very well be right that those tests are a little flaky.
 
 As for the hudson builds, Nigel is working on getting the patch builds for
 zookeeper running. As soon as that gets fixed, these flaky tests will show
 up more often.
 
 Thanks
 mahadev
 
 On 10/20/10 11:48 PM, Thomas Koch tho...@koch.ro wrote:
  Hi,
  
  last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk.
  One of these builds failed:
  
  junit.framework.AssertionFailedError: Leader hasn't joined: 5
  
  at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
  
  I did this many builds of trunk, because in my quest to redo the client
  netty integration step by step I made one step which resulted in 2
  failed builds out of 8. The two failures were both:
Hi Mahadev,

as I've written, I did 42 builds of trunk overnight, of which 2 failed, 
and 8 builds of my patch during working hours, with 2 failures. I also did 
another round of builds of my patch last night and got only 1 failure in 
~40 builds.

So I believe that the high failure rate of 2/8 in the initial round of patch 
builds is because I ran those builds during the day, while other developers 
were also using other virtual machines on the same host.
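
As a rough sanity check on the 2/8 figure, the sketch below computes how likely at least 2 failures in 8 builds would be if every build failed independently at the trunk rate of 2 in 42. Independence and a fixed failure rate are assumptions made only for this illustration; the calculation says nothing about host load itself.

```java
// Illustrative arithmetic only; the independence assumption is not
// something the build data above can confirm.
final class FlakyBuildOdds {
    private FlakyBuildOdds() {
    }

    /** Probability of exactly k failures in n independent builds. */
    static double binomialPmf(int n, int k, double p) {
        double coeff = 1.0;
        for (int i = 1; i <= k; i++) {
            coeff = coeff * (n - k + i) / i; // builds up "n choose k"
        }
        return coeff * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }

    /** Probability of k or more failures in n independent builds. */
    static double atLeast(int n, int k, double p) {
        double below = 0.0;
        for (int i = 0; i < k; i++) {
            below += binomialPmf(n, i, p);
        }
        return 1.0 - below;
    }

    public static void main(String[] args) {
        double trunkRate = 2.0 / 42.0; // 2 failures in 42 trunk builds
        System.out.printf("P(>=2 failures in 8 builds) = %.3f%n",
                atLeast(8, 2, trunkRate));
    }
}
```

Under these assumptions the probability comes out at roughly 5%, so two failures in eight builds would be unusual but not impossible by chance alone.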

Have a nice weekend,

Thomas Koch, http://www.koch.ro


[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser

2010-10-22 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-904:
---

Attachment: ZOOKEEPER-904.patch

Fix for trunk

 super digest is not actually acting as a full superuser
 ---

 Key: ZOOKEEPER-904
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-904.patch





RE: implications of netty on client connections

2010-10-22 Thread Fournier, Camille F. [Tech]
Yes, that's correct.

C

-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com] 
Sent: Friday, October 22, 2010 1:39 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: implications of netty on client connections

Hi Camille,
   I am a little curious here. Does this mean you tried a single zookeeper
server with 16K clients?

Thanks
mahadev




[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser

2010-10-22 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-904:
---

Status: Patch Available  (was: Open)

Thanks for the patch, feel free to click submit patch once you have a patch 
ready to go. It transitions the workflow and lets us (committers) know to 
review your patch.

 super digest is not actually acting as a full superuser
 ---

 Key: ZOOKEEPER-904
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-904.patch





[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923975#action_12923975
 ] 

Patrick Hunt commented on ZOOKEEPER-909:


Thomas, I assigned the jira to you because you're doing most/all the work to 
get this done, not as a work token. I believe you should get the credit when 
this patch gets committed.

Typically we use assignment (esp. when a patch gets committed) to credit the 
author - that's one of the criteria we monitor when deciding on new committers 
(number and quality of patches, testing, conformance to community style 
guidelines, etc.)

Feel free to reassign this to yourself (please).

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Assigned: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch reassigned ZOOKEEPER-909:
-

Assignee: Thomas Koch  (was: Patrick Hunt)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923986#action_12923986
 ] 

Thomas Koch commented on ZOOKEEPER-909:
---

Hi Benjamin,

you mean I should add a class javadoc for ClientCnxnSocket and a javadoc for
the socket property in ClientCnxn.SendThread? You're right. However, I have not
yet finished thinking about an elegant structure for the classes
ClientCnxn, SendThread and ClientCnxnSocket. I believe that the
ClientCnxnSocket class won't remain for long as it is in this patch.
For example, SendThread and ClientCnxn have a circular dependency, which I
really don't like. Also, both classes work on the shared properties
incomingBuffer and outgoingBuffer, which is suboptimal.
So I'd like to ask for forgiveness for the sparse (or nonexistent)
documentation until we settle on a final design.

I also want to start learning the server code now to see whether it makes
sense to generalize certain things.

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Commented: (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd

2010-10-22 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924021#action_12924021
 ] 

Mahadev konar commented on ZOOKEEPER-805:
-

Pat,
   Do you think this should go into 3.3.2?


 four letter words fail with latest ubuntu nc.openbsd
 

 Key: ZOOKEEPER-805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805
 Project: Zookeeper
  Issue Type: Bug
  Components: documentation, server
Affects Versions: 3.3.1, 3.4.0
Reporter: Patrick Hunt
Priority: Critical
 Fix For: 3.3.2, 3.4.0


 In both the 3.3 branch and trunk, echo stat|nc localhost 2181 fails against
 the ZK server on Ubuntu Lucid Lynx.
 I noticed this after upgrading to Lucid Lynx, which now ships openbsd
 nc as the default:
 OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2)
 vs nc traditional
 [v1.10-38]
 which works fine. Not sure if this is a bug in ZooKeeper or in nc.openbsd,
 but it's currently not working for me. Ugh.




[jira] Commented: (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd

2010-10-22 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924037#action_12924037
 ] 

Patrick Hunt commented on ZOOKEEPER-805:


Hi Mahadev, I don't think that's necessary, given that you can fall back to
traditional nc or use the -q option as suggested by akovi.

On my Ubuntu system (lucid/maverick) I have two executables: nc.openbsd and
nc.traditional. nc links to the openbsd version by default.

Honestly, I'm not sure why this is no longer working, given that we addressed
the "nc closes input first" problem in ZOOKEEPER-737.
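
The four-letter-word exchange discussed here can also be exercised without nc. What follows is a minimal sketch in Java, assuming the server answers a four-letter command such as stat and then closes the connection. The class name FourLetterWordSketch, the method send and the timeout parameter are hypothetical and not part of ZooKeeper. The client keeps its sending side open until the whole reply has been read, so it does not depend on how the server treats a half-closed connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for sending a four-letter command over a plain socket. */
final class FourLetterWordSketch {

    /**
     * Sends cmd (for example "stat") and returns everything the server writes
     * back before it closes the connection.
     */
    static String send(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs); // fail instead of hanging on a silent server

            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Deliberately no shutdownOutput(): the write side stays open
            // while the reply is read.

            InputStream in = sock.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) { // -1 means the server closed the socket
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }
}
```

A call such as FourLetterWordSketch.send("localhost", 2181, "stat", 5000) would correspond to the echo stat|nc localhost 2181 command from the issue description, given a server listening on that port.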

 four letter words fail with latest ubuntu nc.openbsd
 

 Key: ZOOKEEPER-805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805
 Project: Zookeeper
  Issue Type: Bug
  Components: documentation, server
Affects Versions: 3.3.1, 3.4.0
Reporter: Patrick Hunt
Priority: Critical
 Fix For: 3.3.2, 3.4.0





[VOTE] ZooKeeper as TLP?

2010-10-22 Thread Patrick Hunt
Please vote as to whether you think ZooKeeper should become a
top-level Apache project, as discussed previously on this list. I've
included below a draft board resolution.

Do folks support sending this request on to the Hadoop PMC?

Patrick



X. Establish the Apache ZooKeeper Project

   WHEREAS, the Board of Directors deems it to be in the best
   interests of the Foundation and consistent with the
   Foundation's purpose to establish a Project Management
   Committee charged with the creation and maintenance of
   open-source software related to distributed system coordination
   for distribution at no charge to the public.

   NOW, THEREFORE, BE IT RESOLVED, that a Project Management
   Committee (PMC), to be known as the Apache ZooKeeper Project,
   be and hereby is established pursuant to Bylaws of the
   Foundation; and be it further

   RESOLVED, that the Apache ZooKeeper Project be and hereby is
   responsible for the creation and maintenance of software
   related to distributed system coordination; and be it further

   RESOLVED, that the office of Vice President, Apache ZooKeeper be
   and hereby is created, the person holding such office to
   serve at the direction of the Board of Directors as the chair
   of the Apache ZooKeeper Project, and to have primary responsibility
   for management of the projects within the scope of
   responsibility of the Apache ZooKeeper Project; and be it further

   RESOLVED, that the persons listed immediately below be and
   hereby are appointed to serve as the initial members of the
   Apache ZooKeeper Project:

 * Patrick Hunt     ph...@apache.org
 * Flavio Junqueira f...@apache.org
 * Mahadev Konar    maha...@apache.org
 * Benjamin Reed    br...@apache.org
 * Henry Robinson   he...@apache.org

   NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt
   be appointed to the office of Vice President, Apache ZooKeeper, to
   serve in accordance with and subject to the direction of the
   Board of Directors and the Bylaws of the Foundation until
   death, resignation, retirement, removal or disqualification,
   or until a successor is appointed; and be it further

   RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
   tasked with the creation of a set of bylaws intended to
   encourage open development and increased participation in the
   Apache ZooKeeper Project; and be it further

   RESOLVED, that the Apache ZooKeeper Project be and hereby
   is tasked with the migration and rationalization of the Apache
   Hadoop ZooKeeper sub-project; and be it further

   RESOLVED, that all responsibilities pertaining to the Apache
   Hadoop ZooKeeper sub-project encumbered upon the
   Apache Hadoop Project are hereafter discharged.


Re: [VOTE] ZooKeeper as TLP?

2010-10-22 Thread Henry Robinson
+1

On 22 October 2010 14:53, Mahadev Konar maha...@yahoo-inc.com wrote:

 +1

 On 10/22/10 2:42 PM, Patrick Hunt ph...@apache.org wrote:

  Please vote as to whether you think ZooKeeper should become a
  top-level Apache project, as discussed previously on this list. I've
  included below a draft board resolution.
 
  Do folks support sending this request on to the Hadoop PMC?
 
  Patrick
 
  
 
  X. Establish the Apache ZooKeeper Project
 
 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to distributed system coordination
 for distribution at no charge to the public.
 
 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the Apache ZooKeeper Project,
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further
 
 RESOLVED, that the Apache ZooKeeper Project be and hereby is
 responsible for the creation and maintenance of software
 related to distributed system coordination; and be it further
 
 RESOLVED, that the office of Vice President, Apache ZooKeeper be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache ZooKeeper Project, and to have primary responsibility
 for management of the projects within the scope of
 responsibility of the Apache ZooKeeper Project; and be it further
 
 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache ZooKeeper Project:
 
   * Patrick Hunt     ph...@apache.org
   * Flavio Junqueira f...@apache.org
   * Mahadev Konar    maha...@apache.org
   * Benjamin Reed    br...@apache.org
   * Henry Robinson   he...@apache.org
 
 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt
 be appointed to the office of Vice President, Apache ZooKeeper, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed; and be it further
 
 RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
 tasked with the creation of a set of bylaws intended to
 encourage open development and increased participation in the
 Apache ZooKeeper Project; and be it further
 
 RESOLVED, that the Apache ZooKeeper Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Hadoop ZooKeeper sub-project; and be it further
 
 RESOLVED, that all responsibilities pertaining to the Apache
 Hadoop ZooKeeper sub-project encumbered upon the
 Apache Hadoop Project are hereafter discharged.
 




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679