[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930544#action_12930544
 ] 

Thomas Koch commented on ZOOKEEPER-909:
---

Dear Hudson,

you're annoying me terribly! Looking at your web interface, I can't find any 
failed unit test for this build. And the javadoc warning is due to the 
acquisition of Sun by Oracle (and the subsequent unavailability of the old Sun 
javadoc websites). Don't blame me for that!

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same as my last patch for ZOOKEEPER-823, minus 
 everything Netty related. It only extracts all NIO-specific code into the 
 class ClientCnxnSocketNIO, which extends ClientCnxnSocket.
 I've redone this patch step by step from current trunk and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so tonight.
 It would be nice if we could apply this patch to trunk as soon as possible. 
 That would allow us to continue working on the Netty integration without 
 blocking the ClientCnxn class. Adding Netty after this patch should only be 
 a matter of adding the ClientCnxnSocketNetty class with the appropriate test 
 cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send any complete failure log you 
 encounter to thomas at koch point ro. Thx!
 Update: Until now, I've collected 8 successful builds in a row!
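
 The split described above can be pictured with a small sketch. This is not 
 the code from the patch: the class and method names below (SocketSketch, 
 NioSocketSketch, InMemorySocketSketch, ConnectionSketch) are hypothetical 
 stand-ins, and the in-memory transport only plays the role that a later 
 Netty-based class might take. The point is that the connection class talks 
 only to the abstract type, so a new transport can be added without changing it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

// Hypothetical stand-in for the abstract base class (the role of ClientCnxnSocket).
abstract class SocketSketch {
    abstract void connect(InetSocketAddress addr) throws IOException;
    abstract boolean isConnected();
    abstract void close() throws IOException;
}

// All java.nio usage is confined to this subclass (the role of ClientCnxnSocketNIO).
class NioSocketSketch extends SocketSketch {
    private SocketChannel channel;

    @Override
    void connect(InetSocketAddress addr) throws IOException {
        channel = SocketChannel.open(addr); // blocking connect, for brevity only
    }

    @Override
    boolean isConnected() {
        return channel != null && channel.isConnected();
    }

    @Override
    void close() throws IOException {
        if (channel != null) {
            channel.close();
        }
    }
}

// A second transport, added without touching the connection class.
// This in-memory fake is an assumption made for illustration.
class InMemorySocketSketch extends SocketSketch {
    private boolean connected;

    @Override
    void connect(InetSocketAddress addr) { connected = true; }

    @Override
    boolean isConnected() { return connected; }

    @Override
    void close() { connected = false; }
}

// The role of ClientCnxn: it depends only on the abstract type.
class ConnectionSketch {
    private final SocketSketch socket;

    ConnectionSketch(SocketSketch socket) { this.socket = socket; }

    void start(InetSocketAddress addr) throws IOException { socket.connect(addr); }

    boolean isUp() { return socket.isConnected(); }

    void stop() throws IOException { socket.close(); }
}
```

 Passing `new NioSocketSketch()` or `new InMemorySocketSketch()` to 
 `ConnectionSketch` selects the transport; the connection class itself stays 
 unchanged in both cases.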

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930546#action_12930546
 ] 

Flavio Junqueira commented on ZOOKEEPER-909:


Thomas, check the console output on Hudson, close to the end of the page. The 
failure seems to be in the C tests.




[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation

2010-11-10 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930547#action_12930547
 ] 

Thomas Koch commented on ZOOKEEPER-925:
---

Please don't check _any_ generated files into version control. When I package 
ZooKeeper for Debian (the same applies to any other free software distro), I may 
not include generated files.
Debian needs to make sure that it has the source files for everything it 
delivers and that it attributes copyright correctly. Having no generated 
files in the tree and generating everything at Debian package build time makes 
this a lot easier.

Just a note: the same applies to the released tarballs, but that's another 
topic.

 Consider maven site generation to replace our forrest site and documentation 
 generation
 ---

 Key: ZOOKEEPER-925
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
 Project: Zookeeper
  Issue Type: Wish
  Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Attachments: ZOOKEEPER-925.patch


 See WHIRR-19 for some background.
 In whirr we looked at a number of site/doc generation facilities. In the end 
 Maven site generation plugin turned out to be by far the best option. You can 
 see our nascent site here (no attempt at styling,etc so far):
 http://incubator.apache.org/whirr/
 In particular take a look at the quick start:
 http://incubator.apache.org/whirr/quick-start-guide.html
 which was generated from
 http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
 Notice this is standard wiki markup (Confluence wiki markup, the same as 
 available from Apache).
 You can read more about the mvn site plugin here:
 http://maven.apache.org/guides/mini/guide-site.html
 Notice that other formats are available, not just Confluence markup. You can 
 also mix different markup formats in the same site. That is probably not a 
 great idea, but it might be handy in some cases: Whirr, for example, uses the 
 Confluence wiki, so we can pretty much copy/paste source docs from the wiki 
 to our site (svn) if we like.
 Re Maven vs. our current Ant-based build: it's probably a good idea for us to 
 move the build to Maven at some point. We could initially move just the doc 
 generation, and then incrementally move functionality from build.xml to mvn 
 over a longer time period.




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930548#action_12930548
 ] 

Patrick Hunt commented on ZOOKEEPER-909:


Hi Thomas. Still shaky legs on getting the patch queue up and working again. 
Shouldn't keep us from getting this committed though.

Re javadoc: this is not an issue for the other patches afaict. Any idea why 
it's just showing up for this patch?

There are two sets of tests: Java and the C client binding. Unfortunately, 
Hudson currently does not highlight C failures on the summary page. When the 
C tests fail (but the Java tests do not), you need to check the console output 
(usually raw).

Looking at console I see:

 [exec]  [exec]  ZooKeeper server process failed ZooKeeper server NOT 
startedRunning 

I've notified Nigel about this to see if he has any insight (I saw it on a 
couple of other jiras). So far he hasn't had a chance to look into it.




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930552#action_12930552
 ] 

Hadoop QA commented on ZOOKEEPER-909:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12459171/ZOOKEEPER-909.patch
  against trunk revision 1033155.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/23//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/23//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/23//console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930553#action_12930553
 ] 

Patrick Hunt commented on ZOOKEEPER-909:


better, but why is javadoc failing for this but not the other patches?




[jira] Commented: (ZOOKEEPER-926) Fork Hadoop common's test-patch.sh and modify for Zookeeper

2010-11-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930571#action_12930571
 ] 

Hudson commented on ZOOKEEPER-926:
--

Integrated in ZooKeeper-trunk #996 (See 
[https://hudson.apache.org/hudson/job/ZooKeeper-trunk/996/])
ZOOKEEPER-926. Fork Hadoop common's test-patch.sh and modify for Zookeeper. 
Update allowed number of warnings.
ZOOKEEPER-926. Fork Hadoop common's test-patch.sh and modify for Zookeeper. 
Remove unneeded params from test-patch.sh.
ZOOKEEPER-926. Fork Hadoop common's test-patch.sh and modify for Zookeeper. 
Updated test-patch.sh and build.xml. Contributed by nigel.


 Fork Hadoop common's test-patch.sh and modify for Zookeeper
 ---

 Key: ZOOKEEPER-926
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-926
 Project: Zookeeper
  Issue Type: Improvement
  Components: build
Reporter: Nigel Daley
Assignee: Nigel Daley
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-926.patch


 ZooKeeper currently uses the test-patch.sh script from the Hadoop nightly 
 dir. This is now out of date. I propose we just copy the updated one from 
 Hadoop common and then modify it for ZK. This will also help as ZK moves out 
 of Hadoop to its own TLP.




[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Status: Patch Available  (was: Open)

trying to trigger the hudson build once again to see whether the javadoc issue 
is still there.




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930558#action_12930558
 ] 

Thomas Koch commented on ZOOKEEPER-909:
---

In our build.xml we still have the old Sun link for javadoc ( 
http://java.sun.com/javase/6/docs/api/ ). This link redirects to 
http://download.oracle.com/javase/6/docs/api/
However, when trying the old Sun link in my browser I first got a timeout, while 
the second attempt worked.
So one possibility is that I just had bad luck. Or is there a cache 
between the javadoc task and the Sun website? I've introduced a new class ( 
java.util.concurrent.CopyOnWriteArraySet ). Maybe this class has not been 
used elsewhere yet, and therefore my build needs to hit the Sun website while 
other builds are served from some cache?

Anyhow, it would be best if the build did not need to access the internet at 
all. I don't know javadoc well enough, but the Ant task documentation gives me 
the impression that it should be possible to build the docs offline as well: 
http://ant.apache.org/manual/Tasks/javadoc.html
( see the documentation for the link element )




[jira] Updated: (ZOOKEEPER-850) Switch from log4j to slf4j

2010-11-10 Thread Olaf Krische (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olaf Krische updated ZOOKEEPER-850:
---

Release Note: 
* replaces log4j with slf4j code (also in contrib for bookkeeper, 
zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from 
slf4j to log4j) to the classpath, if not using the standard scripts
* log4j remains as the final logger yet. Therefore there is still  work to do: 
remove programmatic access to the log4j from certain classes (which add 
appenders or configure log4j at runtime), or move them to contrib



  was:
introduces indirection for logging via slf4j-api. adding bridge from slf4j to 
log4j implementation.

1) added slf4j dependency in ivy.xml

2) replaced:
- import org.apache.log4j.Logger with org.slf4j.Logger,LoggerFactory
- org.apache.log4j.Logger with org.slf4j.Logger
- org.apache.log4j.Logger.getLogger with org.slf4j.LoggerFactory.getLogger

3) replaced log.fatal with log.error, slf4j api has no log.fatal, faq 
recommends log.error

4) fixed logging requests, like log.error(object) with 
log.error(String.valueOf(object)) to match slf4j api

5) removed direct log4j-api access from 
org.apache.bookkeeper.util.LocalBookKeeper in contrib.
it added programmatically a console appender to the existing logger in the 
constructor. this can be done anytime via log4j.properties (which had by 
default INFO,CONSOLE anyways)


 Switch from log4j to slf4j
 --

 Key: ZOOKEEPER-850
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.3.1
Reporter: Olaf Krische
Assignee: Olaf Krische
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2, 
 ZOOKEEPER-3.4.0-log4j-slf4j-20101102.patch.bz2, ZOOKEEPER-850.patch


 Hello,
 I would like to see slf4j integrated into ZooKeeper instead of relying 
 explicitly on log4j.
 slf4j is an abstract logging framework. There are adapters from slf4j to many 
 logger implementations; one of them is log4j.
 I would rather not have the decision about which log engine to use made so 
 early.
 This would help me embed ZooKeeper in my own applications (which use a 
 different logger implementation, but with slf4j as the basis).
 What do you think?
 (As far as I can see, such slf4j requests are flooding all other Apache 
 projects as well :-)
 Maybe for 3.4 or 4.0?
 I can offer a patchset; I already have experience with such a migration. :-)
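
 The facade idea behind this request can be sketched with the JDK alone. The 
 code below deliberately does not use the slf4j API: LogFacade, JulBinding, 
 MemoryBinding and ServerSketch are hypothetical names invented for this 
 illustration. It only shows why library code written against an abstract 
 logger lets the embedding application pick the backend late.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

// Hypothetical facade: library code depends only on this interface.
interface LogFacade {
    void info(String msg);
    void error(String msg);
}

// One possible binding, delegating to java.util.logging.
class JulBinding implements LogFacade {
    private final java.util.logging.Logger delegate;

    JulBinding(String name) {
        delegate = java.util.logging.Logger.getLogger(name);
    }

    @Override
    public void info(String msg) { delegate.log(Level.INFO, msg); }

    @Override
    public void error(String msg) { delegate.log(Level.SEVERE, msg); }
}

// Another binding, e.g. for an embedding application that collects messages itself.
class MemoryBinding implements LogFacade {
    final List<String> lines = new ArrayList<>();

    @Override
    public void info(String msg) { lines.add("INFO " + msg); }

    @Override
    public void error(String msg) { lines.add("ERROR " + msg); }
}

// Library code: it never names a concrete logging backend.
class ServerSketch {
    private final LogFacade log;

    ServerSketch(LogFacade log) { this.log = log; }

    void start(Object config) {
        log.info("starting with " + String.valueOf(config));
    }

    void fail(Object cause) {
        // The facade takes strings only, so objects are converted explicitly.
        log.error(String.valueOf(cause));
    }
}
```

 An application that prefers java.util.logging would construct 
 `new ServerSketch(new JulBinding("zk"))`, while another could pass its own 
 binding; `ServerSketch` is the same in both cases.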




[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Status: Open  (was: Patch Available)




[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Botond Hejj (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botond Hejj updated ZOOKEEPER-896:
--

Attachment: (was: ZOOKEEPER-896.patch)

 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch


 When we started exploring ZooKeeper for our requirements, we found that the 
 authentication mechanism is not flexible enough.
 We want to use Kerberos for authentication, but using the current API we ran 
 into a few problems. The idea is that we get a Kerberos token on the client 
 side and then send that token to the server with a kerberos scheme. A 
 server-side authentication plugin can use that token to authenticate the 
 client and also use the token for authorization.
 We ran into two problems with this approach:
 1. A different Kerberos token is needed for each server that the client 
 can connect to, since Kerberos uses mutual authentication. That means when the 
 client acquires this Kerberos token, it has to know which server it connects 
 to and generate the token accordingly. The client currently can't 
 generate a token for a specific server. The token stored in the auth_info is 
 used for all the servers.
 2. The Kerberos token might have an expiry time, so if the client loses the 
 connection to the server and then tries to reconnect, it should acquire a 
 new token. That is not possible currently, since the token is stored in 
 auth_info and reused for every connection.
 The problem can be solved if we allow the client to register a callback for 
 authentication instead of a static token. This can be a callback with an 
 argument which passes the current host string. The ZooKeeper client code 
 could call this callback before it sends the authentication info to the 
 server to get a fresh server-specific token.
 This would solve our problem with Kerberos authentication and could also 
 be used for other, more dynamic authentication schemes.
 The solution could be generalized to the Java client as well.
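
 The callback proposal can be sketched as follows. The issue targets the C 
 client; this Java sketch is only an illustration of the idea and is not the 
 attached patch. AuthCallbackSketch, ClientSketch and the recorded wire format 
 are assumptions made for the example. The client asks the callback for a 
 token on every connect or reconnect and passes the current host, so a token 
 is neither shared between servers nor reused after a connection loss.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: returns a fresh token for the server being contacted.
interface AuthCallbackSketch {
    byte[] tokenFor(String host);
}

// Hypothetical client that stores a callback instead of a static token.
class ClientSketch {
    private final String scheme;
    private final AuthCallbackSketch callback;

    // Records what would be sent to each server; a real client would write to the socket.
    final List<String> sent = new ArrayList<>();

    ClientSketch(String scheme, AuthCallbackSketch callback) {
        this.scheme = scheme;
        this.callback = callback;
    }

    // Invoked on every connect and reconnect, before auth info is sent.
    void connect(String host) {
        byte[] token = callback.tokenFor(host);
        sent.add(scheme + "@" + host + ":" + new String(token, StandardCharsets.UTF_8));
    }
}
```

 A callback backed by a Kerberos library would request a service ticket for 
 the given host inside `tokenFor`; how that ticket is obtained is outside the 
 scope of this sketch.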




[jira] Updated: (ZOOKEEPER-850) Switch from log4j to slf4j

2010-11-10 Thread Olaf Krische (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olaf Krische updated ZOOKEEPER-850:
---

Release Note: 
* replaces log4j with slf4j code (also in contrib for bookkeeper, 
zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from 
slf4j to log4j) to the classpath, if not using the standard scripts
* log4j remains as the final logger yet, there is still work to do: remove 
programmatic access to the log4j from certain classes (which add appenders or 
configure log4j at runtime), or move them to contrib



  was:
* replaces log4j with slf4j code (also in contrib for bookkeeper, 
zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from 
slf4j to log4j) to the classpath, if not using the standard scripts
* log4j remains as the final logger yet. Therefore there is still  work to do: 
remove programmatic access to the log4j from certain classes (which add 
appenders or configure log4j at runtime), or move them to contrib







[jira] Updated: (ZOOKEEPER-850) Switch from log4j to slf4j

2010-11-10 Thread Olaf Krische (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olaf Krische updated ZOOKEEPER-850:
---

Release Note: 
* replaces log4j with slf4j code (also in contrib for bookkeeper, 
zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from 
slf4j to log4j) to the classpath, if not using the standard scripts
* log4j remains as the final logger yet, there is still work to do: remove 
programmatic access to the log4j api from certain classes (which add appenders 
or configure log4j at runtime), or move them to contrib



  was:
* replaces log4j with slf4j code (also in contrib for bookkeeper, 
zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from 
slf4j to log4j) to the classpath, if not using the standard scripts
* log4j remains as the final logger yet, there is still work to do: remove 
programmatic access to the log4j from certain classes (which add appenders or 
configure log4j at runtime), or move them to contrib




 Switch from log4j to slf4j
 --

 Key: ZOOKEEPER-850
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.3.1
Reporter: Olaf Krische
Assignee: Olaf Krische
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2, 
 ZOOKEEPER-3.4.0-log4j-slf4j-20101102.patch.bz2, ZOOKEEPER-850.patch


 Hello,
 I would like to see slf4j integrated into ZooKeeper instead of relying 
 explicitly on log4j.
 slf4j is an abstract logging framework. There are adapters from slf4j to many 
 logger implementations, one of them being log4j.
 I would rather not decide so early which log engine to use.
 This would help me embed ZooKeeper in my own applications (which use a 
 different logger implementation, but with slf4j as the basis).
 What do you think?
 (As far as I can see, such slf4j requests are flooding all the other Apache 
 projects as well :-)
 Maybe for 3.4 or 4.0?
 I can offer a patchset; I already have experience with such a migration. :-)




[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Botond Hejj (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botond Hejj updated ZOOKEEPER-896:
--

Attachment: ZOOKEEPER-896.patch

added unit test
fixed initialization bug

 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch, ZOOKEEPER-896.patch


 When we started exploring ZooKeeper for our requirements we found that the 
 authentication mechanism is not flexible enough.
 We want to use Kerberos for authentication, but using the current API we ran 
 into a few problems. The idea is that we get a Kerberos token on the client 
 side and then send that token to the server with a kerberos scheme. A 
 server-side authentication plugin can use that token to authenticate the 
 client and also use the token for authorization.
 We ran into two problems with this approach:
 1. A different Kerberos token is needed for each server that the client 
 can connect to, since Kerberos uses mutual authentication. That means when the 
 client acquires this Kerberos token it has to know which server it connects 
 to and generate the token accordingly. The client currently can't 
 generate a token for a specific server. The token stored in auth_info is 
 used for all the servers.
 2. The Kerberos token might have an expiry time, so if the client loses the 
 connection to the server and then tries to reconnect, it should acquire a 
 new token. That is not currently possible, since the token is stored in 
 auth_info and reused for every connection.
 The problem can be solved if we allow the client to register a callback for 
 authentication instead of a static token. This can be a callback with an 
 argument which passes the current host string. The ZooKeeper client code 
 could call this callback before it sends the authentication info to the 
 server to get a fresh, server-specific token.
 This would solve our problem with Kerberos authentication and could also 
 be used for other, more dynamic authentication schemes.
 The solution could also be generalized to the Java client.
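
 The callback idea described above can be sketched roughly as follows. This is 
 only an illustration in Java, not the attached C patch: AuthTokenProvider, 
 SketchClient and the host names are hypothetical, and no real Kerberos 
 library is called. It shows a token being requested per host on every 
 (re)connect instead of being stored once and reused.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type: none of these names come from the ZooKeeper API.
interface AuthTokenProvider {
    // Called with the host string the client is about to authenticate against.
    byte[] tokenFor(String host);
}

// Toy stand-in for a client that may connect to any server in its host list.
class SketchClient {
    private final AuthTokenProvider provider;
    private final List<String> sentTokens = new ArrayList<>();

    SketchClient(AuthTokenProvider provider) {
        this.provider = provider;
    }

    // Asks the callback on every (re)connect, so no token is reused across connections.
    void connect(String host) {
        byte[] token = provider.tokenFor(host);
        sentTokens.add(new String(token, StandardCharsets.UTF_8));
    }

    List<String> sentTokens() {
        return sentTokens;
    }
}

class AuthCallbackDemo {
    public static void main(String[] args) {
        int[] counter = {0};
        // A real provider would obtain a service ticket for the given host here.
        AuthTokenProvider provider = host ->
                ("token-for-" + host + "-#" + (++counter[0])).getBytes(StandardCharsets.UTF_8);

        SketchClient client = new SketchClient(provider);
        client.connect("zk1.example.com:2181");
        client.connect("zk2.example.com:2181"); // different server, different token
        client.connect("zk1.example.com:2181"); // reconnect, fresh token
        client.sentTokens().forEach(System.out::println);
    }
}
```

 Because the provider is consulted at connect time, both problems listed in 
 the description (per-server tokens and expiry on reconnect) are handled by 
 the same hook.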




[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930574#action_12930574
 ] 

Hadoop QA commented on ZOOKEEPER-909:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12459237/ZOOKEEPER-909.patch
  against trunk revision 1033155.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/24//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/24//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/24//console

This message is automatically generated.

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extracts all NIO-specific 
 code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket.
 I've redone this patch from current trunk step by step now and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so tonight.
 It would be nice if we could apply this patch to trunk as soon as possible. 
 This allows us to continue to work on the Netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should only be a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you 
 encounter to thomas at koch point ro. Thx!
 Update: So far, I've collected 8 successful builds in a row!
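
 The shape of the refactoring described above might look roughly like the 
 sketch below. The class names ClientCnxnSocket and ClientCnxnSocketNIO are 
 taken from the description; every method, the CnxnSketch stand-in and the 
 recorded log are assumptions made for illustration and do not reflect the 
 actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Transport-neutral base class. The name follows the issue text;
// the methods are hypothetical placeholders, not the patch's real API.
abstract class ClientCnxnSocket {
    abstract void connect(String host, int port);
    abstract void send(byte[] packet);
    abstract boolean isConnected();
}

// A real implementation would own the java.nio channel and selector;
// this one only records what it was asked to do.
class ClientCnxnSocketNIO extends ClientCnxnSocket {
    private boolean connected;
    final List<String> log = new ArrayList<>();

    @Override
    void connect(String host, int port) {
        connected = true;
        log.add("nio connect " + host + ":" + port);
    }

    @Override
    void send(byte[] packet) {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
        log.add("nio send " + packet.length + " bytes");
    }

    @Override
    boolean isConnected() {
        return connected;
    }
}

// Stand-in for ClientCnxn: it depends only on the abstract type, so another
// transport (for example a Netty-based subclass) could be swapped in later.
class CnxnSketch {
    private final ClientCnxnSocket socket;

    CnxnSketch(ClientCnxnSocket socket) {
        this.socket = socket;
    }

    void start(String host, int port) {
        socket.connect(host, port);
        socket.send(new byte[] {0, 0, 0, 0});
    }
}

class CnxnSketchDemo {
    public static void main(String[] args) {
        ClientCnxnSocketNIO nio = new ClientCnxnSocketNIO();
        new CnxnSketch(nio).start("localhost", 2181);
        nio.log.forEach(System.out::println);
    }
}
```

 With this split, adding a second transport would mean adding one more 
 subclass and its tests, which matches the stated goal for the Netty work.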




[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Status: Open  (was: Patch Available)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Attachment: (was: ZOOKEEPER-909.patch)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Attachment: ZOOKEEPER-909.patch

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-909:
--

Status: Patch Available  (was: Open)

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Botond Hejj (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botond Hejj updated ZOOKEEPER-896:
--

Attachment: ZOOKEEPER-896.patch

 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch





[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Botond Hejj (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botond Hejj updated ZOOKEEPER-896:
--

Attachment: (was: ZOOKEEPER-896.patch)

 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch





[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930648#action_12930648
 ] 

Patrick Hunt commented on ZOOKEEPER-896:


Hi Botond, if this is ready to go (you think it's ready for review/commit), 
please click the submit patch link on the left-hand side of this page. That 
will trigger the necessary workflow. Thanks!

 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch





[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-11-10 Thread Nicholas Harteau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Harteau updated ZOOKEEPER-905:
---

Attachment: zkServer.sh.diff

svn diff vs. release-3.3.1 (thanks patrick)

 enhance zkServer.sh for easier zookeeper automation-izing
 -

 Key: ZOOKEEPER-905
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
 Project: Zookeeper
  Issue Type: Improvement
  Components: scripts
Reporter: Nicholas Harteau
Assignee: Nicholas Harteau
Priority: Minor
 Fix For: 3.4.0

 Attachments: zkServer.sh.diff


 zkServer.sh is good at starting zookeeper and figuring out the right options 
 to pass along.
 Unfortunately, if you want to wrap zookeeper startup/shutdown in any 
 significant way, you have to reimplement a bunch of the logic there.
 The attached patch addresses a couple of simple issues:
 1. Add a 'start-foreground' option to zkServer.sh - this allows things that 
 expect to manage a foregrounded process (daemontools, launchd, etc.) to use 
 zkServer.sh instead of rolling their own to launch zookeeper.
 2. Add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper 
 from the script, just give me the command you'd normally use to exec 
 zookeeper.  I found this useful when writing automation to start/stop 
 zookeeper as part of smoke testing zookeeper-based applications.
 3. Deal more gracefully with supplying alternate configuration files to 
 zookeeper - currently the script assumes all config files reside in 
 $ZOOCFGDIR - also useful for smoke testing.
 4. Communicate extra info (JMX enabled) about zookeeper on STDERR rather 
 than STDOUT (necessary for #2).
 5. Fix an issue on macOS where readlink doesn't have the '-f' option.




[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-11-10 Thread Nicholas Harteau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Harteau updated ZOOKEEPER-905:
---

Attachment: (was: zkServer.sh.diff)

 enhance zkServer.sh for easier zookeeper automation-izing
 -

 Key: ZOOKEEPER-905
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
 Project: Zookeeper
  Issue Type: Improvement
  Components: scripts
Reporter: Nicholas Harteau
Assignee: Nicholas Harteau
Priority: Minor
 Fix For: 3.4.0

 Attachments: zkServer.sh.diff





[jira] Commented: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930654#action_12930654
 ] 

Patrick Hunt commented on ZOOKEEPER-905:


Hi Nicholas, thanks!

You've currently got the workflow in "in progress" mode; iirc this happens when 
you resume progress or something like that (we typically don't use that part 
of the workflow; if the issue is assigned to you, the assumption is that you are 
working on it unless we hear otherwise). You'll need to take the jira out of 
"in progress" mode and then select submit patch for this to go to the qabot 
and then get reviewed/committed by a committer.

One other FYI: this jira is assigned to be fixed in 3.4.0 (current trunk, i.e. 
the next full trunk release). Typically you'd want to create the patch against 
svn trunk. Also, the patch queue on hudson (qa bot) will only test patches 
against trunk. Not a big deal (your patch may apply against trunk even if 
created from 3.3.1), but I just wanted to give you that heads-up.

Thanks again. Regards.

 enhance zkServer.sh for easier zookeeper automation-izing
 -

 Key: ZOOKEEPER-905
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
 Project: Zookeeper
  Issue Type: Improvement
  Components: scripts
Reporter: Nicholas Harteau
Assignee: Nicholas Harteau
Priority: Minor
 Fix For: 3.4.0

 Attachments: zkServer.sh.diff





[jira] Work stopped: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing

2010-11-10 Thread Nicholas Harteau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on ZOOKEEPER-905 stopped by Nicholas Harteau.

 enhance zkServer.sh for easier zookeeper automation-izing
 -

 Key: ZOOKEEPER-905
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
 Project: Zookeeper
  Issue Type: Improvement
  Components: scripts
Reporter: Nicholas Harteau
Assignee: Nicholas Harteau
Priority: Minor
 Fix For: 3.4.0

 Attachments: zkServer.sh.diff





[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-909:


Hadoop Flags: [Reviewed]

+1 looks good thomas! thanx!

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch





[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-11-10 Thread Sergey Doroshenko (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930686#action_12930686
 ] 

Sergey Doroshenko commented on ZOOKEEPER-784:
-

Hi Patrick,

Currently I'm in the final stage of my Facebook internship, and I want to 
keep a good pace and not be distracted by other things.
Hopefully r-o mode is not a release-blocking feature, so I hope it'll be OK if I 
resubmit the patch in 3 weeks. But if this should be done sooner, let me know 
and I'll try to find the time :)

 server-side functionality for read-only mode
 

 Key: ZOOKEEPER-784
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
 Project: Zookeeper
  Issue Type: Sub-task
  Components: server
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch, ZOOKEEPER-784.patch


 As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
 ReadOnlyZooKeeperServer which comes into play when peer is partitioned.




[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930692#action_12930692
 ] 

Patrick Hunt commented on ZOOKEEPER-784:


No worries. I'd like to get this in given you've done a bunch of work on it; 
qabot just flagged it because it recently started working again. Thanks.




Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Mahadev Konar
+1 for the release.

Ran ant test and a couple of smoke tests. Created znodes and shut down
zookeeper servers to test durability. Deleted znodes to make sure they are
deleted. Shut down servers one at a time to confirm correct behavior.

Thanks
mahadev


On 11/4/10 11:17 PM, Patrick Hunt ph...@apache.org wrote:

 I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix
 release addressing twenty-six issues (eight critical) -- see the
 release notes for details.
 
 *** Please download, test and VOTE before the
 *** vote closes 11pm pacific time, Tuesday, November 9.***
 
 http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/
 
 Should we release this?
 
 Patrick
 



[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-10 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930720#action_12930720
 ] 

Mahadev konar commented on ZOOKEEPER-896:
-

This is interesting. Botond, can you explain your kerberos setup? Who generates 
the kerberos tokens? I am very interested in plugging in kerberos with 
zookeeper.


 Improve C client to support dynamic authentication schemes
 --

 Key: ZOOKEEPER-896
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-896.patch


 When we started exploring zookeeper for our requirements we found the 
 authentication mechanism is not flexible enough.
 We want to use kerberos for authentication but using the current API we ran 
 into a few problems. The idea is that we get a kerberos token on the client 
 side and then send that token to the server with a kerberos scheme. A server 
 side authentication plugin can use that token to authenticate the client and 
 also use the token for authorization.
 We ran into two problems with this approach:
 1. A different kerberos token is needed for each different server that the 
 client can connect to since kerberos uses mutual authentication. That means 
 when the client acquires this kerberos token it has to know which server it 
 connects to and generate the token accordingly. The client currently can't 
 generate a token for a specific server. The token stored in the auth_info is 
 used for all the servers.
 2. The kerberos token might have an expiry time, so if the client loses the 
 connection to the server and then tries to reconnect it should acquire a 
 new token. That is not possible currently since the token is stored in 
 auth_info and reused for every connection.
 The problem can be solved if we allow the client to register a callback for 
 authentication instead of a static token. This can be a callback with an 
 argument which passes the current host string. The zookeeper client code 
 could call this callback before it sends the authentication info to the 
 server to get a fresh server specific token.
 This would solve our problem with the kerberos authentication and also could 
 be used for other more dynamic authentication schemes.
 The solution could also be generalized to the java client.
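
 The callback idea described above could look roughly like the following
 Java sketch. This is only an illustration under stated assumptions: the
 names AuthTokenProvider, tokenFor and AuthCallbackSketch are hypothetical
 and are not part of any ZooKeeper client API, and the "token" is a stand-in
 string rather than a real kerberos ticket.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical callback: the client asks for a token every time it
// (re)connects, passing the host it is about to talk to.
interface AuthTokenProvider {
    byte[] tokenFor(String host);
}

class AuthCallbackSketch {
    // Stand-in for a kerberos lookup: the token embeds the host and a
    // counter, so each call yields a fresh, server specific value.
    static AuthTokenProvider freshTokens() {
        AtomicInteger issued = new AtomicInteger();
        return host -> (host + "#" + issued.incrementAndGet())
                .getBytes(StandardCharsets.UTF_8);
    }

    // What a client might do right before sending auth info to a server.
    static String authenticate(AuthTokenProvider provider, String host) {
        return new String(provider.tokenFor(host), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        AuthTokenProvider provider = freshTokens();
        System.out.println(authenticate(provider, "zk1:2181"));
        System.out.println(authenticate(provider, "zk2:2181"));
        System.out.println(authenticate(provider, "zk1:2181")); // reconnect
    }
}
```

 Because the provider is invoked per connection attempt, both problems
 listed above (per-server tokens and expired tokens on reconnect) are
 handled by the same hook instead of a value cached in auth_info.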




What happens to a follower if leader hangs?

2010-11-10 Thread Vishal Kher
Hi,

In Follower.followLeader() after syncing with the leader, the follower does:

    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like it relies on socket timeout expiry to figure out if the
connection with the leader has gone down. So a follower *with no clients*
may never notice a faulty leader if the leader has a software hang but the
TCP connections with the peers are still valid. Since it has no clients, it
won't heartbeat with the leader. If a majority of followers are not connected
to any clients, then FLE will fail even if other followers attempt to elect
a new leader after detecting that the leader is unresponsive.

Please correct me if I am wrong. If I am not mistaken, should we add code at
the follower to monitor the heartbeat messages that it receives from the
leader and take action if it misses heartbeats for more than (syncLimit *
tickTime)? This is certainly a hypothetical case; however, I think it is
worth a fix.

Thanks.
-Vishal


Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Patrick Hunt
+1 based on my testing, which included various cluster sizes. On a 9
server cluster I manually verified that killing/restarting servers
worked properly, including going below majority count and then
restarting some servers. Clients maintained connectivity and recovered
sessions correctly.

LGTM.

Patrick

On Wed, Nov 10, 2010 at 11:09 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
 +1 for the release.

 Ran ant test and a couple of smoke tests. Created znodes and shut down
 zookeeper servers to test durability. Deleted znodes to make sure they are
 deleted. Shut down servers one at a time to confirm correct behavior.

 Thanks
 mahadev


 On 11/4/10 11:17 PM, Patrick Hunt ph...@apache.org wrote:

 I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix
 release addressing twenty-six issues (eight critical) -- see the
 release notes for details.

 *** Please download, test and VOTE before the
 *** vote closes 11pm pacific time, Tuesday, November 9.***

 http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/

 Should we release this?

 Patrick





Re: What happens to a follower if leader hangs?

2010-11-10 Thread Mahadev Konar
Hi Vishal,
 There are periodic pings sent from the leader to the followers.

Take a look at Leader.java:

    syncedSet.add(self.getId());
    synchronized (learners) {
        for (LearnerHandler f : learners) {
            if (f.synced()) {
                syncedCount++;
                syncedSet.add(f.getSid());
            }
            f.ping();
        }
    }


This code sends periodic pings to the followers to make sure they are
running fine. We should keep track of these pings on the follower and give
up following the leader if we haven't seen a ping packet from it for a long
time. This is definitely worth fixing since we pride ourselves on being a
highly available and reliable service.

Please feel free to open a jira and work on it.
3.4 would be a good target for this.

Thanks
mahadev

On 11/10/10 12:26 PM, Vishal Kher vishalm...@gmail.com wrote:

 Hi,
 
 In Follower.followLeader() after syncing with the leader, the follower does:

     while (self.isRunning()) {
         readPacket(qp);
         processPacket(qp);
     }

 It looks like it relies on socket timeout expiry to figure out if the
 connection with the leader has gone down. So a follower *with no clients*
 may never notice a faulty leader if the leader has a software hang but the
 TCP connections with the peers are still valid. Since it has no clients, it
 won't heartbeat with the leader. If a majority of followers are not connected
 to any clients, then FLE will fail even if other followers attempt to elect
 a new leader after detecting that the leader is unresponsive.

 Please correct me if I am wrong. If I am not mistaken, should we add code at
 the follower to monitor the heartbeat messages that it receives from the
 leader and take action if it misses heartbeats for more than (syncLimit *
 tickTime)? This is certainly a hypothetical case; however, I think it is
 worth a fix.
 
 Thanks.
 -Vishal
 



Re: ZK's hudson patch queue being worked on

2010-11-10 Thread Patrick Hunt
The ZK qabot which tests the JIRA patch queue on hudson is again
operational. I have seen a few false failures in the c client
binding tests, Nigel is going to look into it.

You can see the results of tests on the JIRA (qabot appends as a
comment), also details/list here:
https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/

If you login to hudson you can actually test a particular JIRA
directly by clicking build now and entering the JIRA number into the
input box. Otherwise (if you do not have a login) you can cancel the patch
in JIRA and re-submit. Either will trigger the patch testing workflow.
Note, the most recent attachment to the JIRA will be tested.

Patrick

On Tue, Nov 9, 2010 at 10:43 AM, Patrick Hunt ph...@apache.org wrote:
 Nigel and Giri are working on fixing ZK's patch queue on hudson. You
 may see some spurious messages for a bit as we get things working
 again.

 This is the hudson process that tests new JIRA patches. When you
 attach a patch to a JIRA and submit it hudson will automatically
 verify the patch and comment on the jira. Here's an example:
 https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930211#action_12930211

 Once hadoopqa bot has given its blessing (typically +1 overall but
 there are some exceptions) committers generally will start their
 detailed review for commit.

 Given this workflow/automation it is important that you follow the details
 on our how to contribute page, esp in regards to creating the patch:
 http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

 If you have any questions please do let us know.

 Regards,

 Patrick



Re: What happens to a follower if leader hangs?

2010-11-10 Thread Vishal Kher
Yes, that's what I was planning to do. At the follower, start FLE if the
follower does not receive a ping for more than (syncLimit * tickTime).
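
The bookkeeping for such a check could be as small as the sketch below. It
is a hypothetical helper, not existing ZooKeeper code: the class name
LeaderPingMonitor and its methods are made up for illustration, and the
caller is assumed to pass in the current time in milliseconds so the logic
stays easy to test.

```java
// Hypothetical helper (not part of ZooKeeper): remembers when the last
// ping from the leader was seen and reports when the leader went quiet.
class LeaderPingMonitor {
    private final long timeoutMs;
    private long lastPingMs;

    LeaderPingMonitor(int tickTimeMs, int syncLimit, long nowMs) {
        // Threshold from the discussion: syncLimit * tickTime.
        this.timeoutMs = (long) tickTimeMs * syncLimit;
        this.lastPingMs = nowMs;
    }

    // Call whenever a ping packet from the leader is processed.
    void onPing(long nowMs) {
        lastPingMs = nowMs;
    }

    // True once more than syncLimit * tickTime has passed without a ping.
    boolean leaderSilent(long nowMs) {
        return nowMs - lastPingMs > timeoutMs;
    }

    public static void main(String[] args) {
        LeaderPingMonitor monitor = new LeaderPingMonitor(2000, 5, 0L);
        System.out.println(monitor.leaderSilent(10_000L)); // at the limit
        System.out.println(monitor.leaderSilent(10_001L)); // past the limit
    }
}
```

A follower loop would call onPing() when it processes a ping and check
leaderSilent() periodically, leaving the decision to start FLE to the
caller.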


On Wed, Nov 10, 2010 at 2:48 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Vishal,
  There are periodic pings sent from the leader to the followers.

 Take a look at Leader.java:

     syncedSet.add(self.getId());
     synchronized (learners) {
         for (LearnerHandler f : learners) {
             if (f.synced()) {
                 syncedCount++;
                 syncedSet.add(f.getSid());
             }
             f.ping();
         }
     }

 This code sends periodic pings to the followers to make sure they are
 running fine. We should keep track of these pings on the follower and give
 up following the leader if we haven't seen a ping packet from it for a long
 time. This is definitely worth fixing since we pride ourselves on being a
 highly available and reliable service.

 Please feel free to open a jira and work on it.
 3.4 would be a good target for this.

 Thanks
 mahadev

 On 11/10/10 12:26 PM, Vishal Kher vishalm...@gmail.com wrote:

  Hi,

  In Follower.followLeader() after syncing with the leader, the follower
  does:

      while (self.isRunning()) {
          readPacket(qp);
          processPacket(qp);
      }

  It looks like it relies on socket timeout expiry to figure out if the
  connection with the leader has gone down. So a follower *with no clients*
  may never notice a faulty leader if the leader has a software hang but the
  TCP connections with the peers are still valid. Since it has no clients,
  it won't heartbeat with the leader. If a majority of followers are not
  connected to any clients, then FLE will fail even if other followers
  attempt to elect a new leader after detecting that the leader is
  unresponsive.

  Please correct me if I am wrong. If I am not mistaken, should we add code
  at the follower to monitor the heartbeat messages that it receives from
  the leader and take action if it misses heartbeats for more than
  (syncLimit * tickTime)? This is certainly a hypothetical case; however, I
  think it is worth a fix.

  Thanks.
  -Vishal





[jira] Created: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Vishal K (JIRA)
Follower should stop following and start FLE if it does not receive pings from 
the leader
-

 Key: ZOOKEEPER-928
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.2
Reporter: Vishal K
 Fix For: 3.4.0


In Follower.followLeader() after syncing with the leader, the follower does:

    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like it relies on socket timeout expiry to figure out if the 
connection with the leader has gone down. So a follower *with no clients* may 
never notice a faulty leader if the leader has a software hang but the TCP 
connections with the peers are still valid. Since it has no clients, it won't 
heartbeat with the leader. If a majority of followers are not connected to any 
clients, then FLE will fail even if other followers attempt to elect a new 
leader.

We should keep track of pings received from the leader and give up following 
it if we haven't seen a ping packet from the leader for (syncLimit * 
tickTime).




Re: What happens to a follower if leader hangs?

2010-11-10 Thread Patrick Hunt
I'd go 3.3.3 and 3.4.0. Does any of this (including the other issues
Vishal/others have been finding recently) point to some particular set
of tests we might add to find problems like this? What are we
missing?

Once 3.3.2 is out and immediate TLP issues are addressed I'm going to
start pushing for 3.4 regardless of whether everything is in yet or
not.

Patrick

On Wed, Nov 10, 2010 at 11:48 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
 Hi Vishal,
  There are periodic pings sent from the leader to the followers.

 Take a look at Leader.java:

 syncedSet.add(self.getId());
                synchronized (learners) {
                    for (LearnerHandler f : learners) {
                        if (f.synced()) {
                            syncedCount++;
                            syncedSet.add(f.getSid());
                        }
                        f.ping();
                    }
                }


 This code sends periodic pings to the followers to make sure they are
 running fine. We should keep track of these pings on the follower and give
 up following the leader if we haven't seen a ping packet from it for a long
 time. This is definitely worth fixing since we pride ourselves on being a
 highly available and reliable service.

 Please feel free to open a jira and work on it.
 3.4 would be a good target for this.

 Thanks
 mahadev

 On 11/10/10 12:26 PM, Vishal Kher vishalm...@gmail.com wrote:

 Hi,

 In Follower.followLeader() after syncing with the leader, the follower does:
                 while (self.isRunning()) {
                     readPacket(qp);
                     processPacket(qp);
                 }

 It looks like it relies on socket timeout expiry to figure out if the
 connection with the leader has gone down. So a follower *with no clients*
 may never notice a faulty leader if the leader has a software hang but the
 TCP connections with the peers are still valid. Since it has no clients, it
 won't heartbeat with the leader. If a majority of followers are not connected
 to any clients, then FLE will fail even if other followers attempt to elect
 a new leader after detecting that the leader is unresponsive.

 Please correct me if I am wrong. If I am not mistaken, should we add code at
 the follower to monitor the heartbeat messages that it receives from the
 leader and take action if it misses heartbeats for more than (syncLimit *
 tickTime)? This is certainly a hypothetical case; however, I think it is
 worth a fix.

 Thanks.
 -Vishal





[jira] Updated: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-928:
---

  Component/s: server
   quorum
 Priority: Critical  (was: Major)
Fix Version/s: 3.3.3

 Follower should stop following and start FLE if it does not receive pings 
 from the leader
 -

 Key: ZOOKEEPER-928
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.2
Reporter: Vishal K
Priority: Critical
 Fix For: 3.3.3, 3.4.0





[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930774#action_12930774
 ] 

Flavio Junqueira commented on ZOOKEEPER-928:


I've just seen the messages on zookeeper-dev, and I'm not sure this is right:

# readPacket is implemented in Learner.java, and the socket read is performed 
in this line: leaderIs.readRecord(pp, packet);
# leaderIs is an InputArchive instance instantiated in Learner:connectToLeader;
# The socket used to instantiate leaderIs has its SO_TIMEOUT value set right 
before in connectToLeader: sock.setSoTimeout(self.tickTime * self.initLimit).

Consequently, the operation should not be delayed indefinitely and should 
return after self.tickTime * self.initLimit. This discussion on SO_TIMEOUT 
sounds familiar, huh? ;-)
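
The effect of SO_TIMEOUT on a connection that stays open but goes quiet can
be checked with a small standalone Java program. This is a sketch, not
ZooKeeper code: the class name SoTimeoutDemo and the 200 ms value are
arbitrary choices for illustration, whereas the follower would use
tickTime * initLimit as described above. The local server socket plays the
role of a hung leader: it accepts the connection and then never writes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SoTimeoutDemo {
    // Connects to a local peer that accepts but never sends anything, then
    // reports whether the blocking read gave up with a timeout while the
    // socket itself was still connected.
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket silentPeer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = new Socket(silentPeer.getInetAddress(),
                                      silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {
            sock.setSoTimeout(timeoutMs); // set before the blocking read
            try {
                sock.getInputStream().read(); // peer is up but silent
                return false;
            } catch (SocketTimeoutException e) {
                // The read timed out although the connection is still valid.
                return sock.isConnected() && !sock.isClosed();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The point of the sketch is that the timeout applies to the blocking read
itself, so a peer that keeps the TCP connection alive without sending data
still causes the reader to get a SocketTimeoutException.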




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930779#action_12930779
 ] 

Mahadev konar commented on ZOOKEEPER-928:
-

good point Flavio! I totally forgot about that. That should prevent this 
failure case. Vishal your thoughts?





[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930780#action_12930780
 ] 

Vishal K commented on ZOOKEEPER-928:


Hi Flavio,

I was aware of that. However, this is not a case of an indefinite TCP I/O hang. If 
the leader hangs (e.g., software deadlock in ZooKeeper) its TCP connection will 
remain active. The follower will not see a socket timeout. Now, how can the 
follower determine if the leader is down?




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930785#action_12930785
 ] 

Mahadev konar commented on ZOOKEEPER-928:
-

vishal,
 Here is the definition of setSoTimeout -

{code}
public void setSoTimeout(int timeout)
  throws SocketException
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With 
this option set to a non-zero timeout, a read() call on the InputStream 
associated with this Socket will block for only this amount of time. If the 
timeout expires, a java.net.SocketTimeoutException is raised, though the Socket 
is still valid. The option must be enabled prior to entering the blocking 
operation to have effect. The timeout must be > 0. A timeout of zero is 
interpreted as an infinite timeout.
{code}

This means that the read would block until the timeout and throw an exception if 
it doesn't hear from the leader during that time. Wouldn't this suffice?




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930786#action_12930786
 ] 

Vishal K commented on ZOOKEEPER-928:


ok, I see your point. I mis-analyzed this part of the code. I will wait for 
Flavio to comment and then close the jira.




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930788#action_12930788
 ] 

Flavio Junqueira commented on ZOOKEEPER-928:


Hi Vishal, my understanding is that the readRecord call in readPacket will 
time out, even if the TCP connection is still up. The documentation in: 
http://download.oracle.com/javase/6/docs/api/java/net/SocketOptions.html

says that:
{noformat}
static int  SO_TIMEOUT
  Set a timeout on blocking Socket operations:
{noformat}




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930789#action_12930789
 ] 

Vishal K commented on ZOOKEEPER-928:


Sorry for the false alarm. I got confused since SocketChannel is used in 
QuorumCnxManager, but this part of the code uses Socket and InputArchive.




[jira] Resolved: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-928.
-

   Resolution: Won't Fix
Fix Version/s: (was: 3.3.3)
   (was: 3.4.0)

No worries, Vishal. Resolving the issue as won't fix. 




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930800#action_12930800
 ] 

Flavio Junqueira commented on ZOOKEEPER-928:


My understanding is that SO_TIMEOUT also affects SocketChannel, since it builds 
on top of a Socket object.




[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930806#action_12930806
 ] 

Vishal K commented on ZOOKEEPER-928:


Hi Flavio,

Can you please try it with SocketChannel and confirm?




[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-909:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed to trunk.

Thanks for following through on this Thomas! Look forward to seeing the rest of 
it. Regards.




Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Michi Mutsuzaki
+1.

I ran my benchmark test on the release candidate for one hour, and got
similar numbers as 3.3.0.

--Michi

On 11/10/10 11:09 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 +1 for the release.
 
 Ran ant test and a couple of smoke tests. Created znodes and shut down
 zookeeper servers to test durability. Deleted znodes to make sure they are
 deleted. Shot down servers one at a time to confirm correct behavior.
 
 Thanks
 mahadev
 
 
 On 11/4/10 11:17 PM, Patrick Hunt ph...@apache.org wrote:
 
 I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix
 release addressing twenty-six issues (eight critical) -- see the
 release notes for details.
 
 *** Please download, test and VOTE before the
 *** vote closes 11pm pacific time, Tuesday, November 9.***
 
 http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/
 
 Should we release this?
 
 Patrick
 
 
 



[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930821#action_12930821
 ] 

Hudson commented on ZOOKEEPER-909:
--

Integrated in ZooKeeper-trunk #997 (See 
[https://hudson.apache.org/hudson/job/ZooKeeper-trunk/997/])
ZOOKEEPER-909. Extract NIO specific code from ClientCnxn





[jira] Updated: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-922:


Status: Open  (was: Patch Available)

the problem with your corner case is that you can end up with a leader who 
thinks it is still the leader, but zookeeper thinks the leader is dead and 
allows another leader to take over.

there may be a way to do this reliably, but we need to vet the design first.

 enable faster timeout of sessions in case of unexpected socket disconnect
 -

 Key: ZOOKEEPER-922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-922.patch


 In the case when a client connection is closed due to socket error instead of 
 the client calling close explicitly, it would be nice to enable the session 
 associated with that client to time out faster than the negotiated session 
 timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
 discovery provider to remove ephemeral nodes for crashed clients quickly, 
 while allowing for a longer heartbeat-based timeout for java clients that 
 need to do long stop-the-world GC. 
 I propose doing this by setting the timeout associated with the crashed 
 session to minSessionTimeout.
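
The proposal above might look something like the following sketch. SessionTimeouts and its methods are hypothetical names, not the server's actual session tracker, and the sketch deliberately ignores the corner case raised in the comment (a client that is still alive but whose session expires early).

```java
import java.util.HashMap;
import java.util.Map;

public class SessionTimeouts {
    private final int minSessionTimeoutMs;
    private final Map<Long, Integer> timeoutMsBySession = new HashMap<>();

    public SessionTimeouts(int minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    /** Records the negotiated timeout, never below the configured minimum. */
    public void register(long sessionId, int negotiatedTimeoutMs) {
        timeoutMsBySession.put(sessionId, Math.max(negotiatedTimeoutMs, minSessionTimeoutMs));
    }

    /** Socket closed by an error rather than an explicit close: shrink the timeout. */
    public void onSocketError(long sessionId) {
        timeoutMsBySession.computeIfPresent(sessionId, (id, current) -> minSessionTimeoutMs);
    }

    /** Returns the current timeout for the session, or -1 if it is unknown. */
    public int timeoutMs(long sessionId) {
        return timeoutMsBySession.getOrDefault(sessionId, -1);
    }
}
```
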




Re: What happens to a follower if leader hangs?

2010-11-10 Thread Benjamin Reed
have you been able to make this happen? the behavior you are suggesting 
is exactly what should be happening. When we sync with the leader we set 
the socket timeout: sock.setSoTimeout(self.tickTime * self.syncLimit);


if the leader hangs, we should get a timeout and disconnect from the leader.

ben


On 11/10/2010 11:57 AM, Vishal Kher wrote:

Yes, that's what I was planning to do. At the follower, start FLE if the
follower does not receive a ping for longer than (syncLimit * tickTime).


On Wed, Nov 10, 2010 at 2:48 PM, Mahadev Konar maha...@yahoo-inc.com wrote:


Hi Vishal,
  There are periodic pings sent from the leader to the followers.

Take a look at Leader.java:

syncedSet.add(self.getId());
synchronized (learners) {
    for (LearnerHandler f : learners) {
        if (f.synced()) {
            syncedCount++;
            syncedSet.add(f.getSid());
        }
        f.ping();
    }
}


This code sends periodic pings to the followers to make sure they are
running fine. We should keep track of these pings and give up following the
leader if we haven't seen a ping packet from it for a long time. This is
definitely worth fixing since we pride ourselves in being a highly available
and reliable service.

Please feel free to open a jira and work on it.
3.4 would be a good target for this.

Thanks
mahadev

On 11/10/10 12:26 PM, Vishal Kher vishalm...@gmail.com wrote:


Hi,

In Follower.followLeader() after syncing with the leader, the follower
does:

    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like it relies on socket timeout expiry to figure out if the
connection with the leader has gone down.  So a follower *with no clients*
may never notice a faulty leader if the leader has a software hang but the
TCP connections with the peers are still valid. Since it has no clients, it
won't heartbeat with the leader. If a majority of followers are not connected
to any clients, then FLE will fail even if other followers attempt to elect a
new leader after detecting that the leader is unresponsive.

Please correct me if I am wrong. If I am not mistaken, should we add code at
the follower to monitor the heartbeat messages that it receives from the
leader and take action if it misses heartbeats for longer than (syncLimit *
tickTime)? This certainly is a hypothetical case; however, I think it is
worth a fix.

Thanks.
-Vishal







[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930851#action_12930851
 ] 

Flavio Junqueira commented on ZOOKEEPER-928:


The documentation refers to SocketInputStream.read(), but it doesn't mention 
SocketChannel.read(). I ran a quick test with QuorumCnxManager and it doesn't 
seem to work. So maybe it is true that setting SO_TIMEOUT has no effect on 
SocketChannel.read(), which is kind of surprising to me. 




Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Flavio Junqueira
+1, unit tests pass. Also ran a few manual tests. I must say that on one of
the computers I tried, AsyncHammerTest fails, and the error message I get is
that there are no tests. Discussing with Pat, we ended up concluding that it
is most likely a configuration problem. I don't think that's a reason to -1
it, though.

-Flavio

On Nov 11, 2010, at 12:24 AM, Henry Robinson wrote:

 +1

 Python looks good.

 On 10 November 2010 14:51, Michi Mutsuzaki mic...@yahoo-inc.com wrote:

  +1.

  I ran my benchmark test on the release candidate for one hour, and got
  similar numbers as 3.3.0.

  --Michi

  On 11/10/10 11:09 AM, "Mahadev Konar" maha...@yahoo-inc.com wrote:

   +1 for the release.

   Ran ant test and a couple of smoke tests. Created znodes and shut down
   zookeeper servers to test durability. Deleted znodes to make sure they
   are deleted. Shot down servers one at a time to confirm correct behavior.

   Thanks
   mahadev

   On 11/4/10 11:17 PM, "Patrick Hunt" ph...@apache.org wrote:

    I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix
    release addressing twenty-six issues (eight critical) -- see the
    release notes for details.

    *** Please download, test and VOTE before the
    *** vote closes 11pm pacific time, Tuesday, November 9.***

    http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/

    Should we release this?

    Patrick

 --
 Henry Robinson
 Software Engineer
 Cloudera
 415-994-6679

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301

Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Stack
+1

I put it up on a cluster under hbase and ran loads against it over
last few hours.  Nothing untoward in logs.  Played around w/ zkcli.
It seems to be behaving the same as 3.3.1.

St.Ack





[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader

2010-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930905#action_12930905
 ] 

Patrick Hunt commented on ZOOKEEPER-928:


according to this it's not a bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4614802

specifically:

The read methods in SocketChannel (and DatagramChannel) do not
support timeouts.  If you need the timeout functionality then use the read
methods of the associated Socket (or DatagramSocket) object.

notice this was asked/answered a while ago though, however I suspect it's still 
true.
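
For code that has to stay on SocketChannel, one common way to bound a read is to put the channel in non-blocking mode and wait on a Selector with a timeout. This is a different technique from the one the bug report recommends (reading through the associated Socket), and the sketch below is only an illustration: ChannelReadTimeout, readWithTimeout, demo and TIMED_OUT are made-up names, and the demo uses a local loopback peer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ChannelReadTimeout {
    public static final int TIMED_OUT = -2;

    /**
     * Waits up to timeoutMs (must be positive) for the non-blocking channel to
     * become readable. Returns the result of read(), or TIMED_OUT if no data arrived.
     */
    public static int readWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_READ);
            if (selector.select(timeoutMs) == 0) {
                return TIMED_OUT;
            }
            return ch.read(buf);
        }
    }

    /** Connects to a local peer that either writes one byte or stays silent. */
    public static int demo(boolean peerWrites) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false); // required before registering with a Selector
                if (peerWrites) {
                    peer.write(ByteBuffer.wrap(new byte[] {42}));
                }
                return readWithTimeout(client, ByteBuffer.allocate(16), 200);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
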




Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)

2010-11-10 Thread Patrick Hunt
With 6 +1's (3 PMC) and no -1's the vote passes. I'm working to
publish the release and will send announcements as soon as that's
done.

Patrick
