[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-19 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889800#action_12889800
 ] 

Ivan Kelly commented on ZOOKEEPER-816:
--

(I assumed JIRA would pick up email replies. It seems not :/)

As far as I've seen, this overhead comes in two forms: CPU and disk.
CPU overhead is mostly due to formatting; disk overhead comes from
tracing filling your disk fairly quickly. Perhaps something could be
done to combat both of these. To fix the formatting problem we could
use a binary log format. I've seen this done in C++ but not in Java.
The basic idea is that if you have
TRACE("operation %x happened to %s %p", obj1, obj2, obj3);
a preprocessor replaces this with TRACE(0x1234, obj1, obj2, obj3),
where 0x1234 is an identifier for the trace. Then, when the trace
occurs, a binary blob [0x1234, value of obj1, value of obj2, value of
obj3] is logged. When the logs are pulled off the machine, you run a
post-processor to do all the formatting and you get your full trace.
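A minimal Java sketch of the binary-log idea follows. Java has no preprocessor, so this assumes the trace IDs and the ID-to-format table are maintained by hand (or generated by some build step); the class names and the choice to store every argument as a long are illustrative assumptions, not an existing ZooKeeper or log4j API. The hot path only writes the ID and raw values; all String.format work happens in the offline post-processor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical binary tracer: the hot path writes [traceId, argCount, args...]
// as raw bytes instead of formatting a string.
class BinaryTracer {
    private final DataOutputStream out;

    BinaryTracer(DataOutputStream out) {
        this.out = out;
    }

    // Arguments are stored as longs to keep the sketch simple.
    void trace(int traceId, long... args) {
        try {
            out.writeInt(traceId);
            out.writeInt(args.length);
            for (long a : args) {
                out.writeLong(a);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical offline post-processor that turns the binary blob back into text.
class TracePostProcessor {
    // In the scheme described above, a preprocessor would generate this table.
    private final Map<Integer, String> formats = new HashMap<>();

    void register(int traceId, String format) {
        formats.put(traceId, format);
    }

    String decode(byte[] blob) {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            while (in.available() > 0) {
                int id = in.readInt();
                int n = in.readInt();
                Object[] args = new Object[n];
                for (int i = 0; i < n; i++) {
                    args[i] = in.readLong();
                }
                String fmt = formats.get(id);
                if (fmt == null) {
                    sb.append("unknown trace 0x").append(Integer.toHexString(id));
                } else {
                    sb.append(String.format(fmt, args)); // formatting only happens here
                }
                sb.append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

On a server the DataOutputStream would wrap a file stream rather than a byte array, and decoding would run off the machine after the logs are collected.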

Regarding the disk overhead, traces are usually only interesting in
the run-up to a failure. We could have a ring buffer in memory that is
constantly traced to, with old traces being overwritten when the ring
buffer reaches its limit. These traces should only be dumped to the
filesystem when an error- or fatal-level event occurs, thereby giving
you a trace of what was happening before you fell over.
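The ring-buffer idea can be sketched in a few lines of Java, assuming trace records are already-built strings and that "dumping" means handing the buffered records to some sink (a hypothetical Consumer here; in practice it would write to the filesystem). Names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory trace ring buffer: keeps the last `capacity` records
// and only hands them to a sink when an ERROR/FATAL event occurs.
class TraceRingBuffer {
    private final String[] slots;
    private final Consumer<List<String>> sink;
    private int next = 0; // index of the slot to overwrite next
    private int size = 0; // number of valid records (<= capacity)

    TraceRingBuffer(int capacity, Consumer<List<String>> sink) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.slots = new String[capacity];
        this.sink = sink;
    }

    // Cheap hot-path call: once full, the oldest record is overwritten.
    synchronized void trace(String record) {
        slots[next] = record;
        next = (next + 1) % slots.length;
        if (size < slots.length) size++;
    }

    // Copy of the buffered records, oldest first.
    synchronized List<String> snapshot() {
        List<String> out = new ArrayList<>(size);
        int start = (next - size + slots.length) % slots.length;
        for (int i = 0; i < size; i++) {
            out.add(slots[(start + i) % slots.length]);
        }
        return out;
    }

    // Called on ERROR/FATAL: dump the run-up to the failure plus the error itself.
    void onError(String errorRecord) {
        List<String> dump;
        synchronized (this) {
            trace(errorRecord);
            dump = snapshot();
        }
        sink.accept(dump); // slow I/O happens outside the lock
    }
}
```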



-Ivan


 Detecting and diagnosing elusive bugs and faults in Zookeeper
 -

 Key: ZOOKEEPER-816
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
 Project: Zookeeper
  Issue Type: New Feature
Reporter: Miguel Correia
Priority: Minor

 Complex distributed systems like Zookeeper tend to fail in strange ways that 
 are hard to diagnose. The objective is to build a tool that helps understand 
 when and where these problems occurred based on Zookeeper's traces (i.e., 
 logs in TRACE level). Minor changes to the server code will be needed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-19 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889801#action_12889801
 ] 

Ivan Kelly commented on ZOOKEEPER-816:
--

Re: aspects, Miguel, take a look at AspectJ / aspect-oriented programming.
If I understand it correctly, it basically allows you to hook code into
preexisting code, sort of like auxiliary methods in CLOS.




[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-19 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889814#action_12889814
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-816:
--

You may consider having a look at ZOOKEEPER-512 for a reference within the 
context of ZooKeeper.




[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K updated ZOOKEEPER-822:
---

Attachment: test_zookeeper_2.log
test_zookeeper_1.log

Attaching logs from two nodes that took too long to complete leader election. 

 Leader election taking a long time  to complete
 ---

 Key: ZOOKEEPER-822
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Priority: Blocker
 Attachments: test_zookeeper_1.log, test_zookeeper_2.log


 Created a 3-node cluster.
 1. Fail the ZK leader.
 2. Let leader election finish. Restart the leader and let it join the 
 3. Repeat.
 After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. 
 Note: we didn't have any ZK clients, and no new znodes were created.
 zoo.cfg is shown below:
 #Mon Jul 19 12:15:10 UTC 2010
 server.1=192.168.4.12\:2888\:3888
 server.0=192.168.4.11\:2888\:3888
 clientPort=2181
 dataDir=/var/zookeeper
 syncLimit=2
 server.2=192.168.4.13\:2888\:3888
 initLimit=5
 tickTime=2000
 I have attached logs from the two nodes that took a long time to form the cluster 
 after the leader was failed. The leader was down anyway, so logs from that node 
 shouldn't matter.
 Look for START HERE; the logs after that point are the ones of interest.




[jira] Created: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Vishal K (JIRA)
Leader election taking a long time  to complete
---

 Key: ZOOKEEPER-822
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Priority: Blocker


Created a 3-node cluster.

1. Fail the ZK leader.
2. Let leader election finish. Restart the leader and let it join the 
3. Repeat.

After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. 
Note: we didn't have any ZK clients, and no new znodes were created.

zoo.cfg is shown below:

#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000

I have attached logs from the two nodes that took a long time to form the cluster 
after the leader was failed. The leader was down anyway, so logs from that node 
shouldn't matter.
Look for START HERE; the logs after that point are the ones of interest.
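A note on the config above: the `\:` sequences and the leading `#Mon Jul 19 ...` timestamp comment look like the output of java.util.Properties.store, and ZooKeeper's server config parser also loads zoo.cfg through java.util.Properties, so each server.N value should end up as plain `host:2888:3888`. The sketch below (ZooCfgLimits is a hypothetical helper) loads the file the same way and converts the tick-based limits to milliseconds: initLimit bounds how long followers may take to connect and sync to a leader, and syncLimit how far a follower may fall behind before it is dropped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: parse zoo.cfg text and derive time-based limits.
// tickTime is in milliseconds; initLimit and syncLimit are counted in ticks.
class ZooCfgLimits {
    static Properties load(String cfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfgText)); // "\:" is unescaped to ":" and "#..." lines are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Convert a tick-based limit (e.g. "initLimit") to milliseconds.
    static long millis(Properties p, String limitKey) {
        long tickMs = Long.parseLong(p.getProperty("tickTime").trim());
        long ticks = Long.parseLong(p.getProperty(limitKey).trim());
        return tickMs * ticks;
    }
}
```

With tickTime=2000, initLimit=5 and syncLimit=2, this works out to 10 s and 4 s respectively.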




[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889902#action_12889902
 ] 

Vishal K commented on ZOOKEEPER-822:


I would like to add that the problem is highly reproducible.




[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889904#action_12889904
 ] 

Patrick Hunt commented on ZOOKEEPER-816:


http://www.eclipse.org/aspectj/
http://en.wikipedia.org/wiki/AspectJ

I use this for network fault injection testing - instrumenting network 
operations with random failures. Works extremely well and I don't need to 
modify the original source at all.
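AspectJ weaves advice into existing classes at compile or load time, which needs the AspectJ compiler or weaving agent. As a rough stand-in that runs on a plain JDK, the sketch below swaps in java.lang.reflect.Proxy to wrap an interface and randomly fail calls, in the spirit of the network fault injection described above. The Transport interface, FaultInjector class, and failure rate are hypothetical. Unlike AspectJ, a dynamic proxy only intercepts calls made through an interface, so the code under test must already depend on one.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Random;

// Hypothetical network-facing interface standing in for real ZooKeeper code.
interface Transport {
    int send(byte[] data) throws IOException;
}

final class FaultInjector {
    private FaultInjector() {}

    // Wrap `target` so each interface call fails with an IOException
    // with probability `failureRate`; otherwise the real method runs.
    static <T> T wrap(Class<T> iface, T target, double failureRate, Random rng) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only inject into the interface's own methods, not toString/equals/hashCode.
            if (method.getDeclaringClass() == iface && rng.nextDouble() < failureRate) {
                throw new IOException("injected fault in " + method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, not the reflective wrapper
            }
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }
}
```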





[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889906#action_12889906
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-822:
--

Vishal, I can't reproduce your problem. I just ran two trials of killing the 
leader and rejoining it, 20 times each, and I can't see the problem you're 
mentioning. I wonder if there is anything special about your setup. I can also 
see lots of exceptions related to connections in your logs, and as a first cut, 
it sounds like this is preventing the servers from exchanging notifications, 
hence the delay. 

Two minor comments: your log file for server 2 does not contain START HERE 
and each file duplicates every message.






[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889907#action_12889907
 ] 

Ivan Kelly commented on ZOOKEEPER-822:
--

Could you try putting the logs through loggraph (in zookeeper/src/contrib)? 
Perhaps a graphical view will give some insight?




[jira] Commented: (ZOOKEEPER-733) use netty to handle client connections

2010-07-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889909#action_12889909
 ] 

Patrick Hunt commented on ZOOKEEPER-733:


While testing I noticed a problem: closed client connections are staying in 
TIME_WAIT state for a long period of time.

This test was done by starting the server, connecting a Java NIO-based 
client using the shell, then quitting the shell using quit. Is this the shell 
or Netty?

In the Netty factory we are setting the linger time to 2 sec; however, this 
doesn't seem to be used. According to netstat, the socket is held in TIME_WAIT 
for much longer than 2 sec.
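For reference, the linger setting discussed above corresponds to the standard SO_LINGER socket option; on a plain java.net.Socket it can be set and read back as in the sketch below (LingerDemo is a hypothetical helper, and this does not show how the Netty factory applies the option). As general TCP background: a positive linger timeout only bounds how long close() waits for unsent data to be delivered; how long the actively closing side then stays in TIME_WAIT is governed by the operating system's TCP stack, not by the linger value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical helper: set SO_LINGER on a socket and read the value back.
class LingerDemo {
    // Returns the linger timeout in seconds, or -1 if SO_LINGER is disabled.
    static int configureLinger(Socket s, boolean on, int seconds) {
        try {
            s.setSoLinger(on, seconds); // seconds is ignored when on == false
            return s.getSoLinger();
        } catch (IOException e) { // SocketException is an IOException
            throw new UncheckedIOException(e);
        }
    }
}
```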


 use netty to handle client connections
 --

 Key: ZOOKEEPER-733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: accessive.jar, flowctl.zip, moved.zip, 
 QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch


 We currently have our own asynchronous NIO socket engine to be able to handle 
 lots of clients with a single thread. Over time the engine has become more 
 complicated. We would also like the engine to use multiple threads on 
 machines with lots of cores. Plus, we would like to be able to support things 
 like SSL. If we switch to Netty, we can simplify our code and get the 
 previously mentioned benefits.




[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889940#action_12889940
 ] 

Henry Robinson commented on ZOOKEEPER-821:
--

Rich - 

This is a really useful contribution, thanks! The only thing I would change 
from your patch would be to use snprintf with a buffer length of 10 so as to 
avoid any potential string overflows if our version numbers ever get huge :)

Otherwise +1; if you make this change I'll commit asap. 

Thanks!
Henry

 Add ZooKeeper version information to zkpython
 -

 Key: ZOOKEEPER-821
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Affects Versions: 3.3.1
Reporter: Rich Schumacher
Assignee: Rich Schumacher
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-821.patch


 Since installing and using ZooKeeper I've built and installed no less than 
 four versions of the zkpython bindings.  It would be really helpful if the 
 module had a '__version__' attribute to easily tell which version is 
 currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Attachment: ZOOKEEPER-821.patch

Use snprintf to avoid buffer overflows.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Attachment: (was: ZOOKEEPER-821.patch)




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Status: Open  (was: Patch Available)




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889994#action_12889994
 ] 

Hadoop QA commented on ZOOKEEPER-821:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449865/ZOOKEEPER-821.patch
  against trunk revision 963957.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/149/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/149/console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-07-19 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890052#action_12890052
 ] 

Vishal K commented on ZOOKEEPER-822:


Hi Flavio,

I have ZooKeeper servers running in VMs. To kill a ZK server, I power off its VM. 
On the other hand, I tried killing the ZK process and restarting it several 
times, and I did not see any issues.
So there is something about the reboot that is causing this problem (TCP 
session not getting cleaned up?).

I don't see many connection exceptions in the log.

Once leader election starts, we start seeing "Notification time out" 
messages.

However, before this we do see that the connection was established (shown below):

2010-07-19 14:40:52,562 - DEBUG [WorkerSender Thread:quorumcnxmana...@366] - 
There is a connection already for server 0
2010-07-19 14:40:52,563 - DEBUG [WorkerSender Thread:quorumcnxmana...@346] - 
Opening channel to server 2

Do you still think this is a communication problem?




[jira] Created: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections

2010-07-19 Thread Patrick Hunt (JIRA)
update ZooKeeper java client to optionally use Netty for connections


 Key: ZOOKEEPER-823
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
 Project: Zookeeper
  Issue Type: New Feature
  Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.4.0


This JIRA will port the client-side connection code to use Netty rather than 
direct NIO.




[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch: 
* Learners report clients' heartbeat sample information (mean and standard 
deviation) to Leaders. 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detect the failure of other servers and clients by 
 counting the number of 'ticks' for which they don't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however, it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.
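The phi-accrual idea referenced above can be sketched in Java as follows. This is a minimal illustration, not the ZOOKEEPER-702 patch: it keeps a running mean and standard deviation of heartbeat inter-arrival times (the same two statistics the patch has Learners report), assumes those intervals are roughly normally distributed, and approximates the normal CDF with the logistic formula 1 / (1 + e^(-y(1.5976 + 0.070566 y^2))). The class name and the minimum-standard-deviation floor are assumptions.

```java
// Hypothetical phi-accrual failure detector for a single monitored peer.
class PhiAccrualDetector {
    private final double minStdDevMs; // floor so very regular heartbeats don't divide by ~0
    private long count = 0;           // number of inter-arrival samples
    private double mean = 0.0;        // running mean of intervals (ms)
    private double m2 = 0.0;          // running sum of squared deviations (Welford)
    private long lastHeartbeatMs = -1;

    PhiAccrualDetector(double minStdDevMs) {
        this.minStdDevMs = minStdDevMs;
    }

    // Record a heartbeat arrival and update the interval statistics.
    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            double interval = nowMs - lastHeartbeatMs;
            count++;
            double delta = interval - mean;
            mean += delta / count;
            m2 += delta * (interval - mean);
        }
        lastHeartbeatMs = nowMs;
    }

    double mean() {
        return mean;
    }

    double stdDev() {
        double sd = count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        return Math.max(sd, minStdDevMs);
    }

    // phi = -log10(P(next heartbeat arrives later than now)); higher means more suspicious.
    double phi(long nowMs) {
        if (count == 0) return 0.0; // not enough samples yet
        double elapsed = nowMs - lastHeartbeatMs;
        double y = (elapsed - mean) / stdDev();
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        // Two algebraically equivalent forms of 1 - CDF, chosen to stay finite.
        return elapsed > mean
                ? -Math.log10(e / (1.0 + e))
                : -Math.log10(1.0 - 1.0 / (1.0 + e));
    }
}
```

Instead of declaring a peer dead after a fixed number of ticks, a caller would suspect it once phi crosses a configurable threshold; Cassandra, for example, defaults to 8.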




[jira] Created: (ZOOKEEPER-824) Calling addEntry on LedgerHandler obtained using openLedger does not throw and exception.

2010-07-19 Thread JIRA
Calling addEntry on LedgerHandler obtained using openLedger does not throw and 
exception.
-

 Key: ZOOKEEPER-824
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-824
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: André Oriani


Calling addEntry on LedgerHandler obtained using openLedger does not throw an 
exception.

Test case:

{code}
package br.unicamp.zooexp.booexp;

import java.io.IOException;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.zookeeper.KeeperException;

public class BookTest {

    public static void main(String... args) throws IOException,
            InterruptedException, KeeperException, BKException {
        BookKeeper bk = new BookKeeper("127.0.0.1");

        // Writer handle: create a ledger and add entries.
        LedgerHandle lh = bk.createLedger(DigestType.CRC32, "123".getBytes());
        long lh_id = lh.getId();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        lh.addEntry("Test3".getBytes());

        // Second handle on the same ledger, obtained with openLedger.
        LedgerHandle lh1 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        // The original writer keeps adding entries after lh1 was opened.
        lh.addEntry("Test4".getBytes());
        lh.addEntry("Test5".getBytes());
        lh.addEntry("Test6".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);
        Enumeration<LedgerEntry> seq = lh.readEntries(0, lh.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh.close();

        // Adding through the handle returned by openLedger: no exception is
        // thrown, but the entries do not show up (see the output below).
        lh1.addEntry("Test7".getBytes());
        lh1.addEntry("Test8".getBytes());

        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        seq = lh1.readEntries(0, lh1.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        lh1.close();

        LedgerHandle lh2 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        lh2.addEntry("Test9".getBytes());

        System.out.printf("Got %d entries for lh2 \n", lh2.getLastAddConfirmed() + 1);

        seq = lh2.readEntries(0, lh2.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        bk.halt();
    }
}
{code}


{panel:title=Output}
Got 2 entries for lh
Got 3 entries for lh1
Got 6 entries for lh
Teste
Test2
Test3
Test4
Test5
Test6
Got 3 entries for lh1
Teste
Test2
Test3
Got 3 entries for lh2 
Teste
Test2
Test3
{panel}




[jira] Updated: (ZOOKEEPER-824) Calling addEntry on a LedgerHandler obtained using openLedger does not throw an exception.

2010-07-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Oriani updated ZOOKEEPER-824:
---

Summary: Calling addEntry on a LedgerHandler obtained using openLedger 
does not throw an exception.  (was: Calling addEntry on LedgerHandler obtained 
using openLedger does not throw and exception.)
Description: 
Calling addEntry on a LedgerHandler obtained using openLedger does not throw an 
exception.

Test case:

{code}
package br.unicamp.zooexp.booexp;

import java.io.IOException;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.zookeeper.KeeperException;

public class BookTest {

    public static void main(String... args) throws IOException,
            InterruptedException, KeeperException, BKException {
        // ZooKeeper server list used by the BookKeeper client.
        BookKeeper bk = new BookKeeper("127.0.0.1");
        LedgerHandle lh = bk.createLedger(DigestType.CRC32, "123".getBytes());
        long lh_id = lh.getId();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        lh.addEntry("Test3".getBytes());
        // Open the same ledger while lh is still open for writing.
        LedgerHandle lh1 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);
        lh.addEntry("Test4".getBytes());

        lh.addEntry("Test5".getBytes());
        lh.addEntry("Test6".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);
        Enumeration<LedgerEntry> seq = lh.readEntries(0, lh.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh.close();

        // lh1 was obtained via openLedger, yet these calls return without
        // throwing an exception.
        lh1.addEntry("Test7".getBytes());
        lh1.addEntry("Test8".getBytes());

        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        seq = lh1.readEntries(0, lh1.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        lh1.close();

        // A handle opened after lh and lh1 are closed also accepts addEntry silently.
        LedgerHandle lh2 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        lh2.addEntry("Test9".getBytes());

        System.out.printf("Got %d entries for lh2 \n", lh2.getLastAddConfirmed() + 1);

        seq = lh2.readEntries(0, lh2.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        bk.halt();
    }
}

{code}


{panel:title=Output}
Got 2 entries for lh
Got 3 entries for lh1
Got 6 entries for lh
Teste
Test2
Test3
Test4
Test5
Test6
Got 3 entries for lh1
Teste
Test2
Test3
Got 3 entries for lh2 
Teste
Test2
Test3
{panel}
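A minimal sketch of the behaviour this report asks for, assuming that a handle obtained by opening an existing ledger should be read-only and should reject addEntry outright. `ToyLedgerHandle`, `create` and `open` are hypothetical names used only for illustration; they are not the BookKeeper client API, and the choice of `UnsupportedOperationException` is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical in-memory handle (not BookKeeper's LedgerHandle) showing the
 * requested behaviour: a handle obtained by "opening" a ledger is read-only,
 * so addEntry fails loudly instead of silently succeeding.
 */
class ToyLedgerHandle {
    private final List<byte[]> entries;
    private final boolean readOnly;

    private ToyLedgerHandle(List<byte[]> entries, boolean readOnly) {
        this.entries = entries;
        this.readOnly = readOnly;
    }

    /** Analogue of createLedger: a writable handle on a fresh ledger. */
    static ToyLedgerHandle create() {
        return new ToyLedgerHandle(new ArrayList<>(), false);
    }

    /** Analogue of openLedger: a read-only snapshot of the writer's entries. */
    static ToyLedgerHandle open(ToyLedgerHandle writer) {
        return new ToyLedgerHandle(new ArrayList<>(writer.entries), true);
    }

    /** Appends an entry and returns its id; rejected on read-only handles. */
    long addEntry(byte[] data) {
        if (readOnly) {
            throw new UnsupportedOperationException(
                    "addEntry on a handle obtained via open");
        }
        entries.add(data.clone());
        return entries.size() - 1;
    }

    /** Id of the last entry visible through this handle (-1 if empty). */
    long getLastAddConfirmed() {
        return entries.size() - 1;
    }

    List<byte[]> readEntries() {
        return Collections.unmodifiableList(entries);
    }

    public static void main(String[] args) {
        ToyLedgerHandle lh = create();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        lh.addEntry("Test3".getBytes());
        ToyLedgerHandle lh1 = open(lh);
        try {
            lh1.addEntry("Test7".getBytes());
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("lh1 entries: " + (lh1.getLastAddConfirmed() + 1));
    }
}
```

Running `main` prints `rejected: addEntry on a handle obtained via open` followed by `lh1 entries: 3`, so the caller learns immediately that the write did not happen, instead of discovering later that `Test7` never appears in the ledger.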

  was:
Calling addEntry on LedgerHandler obtained using openLedger does not throw and 
exception.


[jira] Created: (ZOOKEEPER-825) Opening a ledger for reading causes an implicit close of ledger that is not detected by write

2010-07-19 Thread JIRA
Opening a ledger for reading causes an implicit close of ledger that is not 
detected by write 
-

 Key: ZOOKEEPER-825
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-825
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: André Oriani


Quoting Benjamin Reed's reply to one of my emails:
{quote}
there is one other bug you are seeing, before a ledger can be read, it must be 
closed. as your code shows, a process can open a ledger for reading while it is 
still being written to, which causes an implicit close that is not detected by 
the writer.
{quote}
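A toy, in-memory model of the semantics described in the quote, assuming that opening a ledger for reading seals it at its current last entry and that a correctly behaving writer should then have its next addEntry rejected. `ToyLedgerStore` and its methods are hypothetical and do not correspond to the BookKeeper 3.3.1 client API. In the run reported below the writer was not told, which is why `lh` kept growing to 6 entries while `lh1` and `lh2` only ever saw 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model (not the BookKeeper API) of the rule
 * "a ledger must be closed before it can be read".
 */
class ToyLedgerStore {
    /** Per-ledger state shared by the writer and any readers. */
    private static final class Ledger {
        final List<byte[]> entries = new ArrayList<>();
        boolean closed = false;
    }

    private final Map<Long, Ledger> ledgers = new HashMap<>();
    private long nextId = 0;

    /** Creates a new, open ledger and returns its id. */
    long createLedger() {
        long id = nextId++;
        ledgers.put(id, new Ledger());
        return id;
    }

    /** Writer-side append; fails once the ledger has been sealed. */
    long addEntry(long id, byte[] data) {
        Ledger l = ledgers.get(id);
        if (l.closed) {
            // The writer is told that the ledger was closed underneath it.
            throw new IllegalStateException("ledger " + id + " is closed");
        }
        l.entries.add(data.clone());
        return l.entries.size() - 1; // id of the entry just added
    }

    /**
     * Opening for reading requires a closed ledger, so an open ledger is
     * sealed here at its current last entry (the "implicit close").
     * Returns the last entry id visible to the reader.
     */
    long openForRead(long id) {
        Ledger l = ledgers.get(id);
        l.closed = true;
        return l.entries.size() - 1;
    }

    List<byte[]> readEntries(long id) {
        return Collections.unmodifiableList(ledgers.get(id).entries);
    }

    public static void main(String[] args) {
        ToyLedgerStore store = new ToyLedgerStore();
        long id = store.createLedger();
        store.addEntry(id, "Teste".getBytes());
        store.addEntry(id, "Test2".getBytes());
        store.addEntry(id, "Test3".getBytes());
        // A reader opens the ledger while the writer still holds it.
        System.out.println("reader sees last entry " + store.openForRead(id));
        try {
            store.addEntry(id, "Test4".getBytes());
        } catch (IllegalStateException e) {
            System.out.println("writer: " + e.getMessage());
        }
    }
}
```

Running `main` prints `reader sees last entry 2` and then `writer: ledger 0 is closed`. The design choice being illustrated is that the close is recorded in state shared by both sides, so the writer detects it on its next append instead of continuing to add entries no reader will see.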


Test case:

{code}
package br.unicamp.zooexp.booexp;

import java.io.IOException;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.zookeeper.KeeperException;

public class BookTest {

    public static void main(String... args) throws IOException,
            InterruptedException, KeeperException, BKException {
        // ZooKeeper server list used by the BookKeeper client.
        BookKeeper bk = new BookKeeper("127.0.0.1");
        LedgerHandle lh = bk.createLedger(DigestType.CRC32, "123".getBytes());
        long lh_id = lh.getId();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        lh.addEntry("Test3".getBytes());
        // Opening the ledger for reading while lh is still writing to it
        // implicitly closes the ledger.
        LedgerHandle lh1 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        // The writer is not told about the implicit close, so these calls
        // appear to succeed.
        lh.addEntry("Test4".getBytes());
        lh.addEntry("Test5".getBytes());
        lh.addEntry("Test6".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);
        Enumeration<LedgerEntry> seq = lh.readEntries(0, lh.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh.close();

        lh1.addEntry("Test7".getBytes());
        lh1.addEntry("Test8".getBytes());

        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        seq = lh1.readEntries(0, lh1.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        lh1.close();

        // A reader opened later sees only the entries written before the implicit close.
        LedgerHandle lh2 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        lh2.addEntry("Test9".getBytes());

        System.out.printf("Got %d entries for lh2 \n", lh2.getLastAddConfirmed() + 1);

        seq = lh2.readEntries(0, lh2.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        bk.halt();
    }
}

{code}


{panel:title=Output}
Got 2 entries for lh
Got 3 entries for lh1
Got 6 entries for lh
Teste
Test2
Test3
Test4
Test5
Test6
Got 3 entries for lh1
Teste
Test2
Test3
Got 3 entries for lh2 
Teste
Test2
Test3
{panel}




[jira] Updated: (ZOOKEEPER-824) Calling addEntry on a LedgerHandler obtained using openLedger does not throw an exception.

2010-07-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Oriani updated ZOOKEEPER-824:
---

Component/s: contrib-bookkeeper

 Calling addEntry on a LedgerHandler obtained using openLedger does not throw 
 an exception.
 --

 Key: ZOOKEEPER-824
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-824
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: André Oriani

 Calling addEntry on a LedgerHandler obtained using openLedger does not throw 
 an exception.
