[jira] Commented: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements

2010-11-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930008#action_12930008
 ] 

Henry Robinson commented on ZOOKEEPER-921:
--

Nicholas - 

Good catch, thanks! Do you think you will be able to submit a patch fixing the 
args checking in check_is_acl()?

Thanks,
Henry
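For context, zkPython represents an ACL as a list of dicts, each carrying 'perms', 'scheme' and 'id' keys; the real check_is_acl() lives in the C extension. A pure-Python sketch of the validation it should perform (the function name and structure here are illustrative, not the actual C code):

```python
# Pure-Python sketch of the check check_is_acl() should perform.
# The real implementation is C code in src/contrib/zkpython; this model
# only illustrates the rule: every ACL entry must be a dict carrying
# all three required keys, with an integer perms field.

REQUIRED_KEYS = ("perms", "scheme", "id")

def is_valid_acl(acl):
    """Return True iff acl is a list of well-formed ACL dicts."""
    if not isinstance(acl, list):
        return False
    for entry in acl:
        if not isinstance(entry, dict):
            return False
        # The reported bug: entries with missing keys slipped through.
        if any(key not in entry for key in REQUIRED_KEYS):
            return False
        if not isinstance(entry["perms"], int):
            return False
    return True
```

With a check like this, a malformed ACL fails cleanly at call time instead of leaving the interpreter in a bad state that surfaces later in unrelated code.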

 zkPython incorrectly checks for existence of required ACL elements
 --

 Key: ZOOKEEPER-921
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1, 3.4.0
 Environment: Mac OS X 10.6.4, included Python 2.6.1
Reporter: Nicholas Knight
Assignee: Nicholas Knight
 Fix For: 3.3.3, 3.4.0

 Attachments: zktest.py


 Calling {{zookeeper.create()}} seems, under certain circumstances, to be 
 corrupting a subsequent call to Python's {{logging}} module.
 Specifically, if the node does not exist (but its parent does), I end up with 
 a traceback like this when I try to make the logging call:
 {noformat}
 Traceback (most recent call last):
   File "zktest.py", line 21, in <module>
     logger.error("Boom?")
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error
     if self.isEnabledFor(ERROR):
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor
     return level >= self.getEffectiveLevel()
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel
     while logger:
 TypeError: an integer is required
 {noformat}
 But if the node already exists, or the parent does not exist, I get the 
 appropriate NodeExists or NoNode exceptions.
 I'll be attaching a test script that can be used to reproduce this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer

2010-10-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925927#action_12925927
 ] 

Henry Robinson commented on ZOOKEEPER-851:
--

Hi Vishal - 

Sorry for the slow turnaround on this one. It doesn't surprise me that this is 
the behaviour, although it's slightly unexpected that the node becomes an 
observer, rather than a follower. What evidence do you have for that? (Given 
that the stat output says "Mode: follower" - I haven't checked the code in a 
while, but I would have thought it would print "Mode: observer".)

Henry

 ZK lets any node to become an observer
 --

 Key: ZOOKEEPER-851
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal K
Priority: Critical
 Fix For: 3.4.0


 I had a 3 node cluster running. The zoo.cfg on each contained the 3 server 
 entries shown below:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I 
 created another entry for that node and started the zk server. The zoo.cfg on the 
 first 3 nodes was left unchanged. The fourth node was able to join the 
 cluster even though the 3 nodes had no idea about the fourth node.
 zoo.cfg on fourth node:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 server.3=10.17.117.71:2888:3888
 It looks like 10.17.117.71 is becoming an observer in this case. I was 
 expecting the leader to reject 10.17.117.71.
 # telnet 10.17.117.71 2181
 Trying 10.17.117.71...
 Connected to 10.17.117.71.
 Escape character is '^]'.
 stat
 Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
 Clients:
  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 3
 Sent: 2
 Outstanding: 0
 Zxid: 0x20065
 Mode: follower
 Node count: 288




[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer

2010-10-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926016#action_12926016
 ] 

Henry Robinson commented on ZOOKEEPER-851:
--

I think what happens is that the leader happily lets the new follower connect, 
but it won't be part of any voting procedure. It shouldn't become leader 
because no other nodes know about it to propose or support a vote for it. 

To add a new node, you'll need to incrementally restart every node in your 
cluster with the new config.
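The reason the interloper is harmless to elections is that votes are only counted against the ensemble listed in a server's own configuration. A toy model of that counting rule (not ZooKeeper's actual FastLeaderElection code; the server ids and quorum function are illustrative):

```python
# Toy model of majority counting: the leader tallies acknowledgements
# only from servers listed in its own configuration, so an interloper
# (server 3 here) can connect but cannot affect the outcome.

CONFIGURED = {0, 1, 2}          # server ids from the leader's zoo.cfg

def have_quorum(ack_ids, configured=CONFIGURED):
    """True iff a strict majority of *configured* servers acked."""
    counted = set(ack_ids) & configured   # unknown servers are ignored
    return len(counted) > len(configured) // 2
```

Under this rule the fourth node's acknowledgements are simply discarded, which matches the observed behaviour: it follows the leader but has no vote.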

 ZK lets any node to become an observer
 --

 Key: ZOOKEEPER-851
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal K
Priority: Critical
 Fix For: 3.4.0


 I had a 3 node cluster running. The zoo.cfg on each contained the 3 server 
 entries shown below:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I 
 created another entry for that node and started the zk server. The zoo.cfg on the 
 first 3 nodes was left unchanged. The fourth node was able to join the 
 cluster even though the 3 nodes had no idea about the fourth node.
 zoo.cfg on fourth node:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 server.3=10.17.117.71:2888:3888
 It looks like 10.17.117.71 is becoming an observer in this case. I was 
 expecting the leader to reject 10.17.117.71.
 # telnet 10.17.117.71 2181
 Trying 10.17.117.71...
 Connected to 10.17.117.71.
 Escape character is '^]'.
 stat
 Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
 Clients:
  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 3
 Sent: 2
 Outstanding: 0
 Zxid: 0x20065
 Mode: follower
 Node count: 288




Re: Apache now has reviewboard

2010-10-26 Thread Henry Robinson
Yes!

On 25 October 2010 22:47, Patrick Hunt ph...@apache.org wrote:

 And we're on it: https://reviews.apache.org/groups/zookeeper/

 We should rework our HowToCommit doc to incorporate this.

 Patrick

 On Mon, Oct 25, 2010 at 10:16 PM, Patrick Hunt ph...@apache.org wrote:

  FYI:
  https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the
 
  We should start using this, I've used it for other projects and it worked
  out quite well.
 
  Patrick
 




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Re: [VOTE] ZooKeeper as TLP?

2010-10-22 Thread Henry Robinson
+1

On 22 October 2010 14:53, Mahadev Konar maha...@yahoo-inc.com wrote:

 +1

 On 10/22/10 2:42 PM, Patrick Hunt ph...@apache.org wrote:

  Please vote as to whether you think ZooKeeper should become a
  top-level Apache project, as discussed previously on this list. I've
  included below a draft board resolution.
 
  Do folks support sending this request on to the Hadoop PMC?
 
  Patrick
 
  
 
  X. Establish the Apache ZooKeeper Project
 
 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to distributed system coordination
 for distribution at no charge to the public.
 
 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the Apache ZooKeeper Project,
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further
 
 RESOLVED, that the Apache ZooKeeper Project be and hereby is
 responsible for the creation and maintenance of software
 related to distributed system coordination; and be it further
 
 RESOLVED, that the office of Vice President, Apache ZooKeeper be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache ZooKeeper Project, and to have primary
 responsibility
 for management of the projects within the scope of
 responsibility of the Apache ZooKeeper Project; and be it further
 
 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache ZooKeeper Project:
 
   * Patrick Hunt ph...@apache.org
   * Flavio Junqueira f...@apache.org
   * Mahadev Konar maha...@apache.org
   * Benjamin Reed br...@apache.org
   * Henry Robinson he...@apache.org
 
 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt
 be appointed to the office of Vice President, Apache ZooKeeper, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed; and be it further
 
 RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
 tasked with the creation of a set of bylaws intended to
 encourage open development and increased participation in the
 Apache ZooKeeper Project; and be it further
 
 RESOLVED, that the Apache ZooKeeper Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Hadoop ZooKeeper sub-project; and be it further
 
 RESOLVED, that all responsibilities pertaining to the Apache
 Hadoop ZooKeeper sub-project encumbered upon the
 Apache Hadoop Project are hereafter discharged.
 






Re: Restarting discussion on ZooKeeper as a TLP

2010-10-21 Thread Henry Robinson
 was that by becoming a TLP the project would lose its connection with
 Hadoop, a big source of new users for us. I've been assured (and you can see
 with the other projects that have moved to tlp status; pig/hive/hbase/etc...)
 that this connection will be maintained. The Hadoop ZooKeeper tab for
 example will redirect to our new homepage.

 Other Apache members also pointed out to me that we are essentially
 operating as a TLP within the Hadoop PMC. Most of the other PMC members have
 little or no experience with ZooKeeper and this makes it difficult for them
 to monitor and advise us. By moving to TLP status we'll be able to govern
 ourselves and better set our direction.

 I believe we are ready to become a TLP. Please respond to this email with
 your thoughts and any issues. I will call a vote in a few days, once
 discussion settles.

 Regards,

 Patrick












Re: Restarting discussion on ZooKeeper as a TLP

2010-10-21 Thread Henry Robinson
Ha, I may just have excluded myself from eligibility due to my inability to
read :)

On 21 October 2010 13:28, Patrick Hunt ph...@apache.org wrote:

 Ack, I missed Henry in the list, sorry! In my defense I copied this:
 http://hadoop.apache.org/zookeeper/credits.html

 one more try (same as before except for adding henry to the pmc):
 

X. Establish the Apache ZooKeeper Project

   WHEREAS, the Board of Directors deems it to be in the best
   interests of the Foundation and consistent with the
   Foundation's purpose to establish a Project Management
   Committee charged with the creation and maintenance of
   open-source software related to data serialization
   for distribution at no charge to the public.

   NOW, THEREFORE, BE IT RESOLVED, that a Project Management
   Committee (PMC), to be known as the Apache ZooKeeper Project,
   be and hereby is established pursuant to Bylaws of the
   Foundation; and be it further

   RESOLVED, that the Apache ZooKeeper Project be and hereby is
   responsible for the creation and maintenance of software
   related to data serialization; and be it further

   RESOLVED, that the office of Vice President, Apache ZooKeeper be
   and hereby is created, the person holding such office to
   serve at the direction of the Board of Directors as the chair
   of the Apache ZooKeeper Project, and to have primary responsibility
   for management of the projects within the scope of
   responsibility of the Apache ZooKeeper Project; and be it further

   RESOLVED, that the persons listed immediately below be and
   hereby are appointed to serve as the initial members of the
   Apache ZooKeeper Project:

 * Patrick Hunt ph...@apache.org
 * Flavio Junqueira f...@apache.org
 * Mahadev Konar maha...@apache.org
 * Benjamin Reed br...@apache.org
 * Henry Robinson he...@apache.org

   NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie
be appointed to the office of Vice President, Apache ZooKeeper, to
   serve in accordance with and subject to the direction of the
   Board of Directors and the Bylaws of the Foundation until
   death, resignation, retirement, removal or disqualification,
   or until a successor is appointed; and be it further

   RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
   tasked with the creation of a set of bylaws intended to
   encourage open development and increased participation in the
   Apache ZooKeeper Project; and be it further

   RESOLVED, that the Apache ZooKeeper Project be and hereby
   is tasked with the migration and rationalization of the Apache
   Hadoop ZooKeeper sub-project; and be it further

   RESOLVED, that all responsibilities pertaining to the Apache
   Hadoop ZooKeeper sub-project encumbered upon the
   Apache Hadoop Project are hereafter discharged.

 On Thu, Oct 21, 2010 at 10:44 AM, Henry Robinson he...@cloudera.com
 wrote:

  Looks good, please do call a vote.
 
  On 21 October 2010 09:29, Patrick Hunt ph...@apache.org wrote:
 
   Here's a draft board resolution (not a vote, just discussion). It lists
  all
   current committers (except as noted in the next paragraph) as the
 initial
   members of the project management committee (PMC) and myself as the
  initial
   chair.
  
   Notice that I have left Andrew off the PMC as he has not been active
 with
   the project for over two years. I believe we should continue to include
  him
   on the committer roles subsequent to moving to tlp, however as he has
 not
   been an active member of the community for such a long period we would
  not
   include him on the PMC at this time. If others feel differently, let me
  know - I'm willing to include him.
  
   LMK if this looks good to you and I'll call for an official vote on
 this
   list (then we'll be ready to call a vote on the hadoop pmc).
  
   Regards,
  
   Patrick
  
   
  
   X. Establish the Apache ZooKeeper Project
  
  WHEREAS, the Board of Directors deems it to be in the best
  interests of the Foundation and consistent with the
  Foundation's purpose to establish a Project Management
  Committee charged with the creation and maintenance of
  open-source software related to data serialization
  for distribution at no charge to the public.
  
  NOW, THEREFORE, BE IT RESOLVED, that a Project Management
  Committee (PMC), to be known as the Apache ZooKeeper Project,
  be and hereby is established pursuant to Bylaws of the
  Foundation; and be it further
  
  RESOLVED, that the Apache ZooKeeper Project be and hereby is
  responsible for the creation and maintenance of software
  related to data serialization

Re: Restarting discussion on ZooKeeper as a TLP

2010-10-20 Thread Henry Robinson
+1, thanks for following through with the protocol.

On 20 October 2010 11:02, Vishal K vishalm...@gmail.com wrote:

 +1.

 On Wed, Oct 20, 2010 at 1:50 PM, Patrick Hunt ph...@apache.org wrote:

  It's been a few days, any thoughts? Acceptable? I'd like to keep moving
 the
  ball forward. Thanks.
 
  Patrick
 
  On Sun, Oct 17, 2010 at 8:43 PM, 明珠刘 redis...@gmail.com wrote:
 
   +1
  
   2010/10/14 Patrick Hunt ph...@apache.org
  
In March of this year we discussed a request from the Apache Board,
 and
Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:
   
Original discussion
http://markmail.org/thread/42cobkpzlgotcbin
   
I originally voted against this move, my primary concern being that
 we
   were
not ready to move to tlp status given our small contributor base
 and
limited contributor diversity. However I'd now like to revisit that
discussion/decision. Since that time the team has been working hard
 to
attract new contributors, and we've seen significant new
 contributions
   come
in. There has also been feedback from board/pmc addressing many of
  these
concerns (both on the list and in private). I am now less concerned
  about
this issue and don't see it as a blocker for us to move to TLP
 status.
   
A second concern was that by becoming a TLP the project would lose
 it's
connection with Hadoop, a big source of new users for us. I've been
   assured
(and you can see with the other projects that have moved to tlp
 status;
pig/hive/hbase/etc...) that this connection will be maintained. The
   Hadoop
ZooKeeper tab for example will redirect to our new homepage.
   
Other Apache members also pointed out to me that we are essentially
operating as a TLP within the Hadoop PMC. Most of the other PMC
 members
have
little or no experience with ZooKeeper and this makes it difficult
 for
   them
to monitor and advise us. By moving to TLP status we'll be able to
  govern
ourselves and better set our direction.
   
I believe we are ready to become a TLP. Please respond to this email
  with
your thoughts and any issues. I will call a vote in a few days, once
discussion settles.
   
Regards,
   
Patrick
   
  
 






[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. 

Thanks both!

 c-client / zkpython: Double free corruption on node watcher
 ---

 Key: ZOOKEEPER-888
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Assignee: Lukas
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
 ZOOKEEPER-888.patch


 the c-client / zkpython wrapper invokes an already-freed watcher callback
 steps to reproduce:
   0. start a zookeeper server on your machine
   1. run the attached python script
   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
   3. wait until the connection and the node observer fired with a session 
 event
   4. resume the zookeeper server process (e.g. using `pkill -CONT -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
 -> the client tries to dispatch the node observer function again, but it was 
 already freed -> double free corruption
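The crash is a callback-lifetime bug: the binding released the watcher context after its first invocation, but session events can fire the same watcher again. A small Python model of the lifetime rule the fix enforces (the real fix is in the C glue; this class is purely illustrative):

```python
# Illustrative model: a watcher context must stay alive until the
# handle is closed, because session events may re-invoke it. Releasing
# it after the first call is the double-free pattern reported here.

class WatcherContext:
    def __init__(self, callback):
        self.callback = callback
        self.freed = False

    def dispatch(self, event):
        if self.freed:
            raise RuntimeError("use-after-free: watcher already released")
        self.callback(event)
        # BUG (old behaviour): free the context here after a session event.
        # FIX: only release the context when the handle is closed.

    def close(self):
        self.freed = True
```

In the C binding the analogous rule is that the Python callback's reference is dropped only on handle close, never after a session-event dispatch.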




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 c-client / zkpython: Double free corruption on node watcher
 ---

 Key: ZOOKEEPER-888
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Assignee: Lukas
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
 ZOOKEEPER-888.patch


 the c-client / zkpython wrapper invokes an already-freed watcher callback
 steps to reproduce:
   0. start a zookeeper server on your machine
   1. run the attached python script
   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
   3. wait until the connection and the node observer fired with a session 
 event
   4. resume the zookeeper server process (e.g. using `pkill -CONT -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
 -> the client tries to dispatch the node observer function again, but it was 
 already freed -> double free corruption




[jira] Commented: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922209#action_12922209
 ] 

Henry Robinson commented on ZOOKEEPER-888:
--

The patch as it stands relies on ZOOKEEPER-853 (which it also fixes), and 853 
is not in 3.3 because it is a small API change - it changes is_unrecoverable 
to return Python True or False, rather than ZINVALIDSTATE. 

So I'm not certain about what to do here - we try not to change APIs between 
minor versions. However, this is a very minor change, and this patch fixes a 
significant bug. I'm inclined to commit both 853 and this patch to 3.3 as well 
as trunk, and put a note in the release notes. 

Any objections?
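For code that has to run against both a pre-853 build (where is_unrecoverable returns the C error code ZINVALIDSTATE as an integer) and a patched build (where it returns a Python bool), a normalizing shim is one option. This is a sketch under that assumption, not part of the zkPython API:

```python
# Sketch of a compatibility wrapper: before ZOOKEEPER-853,
# zookeeper.is_unrecoverable() returned a C error code (a nonzero int
# such as ZINVALIDSTATE) for dead handles; afterwards it returns a
# Python bool. Normalizing to bool keeps callers portable.

def session_is_dead(result):
    """Interpret an is_unrecoverable() result from either API version."""
    if isinstance(result, bool):
        return result          # post-853 behaviour: already a bool
    return result != 0         # pre-853: nonzero error code means dead
```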

 c-client / zkpython: Double free corruption on node watcher
 ---

 Key: ZOOKEEPER-888
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Assignee: Lukas
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: resume-segfault.py, ZOOKEEPER-888.patch


 the c-client / zkpython wrapper invokes an already-freed watcher callback
 steps to reproduce:
   0. start a zookeeper server on your machine
   1. run the attached python script
   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
   3. wait until the connection and the node observer fired with a session 
 event
   4. resume the zookeeper server process (e.g. using `pkill -CONT -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
 -> the client tries to dispatch the node observer function again, but it was 
 already freed -> double free corruption




Re: Running a single unit test

2010-10-17 Thread Henry Robinson
You need to use -Dtestcase, not -Dtest, as per below:

ant test -Dtestcase=YourTestHere

HTH,

Henry

On 17 October 2010 17:34, Michi Mutsuzaki mic...@yahoo-inc.com wrote:

 Hello,

 How do I run a single unit test? I tried this:

 $ ant test -Dtest=SessionTest

 but it still runs all the tests.

 Thanks!
 --Michi






Re: What's the QA strategy of ZooKeeper?

2010-10-15 Thread Henry Robinson
I broadly agree with Ben - all meaningful code changes carry a risk of
destabilization (otherwise software development would be very easy) so we
should guard against improving cleanliness only for its own sake. At the
point where bad code gets in the way of fixing bugs or adding features, I
think it's very worthwhile to 'lazily' clean code.

I did this with the observers patch - reworked some of the class hierarchies
to improve encapsulation and make it easier to add new implementations.

The netty patch is a good test case for this approach. If we feel that
reworking the structure of the existing server cnxn code will make it
significantly easier to add a second implementation that adheres to the same
interface, then I say that such a refactoring is worthwhile, but even then
only if it's straightforward to make the changes while convincing ourselves
that the behaviour of the new implementation is consistent with the old.

Thomas, do comment on the patch itself! That's the very best way to make
sure your concerns get heard and addressed.

cheers,
Henry

On 15 October 2010 11:37, Benjamin Reed br...@yahoo-inc.com wrote:

  i think we have a very different perspective on the quality issue:



  I didn't want to say it that clear, but especially the new Netty code,
 both on client and server side is IMHO an example of new code in very bad
 shape. The
 client code patch even changes the FindBugs configuration to exclude the
 new
 code from the FindBugs checks.

  great. fixing the code and refactoring before a patch goes in is the
 perfect time to do it! please give feedback and help make the patch better.
 there is a reason to exclude checks (which is why there is such excludes),
 but if we can avoid them we should. before a patch is applied is exactly the
 time to do cleanup

  If your code is already in such a bad shape, that every change includes
 considerable risk to break something, then you already are in trouble.
 With
 every new feature (or bugfix!) you also risk to break something.
 If you don't have the attitude of permanent refactoring to improve the
 code
 quality, you will inevitably lower the maintainability of your code with
 every
 new feature. New features will build on the dirty concepts already in the
 code
  and therefore make it more expensive to ever clean things up.

 cleaning up code to add a new feature is a great time to clean up the code.

  Yes. Refactoring isn't easy, but necessary. Only over time you better
 understand your domain and find better structures. Over time you introduce
 features that let code grow so that it should better be split up in
 smaller
 units that the human brain can still handle.

  it is the but necessary that i disagree with. there is plenty of code
 that could be cleaned up and made to look a lot nicer, but we shouldn't
 touch it, unless we are fixing something else or adding a new feature. it's
 pretty lame to explain to someone that the bug that was introduced by a code
 change was motivated by a desire to make the code cleaner. any code change
 runs the risk of breakage, thus changing code simply for cleanliness is not
 worth the risk.

 ben






Re: Restarting discussion on ZooKeeper as a TLP

2010-10-14 Thread Henry Robinson
+1,

I agree that we've addressed most outstanding concerns, we're ready for
TLP.

Henry

On 14 October 2010 13:29, Mahadev Konar maha...@yahoo-inc.com wrote:

 +1 for moving to TLP.

 Thanks for starting the vote Pat.

 mahadev


 On 10/13/10 2:10 PM, Patrick Hunt ph...@apache.org wrote:

  In March of this year we discussed a request from the Apache Board, and
  Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:
 
  Original discussion
  http://markmail.org/thread/42cobkpzlgotcbin
 
  I originally voted against this move, my primary concern being that we
 were
  not ready to move to tlp status given our small contributor base and
  limited contributor diversity. However I'd now like to revisit that
  discussion/decision. Since that time the team has been working hard to
  attract new contributors, and we've seen significant new contributions
 come
  in. There has also been feedback from board/pmc addressing many of these
  concerns (both on the list and in private). I am now less concerned about
  this issue and don't see it as a blocker for us to move to TLP status.
 
  A second concern was that by becoming a TLP the project would lose it's
  connection with Hadoop, a big source of new users for us. I've been
 assured
  (and you can see with the other projects that have moved to tlp status;
  pig/hive/hbase/etc...) that this connection will be maintained. The
 Hadoop
  ZooKeeper tab for example will redirect to our new homepage.
 
  Other Apache members also pointed out to me that we are essentially
  operating as a TLP within the Hadoop PMC. Most of the other PMC members
 have
  little or no experience with ZooKeeper and this makes it difficult for
 them
  to monitor and advise us. By moving to TLP status we'll be able to govern
  ourselves and better set our direction.
 
  I believe we are ready to become a TLP. Please respond to this email with
  your thoughts and any issues. I will call a vote in a few days, once
  discussion settles.
 
  Regards,
 
  Patrick
 






[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921103#action_12921103
 ] 

Henry Robinson commented on ZOOKEEPER-893:
--

Thanks for the patch Thijs! It looks pretty good to me - good catch.

Do you think you might be able to write a test case that verifies correct 
behaviour when you send malformed messages to the control port? 

 ZooKeeper high cpu usage when invalid requests
 --

 Key: ZOOKEEPER-893
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
 Environment: Linux 2.6.16
 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
 java version "1.6.0_17"
 Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
 Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
Reporter: Thijs Terlouw
Assignee: Thijs Terlouw
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-893.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 When ZooKeeper receives certain illegally formed messages on the internal 
 communication port (:4181 by default), it's possible for ZooKeeper to enter 
 an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, 
 but that patch does not resolve all issues.
 from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 
 the two affected parts:
 ===
 int length = msgLength.getInt();
 if (length <= 0) {
     throw new IOException("Invalid packet length:" + length);
 }
 ===
 ===
 while (message.hasRemaining()) {
     temp_numbytes = channel.read(message);
     if (temp_numbytes < 0) {
         throw new IOException("Channel eof before end");
     }
     numbytes += temp_numbytes;
 }
 ===
 how to replicate this bug:
 perform an nmap portscan against your zookeeper server: nmap -sV -n 
 your.ip.here -p4181
 wait for a while until you see some messages in the logfile and then you 
 will see 100% cpu usage. It does not recover from this situation. With my 
 patch, it no longer occurs.
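The general defensive pattern behind the fix is the usual one for reading a length-prefixed frame from an untrusted peer: bounds-check the length before trusting it, and treat a short read (EOF) as a hard error instead of looping. The fix itself is Java in QuorumCnxManager; the sketch below restates the same loop in Python for illustration (the frame cap is an arbitrary example value):

```python
import io
import struct

MAX_FRAME = 1024 * 1024   # illustrative cap on a sane packet size

def read_frame(stream):
    """Read one length-prefixed frame, rejecting bad lengths and EOF."""
    header = stream.read(4)
    if len(header) < 4:
        raise IOError("EOF before length header")
    (length,) = struct.unpack(">i", header)   # signed, big-endian
    if length <= 0 or length > MAX_FRAME:
        raise IOError("Invalid packet length: %d" % length)
    payload = b""
    while len(payload) < length:
        chunk = stream.read(length - len(payload))
        if not chunk:   # EOF mid-frame: bail out, never spin forever
            raise IOError("Channel eof before end")
        payload += chunk
    return payload
```

The key property is that every iteration of the read loop either makes progress or raises, so a malformed or truncated packet can never produce the 100% cpu busy-loop described above.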




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-


The patch looks good to me - thanks! 

Could you add a test case that verifies the correct behaviour, if possible? (I 
appreciate it can be hard to fake unrecoverable session errors). We keep 
circling around the correct behaviour for this code block, and I'd like to 
capture it in a test suite.

 c-client / zkpython: Double free corruption on node watcher
 ---

 Key: ZOOKEEPER-888
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.1
Reporter: Lukas
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: resume-segfault.py, ZOOKEEPER-888.patch


 the c-client / zkpython wrapper invokes an already-freed watcher callback
 steps to reproduce:
   0. start a zookeeper server on your machine
   1. run the attached python script
   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
   3. wait until the connection and the node observer have fired with a session 
 event
   4. resume the zookeeper server process (e.g. using `pkill -CONT -f 
 org.apache.zookeeper.server.quorum.QuorumPeerMain`)
 - the client tries to dispatch the node observer function again, but it was 
 already freed, causing double-free corruption

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909376#action_12909376
 ] 

Henry Robinson commented on ZOOKEEPER-785:
--

This patch looks good - a couple of comments:

1. Can you expand the comment "// Not a quorum configuration so return 
immediately" to be clear that this isn't a problem, and that the server will 
default to standalone mode?
2. Can you actually move the 'bit out of place' test to somewhere more 
sensible? :) Let's make a QuorumConfigurationTest class if we have to.



  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
 ---

 Key: ZOOKEEPER-785
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
 Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
 ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch


 The following config causes an infinite loop
 [zoo.cfg]
 tickTime=2000
 dataDir=/var/zookeeper/
 clientPort=2181
 initLimit=10
 syncLimit=5
 server.0=localhost:2888:3888
 Output:
 2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum 
 peer
 2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to 
 port 0.0.0.0/0.0.0.0:2181
 2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000
 2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set 
 to -1
 2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set 
 to -1
 2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10
 2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot 
 /var/zookeeper/version-2/snapshot.c
 2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My 
 election bind port: 3888
 2010-06-01 16:20:32,554 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
 2010-06-01 16:20:32,556 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
 id = 0, Proposed zxid = 12
 2010-06-01 16:20:32,558 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
 12, 1, 0, LOOKING, LOOKING, 0
 2010-06-01 16:20:32,560 - WARN 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
 java.lang.NullPointerException
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
 2010-06-01 16:20:32,560 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
 2010-06-01 16:20:32,560 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
 id = 0, Proposed zxid = 12
 2010-06-01 16:20:32,561 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
 12, 2, 0, LOOKING, LOOKING, 0
 2010-06-01 16:20:32,561 - WARN 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
 java.lang.NullPointerException
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
 2010-06-01 16:20:32,561 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
 2010-06-01 16:20:32,562 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
 id = 0, Proposed zxid = 12
 2010-06-01 16:20:32,562 - INFO 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
 12, 3, 0, LOOKING, LOOKING, 0
 2010-06-01 16:20:32,562 - WARN 
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
 java.lang.NullPointerException
 Things like HBase require that the zookeeper servers be listed in the 
 zoo.cfg. This is a bug on their part, but ZooKeeper shouldn't hit a 
 NullPointerException in a loop.
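 The root cause is that a lone server.N line is treated as a quorum
 configuration, so the peer starts leader election against itself and NPEs
 repeatedly. A sketch of the validation discussed here, in pure Python with
 illustrative names (not ZooKeeper's actual parsing code): a config with only
 one voting server should fall back to standalone mode.

```python
# Sketch of quorum-config validation: one server.N entry is not a
# quorum, so the server should run standalone instead of starting
# leader election. Parsing details are illustrative only.

def parse_servers(cfg_lines):
    """Collect server.N entries from zoo.cfg-style lines."""
    servers = {}
    for line in cfg_lines:
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("server."):
            servers[int(key.split(".", 1)[1])] = value.strip()
    return servers


def run_mode(cfg_lines):
    """Quorum mode only makes sense with at least two voting servers."""
    return "quorum" if len(parse_servers(cfg_lines)) > 1 else "standalone"
```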

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-785:
-

Hadoop Flags: [Reviewed]

  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
 ---

 Key: ZOOKEEPER-785
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
 Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
 ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, 
 ZOOKEEPER-785_2_br33.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-785:
-


+1, this looks good (although I'd remove the 'out of place in this class' 
comment now that you've moved it). 

  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
 ---

 Key: ZOOKEEPER-785
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
 Environment: Tested in linux with a new jvm
Reporter: Alex Newman
Assignee: Patrick Hunt
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
 ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, 
 ZOOKEEPER-785_2_br33.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Zoosh!

2010-09-01 Thread Henry Robinson
Hi Michi -

This sounds cool - but your link goes to what I think is a Yahoo-internal
site, and I suspect that 'yinst' is a Yahoo-specific tool.

Perhaps you either did not mean to send this mail to this list, or you are
not aware that this is a public mailing list, open to all? Either way,
thanks for your interest in ZooKeeper, and if what you have written would be
of interest to a general audience, please do consider contributing it back!

cheers,
Henry

On 31 August 2010 17:40, Michi Mutsuzaki mic...@yahoo-inc.com wrote:

 I created a wrapper package for the Java ZooKeeper shell. Unlike the C 
 version, it supports command history and tab completion.

 $ yinst install zoosh -br test
 $ zoosh localhost:2181

 http://dist.corp.yahoo.com/by-package/zoosh/

 --Michi





-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer

2010-08-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-853:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this (to trunk) - thanks Andrei!

 Make zookeeper.is_unrecoverable return True or False and not an integer
 ---

 Key: ZOOKEEPER-853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Reporter: Andrei Savu
Assignee: Andrei Savu
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch


 This is a patch that fixes a TODO from the python zookeeper extension; it 
 makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an 
 integer. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Putting copyright notices in ZK?

2010-08-26 Thread Henry Robinson
Hi Vishal -

I'm afraid we don't allow author or copyright information in source
files. Putting
one's own copyright notice is against Apache policy (and we are guided by
the rules of the ASF). The SVN logs will keep track of ownership details,
but it's not at all clear what copyright notices even mean once you have
granted license to the ASF by virtue of submitting your patch. To avoid any
confusion, we just disallow author specific information in the source.

I hope you can find some compromise with your legal department - I'm pretty
sure I know of other contributions from VMWare employees to open source
projects that don't have this restriction, so I'm hopeful that you can
resolve this issue.

Best,
Henry


On 26 August 2010 14:58, Vishal K vishalm...@gmail.com wrote:

 Hi All,

 I work for VMware. My company tells me that any contribution that I make to
 ZK needs to have a line saying "Copyright [year of creation - year of last
 modification] VMware, Inc. All Rights Reserved."
 If portions of a file are modified, then I could identify only those
 portions of the file, if needed. No change to the license is required.

 Needless to say, I am personally ok with making contributions without any such
 notices. What is ZK's policy on this? What would be a good solution in this
 case satisfying both parties (ZK and my company's legal dept.)?
 Thanks.
 -Vishal




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer

2010-08-24 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-853:
-

Hadoop Flags: [Reviewed]

+1 This looks good to me - thanks. 

 Make zookeeper.is_unrecoverable return True or False and not an integer
 ---

 Key: ZOOKEEPER-853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Reporter: Andrei Savu
Assignee: Andrei Savu
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch


 This is a patch that fixes a TODO from the python zookeeper extension; it 
 makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an 
 integer. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-792) zkpython memory leak

2010-08-22 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-792:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I just committed this! Thanks Lei Zhang!

 zkpython memory leak
 

 Key: ZOOKEEPER-792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1
 Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, 
 ZOOKEEPER-792.patch


 We recently upgraded zookeeper from 3.2.1 to 3.3.1, and we are now seeing less 
 client deadlock on session expiration, which is a definite plus!
 Unfortunately, we are also seeing a memory leak that requires our zk clients to 
 be restarted every half-day. Valgrind result:
 ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
 loss record 255 of 670
 ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
 ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
 ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
 ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
 ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
 ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900380#action_12900380
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Aha - I think I have found the problem, and it was related to this patch.


   PyObject *ret = Py_BuildValue("(s#,N)", buffer, buffer_len, stat_dict);
+  free_pywatcher(pw);
   free(buffer);

We shouldn't free the pywatcher_t object here because it may be called later. 
This was what was causing the segfault I was seeing. I'll upload a new patch 
with this line removed; I hope it will still fix your memory consumption 
issues. 
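 The bug pattern here is a lifetime issue: a watcher record freed eagerly
 after one dispatch can still be handed back by the C event loop later. A
 pure-Python analogue of the safe shape (all names are illustrative, not the
 zkpython internals): keep the record alive in a registry until it is
 explicitly unregistered, and make late redeliveries a no-op.

```python
# Pure-Python analogue of the watcher-lifetime bug: the registry owns
# callback records until explicit unregistration, so a redelivered
# event never touches a freed record. Names are hypothetical.

class WatcherRegistry:
    def __init__(self):
        self._watchers = {}
        self._next_id = 0

    def register(self, callback):
        wid = self._next_id
        self._next_id += 1
        self._watchers[wid] = callback
        return wid

    def dispatch(self, wid, event):
        # Safe even on redelivery: unknown ids are ignored instead of
        # dereferencing a record that no longer exists.
        cb = self._watchers.get(wid)
        if cb is not None:
            cb(event)

    def unregister(self, wid):
        self._watchers.pop(wid, None)
```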

 zkpython memory leak
 

 Key: ZOOKEEPER-792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1
 Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-792.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-792) zkpython memory leak

2010-08-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-792:
-

Attachment: ZOOKEEPER-792.patch

I forgot --no-prefix. Plus ça change, plus c'est la même chose. 

 zkpython memory leak
 

 Key: ZOOKEEPER-792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1
 Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, 
 ZOOKEEPER-792.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-17 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899676#action_12899676
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Just to update - I've found that zkpython tests are failing in trunk, and I 
don't want to commit a patch when the tests are broken. I'll be creating a JIRA 
shortly to address the problem once I've looked into it slightly further.

 zkpython memory leak
 

 Key: ZOOKEEPER-792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1
 Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-792.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-16 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899004#action_12899004
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Hi - 

Sorry for the slow response! I just took a look over the patch - good catches.

+1. I'll commit within the day. 

Henry

 zkpython memory leak
 

 Key: ZOOKEEPER-792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1
 Environment: vmware workstation - guest OS:Linux python:2.4.3
Reporter: Lei Zhang
Assignee: Lei Zhang
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-792.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-08-11 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897338#action_12897338
 ] 

Henry Robinson commented on ZOOKEEPER-784:
--

Spectacular job, Sergey. I've taken a look at the code and I'm pretty satisfied 
- you've done a great job covering little things like JMX support, and good 
code comments and documentation. 

I'm going to wait for one of the other committers to come by and also give this 
a +1 since this is a substantial change. We may also decide to run a long-lived 
test with this patch to satisfy ourselves of the stability. But this looks 
very, very solid indeed. 

 server-side functionality for read-only mode
 

 Key: ZOOKEEPER-784
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
 Project: Zookeeper
  Issue Type: Sub-task
  Components: server
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch


 As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
 ReadOnlyZooKeeperServer which comes into play when peer is partitioned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889940#action_12889940
 ] 

Henry Robinson commented on ZOOKEEPER-821:
--

Rich - 

This is a really useful contribution, thanks! The only thing I would change 
from your patch would be to use snprintf with a buffer length of 10 so as to 
avoid any potential string overflows if our version numbers ever get huge :)

Otherwise +1; if you make this change I'll commit asap. 

Thanks!
Henry

 Add ZooKeeper version information to zkpython
 -

 Key: ZOOKEEPER-821
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Affects Versions: 3.3.1
Reporter: Rich Schumacher
Assignee: Rich Schumacher
Priority: Trivial
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-821.patch


 Since installing and using ZooKeeper I've built and installed no less than 
 four versions of the zkpython bindings.  It would be really helpful if the 
 module had a '__version__' attribute to easily tell which version is 
 currently in use.
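 A __version__ attribute also enables runtime compatibility checks on the
 Python side. A sketch of both halves (assuming the usual "major.minor.patch"
 form; the bounded-length check mirrors Henry's snprintf suggestion, and all
 function names here are illustrative, not zkpython's API):

```python
# Sketch of building and comparing a bounded "major.minor.patch" version
# string, analogous to the snprintf-with-bound suggestion. Hypothetical
# helper names; not the actual zkpython implementation.

def make_version_string(major, minor, patch, max_len=10):
    """Join numeric components, refusing absurdly long results."""
    version = "%d.%d.%d" % (major, minor, patch)
    if len(version) >= max_len:
        raise ValueError("version string too long: %r" % version)
    return version


def at_least(version, required):
    """Compare dotted version strings component-wise."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(version) >= to_tuple(required)
```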

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-06-23 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881746#action_12881746
 ] 

Henry Robinson commented on ZOOKEEPER-784:
--

I like the idea of fake sessions fine, although I think that the upgrade 
process might be complex. Another possibility is to do away with sessions in 
read-only mode (because they're mainly used to maintain state about watches, 
which don't make sense on a read-only server).

Sergey - just looked over your patch. Nice job! Couple of questions:

1. In QuorumPeer.java, I can't quite follow the logic in this part of the patch:

{code}
while (running) {
 switch (getPeerState()) {
 case LOOKING:
+LOG.info(LOOKING);
+ReadOnlyZooKeeperServer roZk = null;
 try {
-LOG.info(LOOKING);
+roZk = new ReadOnlyZooKeeperServer(
+logFactory, this,
+new ZooKeeperServer.BasicDataTreeBuilder(),
+this.zkDb);
+roZk.startup();
+
{code}

- is it sensible to start a ROZKServer every time a server enters the 'LOOKING' 
state, or should there be some kind of delay before it decides it is 
partitioned? Otherwise, when a leader is lost and the quorum is holding a 
re-election, r/w clients that try to connect would get (I think) 'can't be 
read-only' messages.

2. What are you doing about watches? It seems to me that setting a watch turns 
a read operation into a read / write operation, and the client should be told 
that watch registration failed. If you can do this you don't have to worry so 
much about session migration because there's very little session state 
maintained by a ROZKServer on behalf of the client.

3. This patch has got to the point where it might be good if you started adding 
some tests to validate any further development you do. 


 server-side functionality for read-only mode
 

 Key: ZOOKEEPER-784
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
 ZOOKEEPER-784.patch


 As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
 ReadOnlyZooKeeperServer which comes into play when peer is partitioned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-06-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877227#action_12877227
 ] 

Henry Robinson commented on ZOOKEEPER-740:
--

Mike - 

Great catch, thanks for figuring this out. 

I'm correct in saying that this doesn't prevent watchers from eventually being 
correctly freed, right? 

If so, then it would be great if you could submit this patch formally so that 
we can get it into trunk. See 
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute for details.

Thanks,
Henry

 zkpython leading to segfault on zookeeper
 -

 Key: ZOOKEEPER-740
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Federico
Assignee: Henry Robinson
Priority: Critical
 Fix For: 3.4.0


 The program that we are implementing uses the python binding for zookeeper 
 but sometimes it crash with segfault; here is the bt from gdb:
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0xad244b70 (LWP 28216)]
 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
 at ../Objects/abstract.c:2488
 2488../Objects/abstract.c: No such file or directory.
 in ../Objects/abstract.c
 (gdb) bt
 #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
 at ../Objects/abstract.c:2488
 #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
 arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
 #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
 at ../Objects/abstract.c:2480
 #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
 path=0x86337c8 , context=0x8588660) at src/c/zookeeper.c:314
 #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
 path=0x86337c8 , list=0xa5354140) at src/zk_hashtable.c:275
 #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 ,
 list=0xa5354140) at src/zk_hashtable.c:317
 #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
 #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
 #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
 #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-704) GSoC 2010: Read-Only Mode

2010-06-02 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-704:


Assignee: Sergey Doroshenko

 GSoC 2010: Read-Only Mode
 -

 Key: ZOOKEEPER-704
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-704
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Sergey Doroshenko

 Read-only mode
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java and TCP/IP networking
 Description
 When a ZooKeeper server loses contact with over half of the other servers in 
 an ensemble ('loses a quorum'), it stops responding to client requests 
 because it cannot guarantee that writes will get processed correctly. For 
 some applications, it would be beneficial if a server still responded to read 
 requests when the quorum is lost, but caused an error condition when a write 
 request was attempted.
 This project would implement a 'read-only' mode for ZooKeeper servers (maybe 
 only for Observers) that allowed read requests to be served as long as the 
 client can contact a server.
 This is a great project for getting really hands-on with the internals of 
 ZooKeeper - you must be comfortable with Java and networking, otherwise you'll 
 have a hard time coming up to speed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized

2010-06-01 Thread Henry Robinson (JIRA)
committedLog in ZKDatabase is not properly synchronized
---

 Key: ZOOKEEPER-783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Henry Robinson
Priority: Critical


ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> 
committedLog in ZKDatabase. This is then iterated over by at least one caller. 

I have seen a bug that causes a NPE in LinkedList.clear on committedLog, which 
I am pretty sure is due to the lack of synchronization. This bug has not been 
apparent in normal ZK operation, but in code that I have that starts and stops 
a ZK server in process repeatedly (clear() is called from 
ZooKeeperServerMain.shutdown()). 

It's better style to defensively copy the list in getCommittedLog, and to 
synchronize on the list in ZKDatabase.clear.
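
For illustration, a minimal sketch of the two fixes described above (class and
element names are simplified stand-ins - the real ZKDatabase stores Proposal
objects and has more state - so treat this as a hedged sketch, not the actual
patch):

```java
import java.util.LinkedList;
import java.util.List;

// Simplified stand-in for ZKDatabase's committedLog handling.
class CommittedLogHolder {
    private final LinkedList<String> committedLog = new LinkedList<>();

    public synchronized void add(String proposal) {
        committedLog.add(proposal);
    }

    // Defensive copy: callers iterate their own snapshot, so a concurrent
    // clear() can no longer invalidate the iterator and cause an NPE.
    public synchronized List<String> getCommittedLog() {
        return new LinkedList<>(committedLog);
    }

    // clear() takes the same monitor, so it cannot interleave with a
    // copy in progress.
    public synchronized void clear() {
        committedLog.clear();
    }
}

public class Demo {
    public static void main(String[] args) {
        CommittedLogHolder holder = new CommittedLogHolder();
        holder.add("proposal-1");
        List<String> snapshot = holder.getCommittedLog();
        holder.clear(); // does not affect the snapshot already handed out
        System.out.println(snapshot.size());
    }
}
```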



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized

2010-06-01 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-783:
-

Attachment: ZOOKEEPER-783.patch

Defensive copying added to getCommittedLog() and synchronization during 
clear(). 

No tests added; really not sure how best to test for this. It does fix my test 
case but it's very difficult to distill that into a test (plus it only fails 
once in about 100 runs). 

 committedLog in ZKDatabase is not properly synchronized
 ---

 Key: ZOOKEEPER-783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Henry Robinson
Priority: Critical
 Attachments: ZOOKEEPER-783.patch


 ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> 
 committedLog in ZKDatabase. This is then iterated over by at least one 
 caller. 
 I have seen a bug that causes a NPE in LinkedList.clear on committedLog, 
 which I am pretty sure is due to the lack of synchronization. This bug has 
 not been apparent in normal ZK operation, but in code that I have that starts 
 and stops a ZK server in process repeatedly (clear() is called from 
 ZooKeeperServerMain.shutdown()). 
 It's better style to defensively copy the list in getCommittedLog, and to 
 synchronize on the list in ZKDatabase.clear.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this - thanks Sergey!

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [PATCH] javaclient: validate sessionTimeout field at ZooKeeper init (JIRA ZOOKEEPER-776)

2010-05-21 Thread Henry Robinson
Hi Greg -

Thanks very much for contributing! We've got some guidelines here:
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute - let me know if
they're not clear.

The main thing for you to do is to attach your patch to the JIRA and click
the 'Licensed for inclusion into Apache projects' button when you do. You
can do this by clicking 'Attach patch' on the JIRA itself. Once you've done
that, please click 'Submit patch' to kick off our automated QA procedures.

Assuming all goes well, a committer will pick up the baton from there and
get the patch into trunk (or let you know if they think changes are
necessary).

Thanks!

Henry

On 21 May 2010 12:22, Gregory Haskins gregory.hask...@gmail.com wrote:

 Hi All,

 First patch submission for me.  If there are any patch submission
 guidelines I should follow, kindly point me at them and accept my
 apology if this approach violates any established procedures.  I didn't
 find anything obvious on the site wiki, so I just used some practices
 learned on other projects.

 -Greg

 

 commit 840f56d388582e1df39f7513aa7f4d4ce0610718
 Author: Gregory Haskins ghask...@novell.com
 Date:   Fri May 21 14:58:14 2010 -0400

javaclient: validate sessionTimeout field at ZooKeeper init

JIRA ZOOKEEPER-776 describes the following problem:

passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors
in subsequent operations. It would be ideal to capture this configuration
error at the source by throwing something like an IllegalArgument exception
when the bogus sessionTimeout is specified, instead of later when it is
utilized.

This patch is a proposal to fix the problem referenced above.

Applies to svn-id: 946074

Signed-off-by: Gregory Haskins ghask...@novell.com

 diff --git a/src/java/main/org/apache/zookeeper/ClientCnxn.java
 b/src/java/main/
 index 8eb227d..682811b 100644
 --- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
 +++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
 @@ -353,6 +353,11 @@ public class ClientCnxn {
 this.sessionId = sessionId;
 this.sessionPasswd = sessionPasswd;

 +   if (sessionTimeout <= 0) {
 +       throw new IOException("sessionTimeout " + sessionTimeout
 +               + " is not valid");
 +   }
 +
 // parse out chroot, if any
 int off = hosts.indexOf('/');
 if (off >= 0) {




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870152#action_12870152
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Thanks Greg - can you generate your patch from git with --no-prefix, to make it 
svn compatible?
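
For reference, a hedged sketch of the command (the output filename is
illustrative): git's default a/ and b/ path prefixes make the diff headers
incompatible with `patch -p0`, which svn-based tooling expected, and
`--no-prefix` drops them.

```shell
# Illustrative: produce an svn-compatible (-p0) patch from a git checkout.
# --no-prefix suppresses the default a/ and b/ path prefixes in the diff
# headers, so the patch applies with "patch -p0" against an svn working copy.
git diff --no-prefix > ZOOKEEPER-776.patch
```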

 API should sanity check sessionTimeout argument
 ---

 Key: ZOOKEEPER-776
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, java client
Affects Versions: 3.2.2, 3.3.0, 3.3.1
 Environment: OSX 10.6.3, JVM 1.6.0-20
Reporter: Gregory Haskins
Priority: Minor
 Fix For: 3.4.0

 Attachments: zookeeper-776-fix.patch


 passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in 
 subsequent operations.  It would be ideal to capture this configuration error 
 at the source by throwing something like an IllegalArgument exception when 
 the bogus sessionTimeout is specified, instead of later when it is utilized.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870164#action_12870164
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Cancelling the patch is fine but there's no need to delete it - Hudson will 
always figure out what the latest patch is and it's good to see how a ticket 
evolved.

Tests will also help :)

 API should sanity check sessionTimeout argument
 ---

 Key: ZOOKEEPER-776
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, java client
Affects Versions: 3.2.2, 3.3.0, 3.3.1
 Environment: OSX 10.6.3, JVM 1.6.0-20
Reporter: Gregory Haskins
Priority: Minor
 Fix For: 3.4.0

 Attachments: zookeeper-776-fix.patch


 passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in 
 subsequent operations.  It would be ideal to capture this configuration error 
 at the source by throwing something like an IllegalArgument exception when 
 the bogus sessionTimeout is specified, instead of later when it is utilized.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870179#action_12870179
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Greg - 

Don't worry - you should have seen the hash I made of my first patch!

Hudson is misbehaving at the moment, so I'm not convinced that the test 
failures are as a result of your patch. You don't need to do anything right now 
- I'll take a look and update this ticket once I know what's going on.

cheers,
Henry

 API should sanity check sessionTimeout argument
 ---

 Key: ZOOKEEPER-776
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, java client
Affects Versions: 3.2.2, 3.3.0, 3.3.1
 Environment: OSX 10.6.3, JVM 1.6.0-20
Reporter: Gregory Haskins
Priority: Minor
 Fix For: 3.4.0

 Attachments: zookeeper-776-fix.patch


 passing in a 0 sessionTimeout to ZooKeeper() constructor leads to errors in 
 subsequent operations.  It would be ideal to capture this configuration error 
 at the source by throwing something like an IllegalArgument exception when 
 the bogus sessionTimeout is specified, instead of later when it is utilized.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Open  (was: Patch Available)

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

  Status: Patch Available  (was: Open)
Hadoop Flags: [Reviewed]

hudson? hello?

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869822#action_12869822
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Failures do not look related to this patch (although I could be mistaken). 
ZkDatabaseCorruptionTest is the most recent broken test, but it passes fine 
for me locally.

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868780#action_12868780
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - sorry for the delay. It's on me to review this patch, and then I'll 
commit it.

Thanks for your patience!

Henry

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Attachment: ZOOKEEPER-769.patch

I made a few small changes to your patch to make the logic a little easier to 
follow. Take a look and let me know if you think this is ok, otherwise I'll 
commit the patch tomorrow. Thanks!

Henry

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Open  (was: Patch Available)

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Patch Available  (was: Open)

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand, expected result is that leader will start a new election 
 round so that to regain quorum.
 But the real situation is that it just says goodbye to that follower, and is 
 still operable. (When I'm shutting down 3rd one -- observer -- leader starts 
 trying to regain a quorum).
 (Expectedly, if on step 3 we shut down the leader, not the follower, 
 remaining follower starts a new leader election, as it should be).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-772:


Assignee: Henry Robinson

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff


 When utilizing the zkpython async get children api with a watch, i 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Attachment: ZOOKEEPER-772.patch

Bug was simple when I got round to looking - we were incorrectly reusing a 
watcher that was getting deallocated before getting called.

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
 ZOOKEEPER-772.patch


 When utilizing the zkpython async get children api with a watch, i 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Patch Available  (was: Open)

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
 ZOOKEEPER-772.patch


 When utilizing the zkpython async get children api with a watch, i 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Open  (was: Patch Available)

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
 ZOOKEEPER-772.patch, ZOOKEEPER-772.patch


 When utilizing the zkpython async get children api with a watch, i 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Patch Available  (was: Open)

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
 ZOOKEEPER-772.patch, ZOOKEEPER-772.patch


 When utilizing the zkpython async get children API with a watch, I 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Attachment: ZOOKEEPER-772.patch

--no-prefix, predictably.

 zkpython segfaults when watcher from async get children is invoked.
 ---

 Key: ZOOKEEPER-772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
 Environment: ubuntu lucid (10.04) / zk trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
 ZOOKEEPER-772.patch, ZOOKEEPER-772.patch


 When utilizing the zkpython async get children API with a watch, I 
 consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] Release ZooKeeper 3.3.1 (candidate 0)

2010-05-12 Thread Henry Robinson
+1, Java tests pass for me, as do Python ones.

Henry

On 11 May 2010 22:32, Patrick Hunt ph...@apache.org wrote:

 +1, tests pass for me, also verified that nc/zktop worked properly on a
 real cluster (4letter word fix).

 Patrick


 On 05/07/2010 11:25 AM, Patrick Hunt wrote:

 I've created a candidate build for ZooKeeper 3.3.1. This is a bug fix
 release addressing seventeen issues (one critical) -- see the release
 notes for details.

 *** Please download, test and VOTE before the
 *** vote closes 11am pacific time, Wednesday, May 12.***

 http://people.apache.org/~phunt/zookeeper-3.3.1-candidate-0/

 Should we release this?

 Patrick








-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Commented: (ZOOKEEPER-679) Offers a node design for interacting with the Java Zookeeper client.

2010-05-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12865639#action_12865639
 ] 

Henry Robinson commented on ZOOKEEPER-679:
--

Hi Aaron - 

The great thing about open source, and the relatively permissive Apache license 
in particular, is that Chris is free to copy any and all of ZK into github and 
continue with a development process that he finds more agreeable. It is 
completely kosher to do this. As Chris says, you are welcome to contribute, 
fork or ignore it. 

As far as I am concerned, contrib is an excellent place to put projects that 
directly add more functionality to their parent project (the language bindings 
and this patch are good examples), but not a great place to store standalone 
projects that simply leverage the parent (an example might be a DNS server 
built on ZooKeeper). This is a necessarily vague distinction, and others will 
have different opinions.

I do not know specifically to what Chris is referring when he talks about an 
'onerous' patch process, but I speculate he might mean that the role of 
'committer' - someone who is gating the submission of patches - makes it harder 
to get your patches available for others to use quickly. Of course, this 
approach also has benefits: there is a ready collection of experienced users 
on hand to offer advice, and the relatively high standard for patches accepted 
to trunk arguably improves code quality. What's great is that the two 
development styles are not mutually exclusive, and can, ideally, benefit from 
each other. If you are having difficulties with, or are frustrated by, the 
patch submission process here, ask for help. The community here is very happy 
to help, and we'll do what we can to address pain points. 

As for this patch, I'm happy it's going into contrib - users sometimes find 
ZooKeeper difficult to program to, and examples and new abstractions are always 
welcome. Keeping this patch in the main repository means that newcomers to 
ZooKeeper will find it more easily. Thanks for the contribution!

Henry

 Offers a node design for interacting with the Java Zookeeper client.
 

 Key: ZOOKEEPER-679
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-679
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib, java client, tests
Reporter: Aaron Crow
Assignee: Aaron Crow
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-679.patch, ZOOKEEPER-679.patch, 
 ZOOKEEPER-679.patch, ZOOKEEPER-679.patch


 Following up on my conversations with Patrick and Mahadev 
 (http://n2.nabble.com/Might-I-contribute-a-Node-design-for-the-Java-API-td4567695.html#a4567695).
 This patch includes the implementation as well as unit tests. The first unit 
 test gives a simple high level demo of using the node API.
 The current implementation is simple and is only what I need with the current 
 project I am working on. However, I am very open to any and all suggestions 
 for improvement.
 This is a proposal to support a simplified node (or File) like API into a 
 Zookeeper tree, by wrapping the Zookeeper Java client. It is similar to 
 Java's File API design.
 I'm trying, though, to make it easier in a few spots. For example, deleting 
 a Node recursively is done by default. I also lean toward resolving 
 Exceptions under the hood when it seems appropriate. For example, if you 
 ask a Node if it exists, and its parent doesn't even exist, you just get a 
 false back (rather than a nasty Exception).
 As for watches and ephemeral nodes, my current work does not need these 
 things so I currently have no handling of them. But if potential users of  
 the Node a.k.a. File design want these things, I'd be open to supporting 
 them as reasonable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-07 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12865240#action_12865240
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - 

Great, thanks for making this patch! ISTR there was some reason why we didn't 
infer peerType from the servers list, but I can't remember what it was...

As for your patch, a few small comments:

1. Use --no-prefix and just attach the output of git-diff (no mail headers etc) 
- Hudson is rather picky about the patch formats it can apply
2. It would be great to include a test that reads a configuration and checks 
that the behaviour is correct
3. If the peerTypes don't match up, should we default to the server list (on 
the assumption that that will be consistent across all servers)?
4. Once you've added the patch, click 'submit patch' to start Hudson moving.

cheers,
Henry

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
Assignee: Sergey Doroshenko
 Fix For: 3.4.0

 Attachments: follower.log, leader.log, observer.log, warning.patch, 
 zoo1.cfg


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand it, the expected result is that the leader will start a new 
 election round so as to regain quorum.
 But in reality it just says goodbye to that follower and remains operable. 
 (When I shut down the third one -- the observer -- the leader starts trying 
 to regain a quorum.)
 (As expected, if at step 3 we shut down the leader rather than the follower, 
 the remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Question on quorum behavior

2010-05-06 Thread Henry Robinson
Sergey -

Sounds like a bug. Can you open a new JIRA and attach your log files to it?

Thanks,
Henry

On 6 May 2010 07:50, Sergey Doroshenko dors...@gmail.com wrote:

 In short: it seems leader can treat observers as quorum members.

 Steps to repro:

 1. I have a following ensemble configuration:
 # servers list
 server.1=localhost:2881:3881
 server.2=localhost:2882:3882
 server.3=localhost:2883:3883:observer
 server.4=localhost:2884:3884
 server.5=localhost:2885:3885:observer

 2. I'm bringing up servers 1,2,3 and it's enough for quorum (1 and 2).
 3. I'm shutting down the one from the quorum who is the follower.

 As I understand it, the expected result is that the leader will start a new
 election round so as to regain quorum.
 But in reality it just says goodbye to that follower and remains operable.
 (When I shut down the third one -- the observer -- the leader starts trying
 to regain a quorum.)

 Is this a bug, or a feature?


 --
 Regards, Sergey




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864849#action_12864849
 ] 

Henry Robinson commented on ZOOKEEPER-768:
--

Thanks Kapil - I'll take a look. From the stack trace it looks as though a 
pending completion callback is null and therefore something weird is going on 
with a completion dispatcher being freed before it is finished being used. As 
per usual I can't reproduce on my machine, but this is enough information to 
dig into it. 

 zkpython segfault on close (assertion error in io thread)
 -

 Key: ZOOKEEPER-768
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.4.0
 Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
Reporter: Kapil Thangavelu
 Attachments: zkpython-segfault-client-log.txt, 
 zkpython-segfault-stack-traces.txt, zkpython-segfault.py


 While trying to create a test case showing slow average add_auth, I stumbled 
 upon a test case that reliably segfaults for me, albeit after a variable 
 number of iterations (anywhere from 0 to 20, typically). FWIW, I've got about 
 220 processes in my test environment (ubuntu lucid 10.04). The test case 
 opens a connection, adds authentication to it, and closes the connection, in 
 a loop. I'm including the sample program and the gdb stack traces from the 
 core file. I can upload the core file if that's helpful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864878#action_12864878
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Hi Sergey - 

Can you attach the logs from (at least) the leader node to this ticket? I'd 
like to figure this one out asap.

cheers,
Henry

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
 Fix For: 3.3.0

 Attachments: zoo1.cfg


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand it, the expected result is that the leader will start a new 
 election round so as to regain quorum.
 But in reality it just says goodbye to that follower and remains operable. 
 (When I shut down the third one -- the observer -- the leader starts trying 
 to regain a quorum.)
 (As expected, if at step 3 we shut down the leader rather than the follower, 
 the remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864953#action_12864953
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - 

In the cfg files for nodes 3 and 5, did you include the following line? 

peerType=observer

See http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html for 
details. The observer log contains this line:

2010-05-06 22:46:00,876 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:quorump...@642] - FOLLOWING

which is a big red flag because observers should never adopt the FOLLOWING 
state. 

If I don't have that line I can reproduce your issue. If I add it, the 
observers work as expected. Can you check your cfg files?
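
For reference, a sketch of what the observer nodes' cfg files might contain; 
the hostnames and ports are illustrative, mirroring the server list quoted 
elsewhere in this thread:

```
# zoo.cfg on servers 3 and 5 (the observers)
peerType=observer

server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
server.4=localhost:2884:3884
server.5=localhost:2885:3885:observer
```

Without the peerType line, the node does not know it is an observer and will 
happily FOLLOW, which matches the log line above.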

cheers,
Henry

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
 Fix For: 3.3.0

 Attachments: follower.log, leader.log, observer.log, zoo1.cfg


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the one from the quorum who is the follower.
 As I understand it, the expected result is that the leader will start a new 
 election round so as to regain quorum.
 But in reality it just says goodbye to that follower and remains operable. 
 (When I shut down the third one -- the observer -- the leader starts trying 
 to regain a quorum.)
 (As expected, if at step 3 we shut down the leader rather than the follower, 
 the remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864429#action_12864429
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Hi Kapil - 

As seems to be the norm for me this week, I'm struggling to reproduce :) It 
does seem like your python script explicitly waits for a completion to be 
called before closing a handle. Is this enough to leave an outstanding 
completion on the queue?

Can you capture the stacktrace for the completion thread? I think it must be 
getting stuck in process_completions but it would be very valuable to know 
where - if it's stuck on the callback into zkpython then that means the 
deadlock is in the python bindings and not solely in C-land.

cheers,
Henry

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
 Fix For: 3.4.0

 Attachments: deadlock.py, stack-trace-deadlock.txt


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Demo Code: Shared/Exclusive Lock

2010-05-05 Thread Henry Robinson
Sam -

This is great - the more contributed code the better!

Did you attach the code to your mail? The mailing lists strip out
attachments. If you wouldn't mind creating a JIRA (see
https://issues.apache.org/jira/browse/ZOOKEEPER), formatting your code as a
patch and clicking the button that says you're happy for the ASF to use your
code, that would be awesome - doing so makes it easier for us to add your
code into Apache-hosted source repositories.

Thanks again for your contribution - really pleased to see it.

cheers,
Henry

On 5 May 2010 13:06, Sam Baskinger sam.baskin...@networkedinsights.com wrote:

 All,

 It was suggested that more demo code would be welcome. I've gotten the OK
 to release a shared/exclusive Lock.java implementation we have in our test
 labs at Networked Insights. If the community would find it useful, please do
 use it! :)

 All the best, and thanks for the excellent tool,


 *Sam Baskinger
 *Software Engineer
 Networked Insights
 http://www.networkedinsights.com




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864488#action_12864488
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Kapil - 

Thanks! Adding that sleep helped me understand what was going on. 

pyzoo_close has the GIL but blocks inside zookeeper_close, waiting for the 
completion thread to finish. However, if a completion is still inside Python, 
but has been pre-empted by the main thread which calls pyzoo_close, the 
completion can't get the GIL back to finish up executing, blocking the 
completions_thread for ever more. The fix is simple - relinquish the GIL during 
the zookeeper_close call, and then reacquire it straight after. There are even 
handy macros to do this:

Py_BEGIN_ALLOW_THREADS
ret = zookeeper_close(zhandles[zkhid]);
Py_END_ALLOW_THREADS

This same issue will affect any part of zkpython where a call to the C client 
is blocked on some work being completed in another Python thread - in practice, 
I think this means from callbacks. I'll audit the code to see if any other API 
calls are affected. A patch to fix this issue will follow shortly - Kapil, I'd 
be very grateful if you could help us by testing it. 
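
The hand-off described above can be sketched in pure Python. This is a toy 
analogue, not the zkpython code: the `gil` lock below merely stands in for the 
real GIL, and `completion` for the completion thread's Python callback:

```python
import threading

gil = threading.Lock()  # stand-in for the real GIL
done = []

def completion():
    # The completion thread needs the "GIL" before it can run its
    # Python callback.
    with gil:
        done.append("callback ran")

t = threading.Thread(target=completion)

gil.acquire()
t.start()
# Joining here, while still holding the lock, would deadlock exactly as
# described above: the completion thread could never acquire it.
gil.release()   # analogue of Py_BEGIN_ALLOW_THREADS
t.join()        # safe: the completion thread can now take the lock
gil.acquire()   # analogue of Py_END_ALLOW_THREADS
gil.release()

print(done[0])  # prints "callback ran"
```

Releasing the lock before the join is what the Py_BEGIN_ALLOW_THREADS / 
Py_END_ALLOW_THREADS pair does around the zookeeper_close call.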

cheers,
Henry

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
 Fix For: 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

 Assignee: Henry Robinson  (was: Mahadev konar)
Fix Version/s: 3.3.1
  Component/s: (was: c client)

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Patch Available  (was: Open)

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Attachment: ZOOKEEPER-763.patch

Forgot --no-prefix again :/

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch, ZOOKEEPER-763.patch


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Patch Available  (was: Open)

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch, ZOOKEEPER-763.patch


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Open  (was: Patch Available)

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch, ZOOKEEPER-763.patch


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close, both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 AFAICS the ideal behavior would be, on close of a handle, to clear out any 
 remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-764) Observer elected leader due to inconsistent voting view

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-764:
-

Attachment: ZOOKEEPER-764_3_3_1.patch

Patch to apply against 3_3_1

 Observer elected leader due to inconsistent voting view
 ---

 Key: ZOOKEEPER-764
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-764
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-690.patch, ZOOKEEPER-764_3_3_1.patch


 In ZOOKEEPER-690, we noticed that an observer was being elected, and Henry 
 proposed a patch to fix the issue. However, it seems that the patch does not 
 solve the issue one user (Alan Cabrera) has observed. Given that we would 
 like to fix this issue, and to work separately with Alan to determine the 
 problem with his setup, I'm creating this jira and re-posting Henry's patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-05-04 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863902#action_12863902
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Hi Alan - 

Looking at this attachment: nohup-AsyncHammerTest-201004301209.txt - the tests 
appear to be run twice. The first testObserversHammer completes successfully, 
the second fails. Were you running the tests until you experienced the failure? 

Henry

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, 
 nohup-AsyncHammerTest-201004301209.txt, 
 nohup-QuorumPeerMainTest-201004301209.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-05-04 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863915#action_12863915
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Weird - it looks like the test is shutting down correctly:


[junit] 2010-04-30 11:41:52,896 - INFO  [main:clientb...@222] - connecting to 
127.0.0.1 11233
[junit] 2010-04-30 11:41:52,896 - INFO  [main:quorumb...@277] - 
127.0.0.1:11233 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,896 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11234
[junit] 2010-04-30 11:41:52,897 - INFO  [main:quorumb...@277] - 
127.0.0.1:11234 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,897 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11235
[junit] 2010-04-30 11:41:52,897 - INFO  [main:quorumb...@277] - 
127.0.0.1:11235 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,897 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11236
[junit] 2010-04-30 11:41:52,898 - INFO  [main:quorumb...@277] - 
127.0.0.1:11236 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,898 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11237
[junit] 2010-04-30 11:41:52,898 - INFO  [main:quorumb...@277] - 
127.0.0.1:11237 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,901 - INFO  
[main:junit4zktestrunner$loggedinvokemet...@56] - FINISHED TEST METHOD 
testObserversHammer
[junit] 2010-04-30 11:41:52,901 - INFO  [main:zktestcas...@59] - SUCCEEDED 
testObserversHammer
[junit] 2010-04-30 11:41:52,901 - INFO  [main:zktestcas...@54] - FINISHED 
testObserversHammer

and then it goes on to the C tests, which fail for an unrelated reason - 
does it lock up at this point, or does it actually fail out to the CLI? If it 
locks up, is the jstack output you attached from that run?



 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, 
 nohup-AsyncHammerTest-201004301209.txt, 
 nohup-QuorumPeerMainTest-201004301209.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership

2010-05-03 Thread Henry Robinson
Hi Vishal -

Great that you're interested in contributing! This would be a really neat
feature to get into ZK.

The documentation that exists is essentially all on the JIRA. I had a patch
that 'worked' but was nowhere near commit-ready. I'm trying to dig it up,
but it appears it may have gone to the great bit-bucket in the sky. Trunk
has moved sufficiently that a new patch would be required anyhow.

There were two main difficulties with this issue. The first is changing the
voting protocol to cope with changes in views. Since proposals are
pipelined, the leader needs to keep track of what the view was that should
vote for a proposal. IIRC, the other subtlety is making sure that when a
view change is proposed, a quorum of votes is received from both the
outgoing view and the incoming one. Otherwise it's possible to transition to
a 'dead' view in which no progress can be made.

The second is to figure out the metadata management - how do we 'find'
ZooKeeper servers if the ensemble may have moved onto a completely separate
set of machines? That is, if the original ensemble was on A, B, C and the
current ensemble is D, E, F - where do we look to find where the ensemble is
located?

The first is a solved issue, the second is more a matter of taste than
designing distributed protocols.
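
The "quorum from both views" rule above can be sketched as a simple check 
(a hypothetical helper for illustration, not ZooKeeper's actual 
implementation): a view change only commits once a strict majority of both 
the outgoing and the incoming view has acknowledged it.

```python
def has_quorum(acks, view):
    """True if a strict majority of the servers in `view` have acked."""
    return len(acks & view) > len(view) // 2

def view_change_committable(acks, old_view, new_view):
    # Requiring a quorum in BOTH views prevents committing a transition
    # into a 'dead' view that cannot make progress on its own.
    return has_quorum(acks, old_view) and has_quorum(acks, new_view)

old_view = {"A", "B", "C"}
new_view = {"C", "D", "E"}
# Acks from A and B form a majority of the old view but none of the new one,
# so the change must not commit yet.
print(view_change_committable({"A", "B"}, old_view, new_view))            # False
print(view_change_committable({"A", "B", "C", "D"}, old_view, new_view))  # True
```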

Really happy to help with this issue - I'd love to see it get resurrected.

cheers,
Henry

On 3 May 2010 07:25, Vishal K vishalm...@gmail.com wrote:

 Hi Henry,

 I just commented on the Jira. I would be happy to contribute.
 Please advise on the current status and next steps. Thanks.

 Regards,
 -Vishal




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership

2010-05-03 Thread Henry Robinson
Hi Vishal -

That's right - design, not implementation!

I'd encourage you to share a design document once you feel you understand
exactly what's required. This is probably going to be complex patch and
reviewers will need a study guide :)

cheers,
Henry

On 3 May 2010 10:26, Vishal Kher vishalm...@gmail.com wrote:

 Hi Henry,

 Thanks for the info. I will spend some more time to understand the issues
 before starting with the implementation. I will let you know if I have any
 questions (which I am sure I will).

 Just to clarify, by solved issue you mean from design perspective and not
 from implementation right?
 Regards,
 -Vishal
 On Mon, May 3, 2010 at 1:16 PM, Henry Robinson he...@cloudera.com wrote:

  Hi Vishal -
 
  Great that you're interested in contributing! This would be a really neat
  feature to get into ZK.
 
  The documentation that exists is essentially all on the JIRA. I had a
 patch
  that 'worked' but was nowhere near commit-ready. I'm trying to dig it up,
  but it appears it may have gone to the great bit-bucket in the sky. Trunk
  has moved sufficiently that a new patch would be required anyhow.
 
  There were two main difficulties with this issue. The first is changing
 the
  voting protocol to cope with changes in views. Since proposals are
  pipelined, the leader needs to keep track of what the view was that
 should
  vote for a proposal. IIRC, the other subtlety is making sure that when a
  view change is proposed, a quorum of votes is received from both the
  outgoing view and the incoming one. Otherwise it's possible to transition
  to
  a 'dead' view in which no progress can be made.
 
  The second is to figure out the metadata management - how do we 'find'
  ZooKeeper servers if the ensemble may have moved onto a completely
 separate
  set of machines? That is, if the original ensemble was on A, B, C and the
  current ensemble is D, E, F - where do we look to find where the ensemble
  is
  located?
 
  The first is a solved issue, the second is more a matter of taste than
  designing distributed protocols.
 
  Really happy to help with this issue - I'd love to see it get
 resurrected.
 
  cheers,
  Henry
 
  On 3 May 2010 07:25, Vishal K vishalm...@gmail.com wrote:
 
   Hi Henry,
  
   I just commented on the Jira. I would be happy to contribute.
   Please advise on the current status and next steps. Thanks.
  
   Regards,
   -Vishal
  
 
 
 
  --
  Henry Robinson
  Software Engineer
  Cloudera
  415-994-6679
 




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Henry Robinson
On 3 May 2010 16:40, Dave Wright wrig...@gmail.com wrote:

  Should this be a znode in the privileged namespace?
 

 I think having a znode for the current cluster members is part of the
 ZOOKEEPER-107 proposal, with the idea being that you could get/set the
 membership just by writing to that node. On the client side, you could
 watch that znode and update your server list when it changes.



This is tricky: what happens if the server your client is connected to is
decommissioned by a view change, and you are unable to locate another server
to connect to, because other view changes committed while you were
reconnecting have removed all the servers you knew about? We'd need to make
sure that watches on this znode were fired before a view change, but it's
hard to see how to avoid waiting for a session timeout before a client that
might just be migrating servers reappears, in order to make sure it sees the
view change.

Even then, the problem of 'locating' the cluster still exists in the case
that there are no clients connected to tell anyone about it.
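
The client-side half of the proposal - watch the membership znode, update the 
server list when it changes - might look like the sketch below (hypothetical 
names; ZOOKEEPER-107 was never finalized in this form). Keeping the previous 
view as a fallback mitigates the race where a view change removes every 
server the client currently knows about.

```python
class MembershipTracker:
    """Client-side view of ensemble membership, updated by a znode watch."""

    def __init__(self, initial_servers):
        self.servers = list(initial_servers)
        self.previous = list(initial_servers)

    def on_membership_change(self, new_servers):
        # Would be fired by a (hypothetical) watch on the membership znode.
        self.previous = self.servers
        self.servers = list(new_servers)

    def candidates(self):
        # Try the current view first, then fall back to the old one in
        # case a view change raced with our reconnect.
        return self.servers + [s for s in self.previous if s not in self.servers]

tracker = MembershipTracker(["A:2181", "B:2181", "C:2181"])
tracker.on_membership_change(["D:2181", "E:2181", "F:2181"])
print(tracker.candidates())
```

This only helps a connected client; it does not solve the "locating the 
cluster" problem for a client that was offline across several view changes.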

Henry


-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Attachment: ZOOKEEPER-758.patch

Kapil - 

Thanks for the patch! Unfortunately it didn't apply cleanly against trunk 
because I think you added 'test_acl_validity' to acl_test.py, which was not 
included in the diff.

I'm attaching a patch that applies cleanly to trunk - no code changes from your 
patch.

Thanks,

Henry

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.
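
 The expanded check boils down to verifying that every ACL entry is a dict 
 carrying all the required keys before the C code dereferences them. A 
 pure-Python sketch of that logic (key names assumed from zkpython's ACL 
 dicts; the real fix lives in the C `check_acl` macro):

```python
REQUIRED_ACL_KEYS = ("perms", "scheme", "id")

def check_is_acl(acl):
    """Reject anything that is not a list of dicts with all required keys.

    The C binding previously assumed the keys existed and dereferenced a
    NULL PyObject* when one was missing, hence the segfault.
    """
    if not isinstance(acl, list):
        return False
    for entry in acl:
        if not isinstance(entry, dict):
            return False
        if any(key not in entry for key in REQUIRED_ACL_KEYS):
            return False
    return True

# The ACL from the report: 'scheme' misspelled as 'schema'.
print(check_is_acl([{"schema": "world", "id": "anyone", "perms": 0x1F}]))  # False
print(check_is_acl([{"scheme": "world", "id": "anyone", "perms": 0x1F}]))  # True
```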

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

  Status: Patch Available  (was: Open)
Hadoop Flags: [Reviewed]

I have reviewed this, and it looks good. Thanks Kapil!

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Attachment: ZOOKEEPER-758.patch

forgot --no-prefix.

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
 ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Status: Patch Available  (was: Open)

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
 ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Status: Open  (was: Patch Available)

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
 ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

   Status: Resolved  (was: Patch Available)
Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

I just committed this. Thanks Kapil!

 zkpython segfaults on invalid acl with missing key
 --

 Key: ZOOKEEPER-758
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04)
Reporter: Kapil Thangavelu
 Fix For: 3.3.1, 3.4.0

 Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
 ZOOKEEPER-758.patch


 Currently when setting an acl, there is a minimal parse to ensure that its a 
 list of dicts, however if one of the dicts is missing a required key, the 
 subsequent usage doesn't check for it, and will segfault.. for example using 
 an acl of [{schema:id, id:world, permissions:PERM_ALL}] will segfault if 
 used, because the scheme key is missing (its been purposefully typo'd to 
 schema in example). 
 I've expanded the check_acl macro to include verifying that all keys are 
 present and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

I have found what I hope is the problem.

Because QuorumPeers duplicate their 'LearnerType' in two places, there's the 
possibility that they may get out of sync. This is what was happening here - it 
was a test bug. Although the Observers knew that they were Observers, the other 
nodes did not. This affected the leader election protocol, as the other nodes 
did not know to reject an Observer.

I feel like we should refactor the QuorumPeer.QuorumServer code so as not to 
duplicate information, but for the time being I think this patch will work. 

I have also taken the opportunity to standardise the naming of 'learnertype' 
throughout the code (in some places it was called 'peertype' adding to the 
confusion).

Tests pass on my machine, but I can't guarantee that the problem is fixed as I 
could never recreate the error.

Thanks to Flavio for catching the broken invariant!

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862351#action_12862351
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Alan - can you try this patch to see if it fixes things? 

Thanks, 

Henry


 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Status: Patch Available  (was: Open)

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862424#action_12862424
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

This map is, I think, shared between the QuorumPeers for the purposes of the 
test (in general there aren't two QuorumPeers sharing this data structure 
when running normally). 

But! The error here is that I'm dumb (and that Java's type-checking leaves a 
little to be desired). I've written quorumPeers.containsValue up there, but 
actually it should be quorumPeers.containsKey. New patch on the way; let's see 
if that fixes it.
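
The containsValue/containsKey mix-up is easy to reproduce; a Python analogue 
of the Java Map calls (illustrative only - the real code is Java):

```python
# Analogue of the Java Map bug: containsValue checks values, containsKey
# checks keys, and the type checker won't complain if you pass a key to the
# wrong one (Java's containsKey/containsValue both accept Object).
quorum_peers = {1: "server-A", 2: "server-B"}

sid = 1
print(sid in quorum_peers)           # containsKey(sid)   -> True
print(sid in quorum_peers.values())  # containsValue(sid) -> False: sid is a key
```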

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

Alan - would you mind trying this new patch? Thanks for your patience. I 
suspect that something might still be a bit flaky with these tests (not the 
code, but the tests), but I hope this will fix this particular problem. 

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862482#action_12862482
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Ben - 

Agreed. I see this as the same as setMyid(...) - it sets an immutable value and 
should only be called once. I'd prefer if these parameters were 'final' in 
QuorumPeer and set in the constructor, but that's not the way that 
runFromConfig (the only place outside of tests that these methods are called) 
is written. Then we could get rid of setLearnerType, for sure. 

The real error here, I think, is duplicating the learnertype between QuorumPeer 
and QuorumServer. If we are going to have the list of QuorumServers, then 
getLearnerType should look up the learner type in the peer map. Same for the 
serverid, perhaps, and we should just save a reference to the QuorumServer that 
represents our QuorumPeer. 


 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, 
 nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
 zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861865#action_12861865
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Progress update - possibly to do with a bug in FLE allowing an Observer to be 
elected. We're looking into this now.

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There are huge set of cancelledkeyexceptions in the logs. Still going 
 through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-749:
-

Hadoop Flags: [Reviewed]

+1, patch looks good to me. Tests failing was a quirk of Hudson, as this patch 
doesn't test code. ant bin-jar works correctly. 

 OSGi metadata not included in binary only jar
 -

 Key: ZOOKEEPER-749
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749
 Project: Zookeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.3.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-749.patch


 See this JIRA/comment for background:
 https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697
 basically the issue is that OSGi metadata is included in the legacy jar 
 (zookeeper-version.jar) but not in the binary only
 jar (zookeeper-version-bin.jar) which is eventually deployed to the maven 
 repo.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-749:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this. Thanks Patrick!

 OSGi metadata not included in binary only jar
 -

 Key: ZOOKEEPER-749
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749
 Project: Zookeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.3.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-749.patch


 See this JIRA/comment for background:
 https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697
 Basically, the issue is that OSGi metadata is included in the legacy jar 
 (zookeeper-version.jar) but not in the binary-only
 jar (zookeeper-version-bin.jar), which is eventually deployed to the Maven 
 repo.




[jira] Resolved: (ZOOKEEPER-750) move maven artifacts into dist-maven subdir of the release (package target)

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson resolved ZOOKEEPER-750.
--

Resolution: Fixed

I just committed ZOOKEEPER-749 (which addresses this as well). Thanks Patrick!

 move maven artifacts into dist-maven subdir of the release (package target)
 -

 Key: ZOOKEEPER-750
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-750
 Project: Zookeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.3.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.3.1, 3.4.0


 The maven artifacts are currently (3.3.0) put into the top level of the 
 release. This causes confusion amongst new users (i.e. which jar do I use?). 
 Also, the naming of the bin jar is wrong for Maven: to put it onto the Maven 
 repo it must be named without the -bin suffix, which adds an extra burden for 
 the release manager. Putting the artifacts into a subdir fixes this and makes 
 it explicit what's being deployed to the Maven repo.




ZooKeeper gets three Google Summer of Code students

2010-04-26 Thread Henry Robinson
Hi -

Just wanted to announce to the community that we are lucky to have three
talented students working on Google's Summer of Code projects directly
related to ZooKeeper.

Andrei Savu will be working with Patrick Hunt on a Web-based Administrative
Interface, extending and improving Patrick's Django-based front end.
Abmar Barros will be working with Flavio Junqueira on improving ZooKeeper's
failure detector module - cleaning up the code, making it easier to try out
new implementations, and implementing a few failure detection algorithms
himself!
Finally, Sergey Doroshenko will be working with me on a Read-Only Mode for
ZooKeeper, which will help bolster ZK's availability in certain
circumstances when a network partition is detected, as well as potentially
optimising the read-path.

(The full list of 450 GSoC students is here:
http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010)

Congratulations to all three - we look forward to seeing what you produce
over the summer. Thanks to everyone who applied, suggested projects and
offered to mentor students; this program will have a big effect on
ZooKeeper's visibility and community, as well as hopefully producing some
great code!

cheers,
Henry

-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Reopened: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-04-23 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reopened ZOOKEEPER-740:
--


Ok, thanks for the update. Can you share the code that you are running to give 
the segfault? That will make it much easier for me to diagnose.

 zkpython leading to segfault on zookeeper
 -

 Key: ZOOKEEPER-740
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Federico
Assignee: Henry Robinson
Priority: Critical
 Fix For: 3.3.1, 3.4.0


 The program that we are implementing uses the python binding for zookeeper, 
 but sometimes it crashes with a segfault; here is the bt from gdb:
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0xad244b70 (LWP 28216)]
 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
 at ../Objects/abstract.c:2488
 2488../Objects/abstract.c: No such file or directory.
 in ../Objects/abstract.c
 (gdb) bt
 #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
 at ../Objects/abstract.c:2488
 #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
 arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
 #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
 at ../Objects/abstract.c:2480
 #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
 path=0x86337c8 , context=0x8588660) at src/c/zookeeper.c:314
 #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
 path=0x86337c8 , list=0xa5354140) at src/zk_hashtable.c:275
 #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 ,
 list=0xa5354140) at src/zk_hashtable.c:317
 #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
 #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
 #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
 #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6




[jira] Updated: (ZOOKEEPER-746) learner outputs session id to log in dec (should be hex)

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-746:
-

Hadoop Flags: [Reviewed]

+1, patch looks good to me. No tests required, pre-empting Hudsonbot. 

 learner outputs session id to log in dec (should be hex)
 

 Key: ZOOKEEPER-746
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-746
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Minor
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-746.patch


 usability issue, should be in hex:
 2010-04-21 11:31:13,827 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:lear...@95] - Revalidating 
 client: 83353578391797760
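
For illustration only - the actual fix is a change to the Java server's log statement - the sketch below renders the session id from that log line in decimal and in a 0x-prefixed hex form, which is the convention the patch switches to (exact output formatting of the patched log line is an assumption here):

```python
# Illustration only: the session id from the log line above, printed in
# decimal (as before the fix) and in 0x-prefixed hex (as intended after it).
session_id = 83353578391797760

dec_form = str(session_id)
hex_form = "0x%x" % session_id

print("Revalidating client:", dec_form)  # decimal rendering, hard to eyeball
print("Revalidating client:", hex_form)  # hex rendering, the usual zk convention
```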




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858665#action_12858665
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Alan - that would be great. If you can take a jstack dump of the process when 
it hangs we can do some forensics.

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.3.1


 the hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find out the reason for the failure.




[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-631:
-

Status: Patch Available  (was: Open)

 zkpython's C code could do with a style clean-up
 

 Key: ZOOKEEPER-631
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZOOKEEPER-631.patch


 Inconsistent formatting / use of parentheses / some error checking - all need 
 fixing. 
 Also, the documentation in the header file could do with a reformat. 




[jira] Commented: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858377#action_12858377
 ] 

Henry Robinson commented on ZOOKEEPER-631:
--

The existing tests are the ones that validate this patch. Testing the Py_None 
and memory allocation issues directly is hard: in the first case the GC behaviour 
is hard to force, and in the second we would have to stub out calloc(..) somehow!

 zkpython's C code could do with a style clean-up
 

 Key: ZOOKEEPER-631
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZOOKEEPER-631.patch


 Inconsistent formatting / use of parentheses / some error checking - all need 
 fixing. 
 Also, the documentation in the header file could do with a reformat. 




[jira] Commented: (ZOOKEEPER-742) Deallocating None on writes

2010-04-16 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858064#action_12858064
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Patch to ZOOKEEPER-631 should fix this issue - when that is committed, we can 
close out this ticket. 

 Deallocating None on writes
 --

 Key: ZOOKEEPER-742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib, contrib-bindings
Affects Versions: 3.2.2, 3.3.0
 Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
 (python 2.5.1)
Reporter: Josh Fraser
Assignee: Henry Robinson
 Attachments: commands.py, foo.p, ZOOKEEPER-742.patch, 
 ZOOKEEPER-742.patch


 On write operations, getting:
 Fatal Python error: deallocating None
 Aborted
 This error happens on write operations only.  Here's the backtrace:
 Fatal Python error: deallocating None
 Program received signal SIGABRT, Aborted.
 0x00383fc30215 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00383fc30215 in raise () from /lib64/libc.so.6
 #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
 #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
 #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
 /usr/lib64/libpython2.4.so.1.0
 #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
 /usr/lib64/libpython2.4.so.1.0
 #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
 #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
 /usr/lib64/libpython2.4.so.1.0
 #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
 #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
 #11 0x00400629 in _start ()

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (ZOOKEEPER-729) Recursively delete a znode - zkCli.sh rmr /node

2010-04-15 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-729:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this (had to move the test file into java/, but otherwise 
committed as submitted) - thanks Kay!

 Recursively delete a znode  - zkCli.sh rmr /node
 

 Key: ZOOKEEPER-729
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-729
 Project: Zookeeper
  Issue Type: New Feature
  Components: java client
Reporter: Kay Kay
Assignee: Kay Kay
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-729.patch, ZOOKEEPER-729.patch, 
 ZOOKEEPER-729.patch, ZOOKEEPER-729.patch, ZOOKEEPER-729.patch


 Recursively delete a given znode in zookeeper from the command line. 
 New operation rmr added to the zk client. 
 $ ./zkCli.sh rmr /node 
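
The rmr operation itself is implemented in the Java client; the Python sketch below only illustrates the underlying depth-first logic (delete all children, then the node), using a minimal in-memory stand-in whose class and method names are illustrative, not the real ZooKeeper API:

```python
# Sketch of the depth-first logic behind "rmr /node". FakeZk is a minimal
# in-memory stand-in for a ZooKeeper client; a real delete would fail with
# NotEmpty if children remained, which is why children go first.

class FakeZk:
    def __init__(self, paths):
        self.paths = set(paths)  # every znode path, e.g. {"/node", "/node/a"}

    def get_children(self, path):
        """Names of the direct children of path."""
        prefix = path.rstrip("/") + "/"
        return [p[len(prefix):] for p in self.paths
                if p.startswith(prefix) and "/" not in p[len(prefix):]]

    def delete(self, path):
        self.paths.remove(path)

def rmr(zk, path):
    """Recursively delete path: children first, then the node itself."""
    for child in zk.get_children(path):
        rmr(zk, path.rstrip("/") + "/" + child)
    zk.delete(path)
```

For example, starting from FakeZk({"/node", "/node/a", "/node/a/b", "/other"}), rmr(zk, "/node") removes the whole /node subtree and leaves only /other.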





[jira] Commented: (ZOOKEEPER-742) Deallocating None on writes

2010-04-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857628#action_12857628
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Thanks Josh - can you share the portion of your script that is causing the 
problem?



 Deallocating None on writes
 --

 Key: ZOOKEEPER-742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib, contrib-bindings
Affects Versions: 3.2.2, 3.3.0
 Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
 (python 2.5.1)
Reporter: Josh Fraser

 On write operations, getting:
 Fatal Python error: deallocating None
 Aborted
 This error happens on write operations only.  Here's the backtrace:
 Fatal Python error: deallocating None
 Program received signal SIGABRT, Aborted.
 0x00383fc30215 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00383fc30215 in raise () from /lib64/libc.so.6
 #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
 #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
 #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
 /usr/lib64/libpython2.4.so.1.0
 #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
 /usr/lib64/libpython2.4.so.1.0
 #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
 #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
 /usr/lib64/libpython2.4.so.1.0
 #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
 #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
 #11 0x00400629 in _start ()





[jira] Commented: (ZOOKEEPER-742) Deallocating None on writes

2010-04-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857638#action_12857638
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Thanks very much for this - any chance you can share Commands as well, so that 
I can see the actual zookeeper API calls that are being made? Let me know if 
you're not comfortable posting it publicly. 

 Deallocating None on writes
 --

 Key: ZOOKEEPER-742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib, contrib-bindings
Affects Versions: 3.2.2, 3.3.0
 Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
 (python 2.5.1)
Reporter: Josh Fraser
 Attachments: foo.p


 On write operations, getting:
 Fatal Python error: deallocating None
 Aborted
 This error happens on write operations only.  Here's the backtrace:
 Fatal Python error: deallocating None
 Program received signal SIGABRT, Aborted.
 0x00383fc30215 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00383fc30215 in raise () from /lib64/libc.so.6
 #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
 #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
 #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
 /usr/lib64/libpython2.4.so.1.0
 #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
 /usr/lib64/libpython2.4.so.1.0
 #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
 /usr/lib64/libpython2.4.so.1.0
 #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
 #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
 /usr/lib64/libpython2.4.so.1.0
 #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
 #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
 #11 0x00400629 in _start ()




