from:"Benjamin Reed"


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-930:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed revision 1035727.


 Hedwig c++ client uses a non thread safe logging library
 

 Key: ZOOKEEPER-930
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Affects Versions: 3.3.2
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Attachments: ZOOKEEPER-930.patch, ZOOKEEPER-930.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932583#action_12932583
 ] 

Benjamin Reed commented on ZOOKEEPER-930:
-

thanx ivan!

 Hedwig c++ client uses a non thread safe logging library
 

 Key: ZOOKEEPER-930
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Affects Versions: 3.3.2
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Attachments: ZOOKEEPER-930.patch, ZOOKEEPER-930.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932592#action_12932592
 ] 

Benjamin Reed commented on ZOOKEEPER-925:
-

+1 for confluence

it would be great to target 1) for when we move to tlp.

 Consider maven site generation to replace our forrest site and documentation 
 generation
 ---

 Key: ZOOKEEPER-925
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
 Project: Zookeeper
  Issue Type: Wish
  Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Attachments: ZOOKEEPER-925.patch


 See WHIRR-19 for some background.
 In whirr we looked at a number of site/doc generation facilities. In the end 
 Maven site generation plugin turned out to be by far the best option. You can 
 see our nascent site here (no attempt at styling,etc so far):
 http://incubator.apache.org/whirr/
 In particular take a look at the quick start:
 http://incubator.apache.org/whirr/quick-start-guide.html
 which was generated from
 http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
 notice this was standard wiki markup (confluence wiki markup, same as 
 available from apache)
 You can read more about mvn site plugin here:
 http://maven.apache.org/guides/mini/guide-site.html
 Notice that other formats are available, not just confluence markup, also 
 note that you can use different markup formats if you like in the same site 
 (although probably not a great idea, but in some cases might be handy, for 
 example whirr uses the confluence wiki, so we can pretty much copy/paste 
 source docs from wiki to our site (svn) if we like)
 Re maven vs our current ant based build. It's probably a good idea for us to 
 move the build to maven at some point. We could initially move just the doc 
 generation, and then incrementally move functionality from build.xml to mvn 
 over a longer time period.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

[
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932639#action_12932639
]

Benjamin Reed commented on ZOOKEEPER-922:
-

if we had a foolproof way to tell that a client is down, we could do this fast
expire. the methods you are proposing are not foolproof and will lead to
problems exactly when you most want them not to.

the timeout interactions you are talking about are problematic. it's really
hard to get them right.

one way that i can see this working is to not allow clients to reconnect to
other servers. in that can a socket reset would indicate an expired session. is
this acceptable to you?

enable faster timeout of sessions in case of unexpected socket disconnect
-

Attachments: ZOOKEEPER-922.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation

[
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932799#action_12932799
]

Benjamin Reed commented on ZOOKEEPER-925:
-

i cannot figure out how to convert forrest to anything. actually, i can't
figure out how we have forrest working at all! after burning the afternoon
trying to figure out how to convert forrest to confluence, i'm officially
declaring defeat. it should be an easy thing to do for an xml/xsl master, but
that is not me.

the most promising thing appears to be the doxia converter that will go from a
bunch of formats to a bunch more formats, including from docbook or xdoc to
confluence. unfortunately, forrest seems close to both of those, but not close
enough...

Consider maven site generation to replace our forrest site and documentation
generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch

See WHIRR-19 for some background.
In whirr we looked at a number of site/doc generation facilities. In the end
Maven site generation plugin turned out to be by far the best option. You can
see our nascent site here (no attempt at styling,etc so far):
http://incubator.apache.org/whirr/
In particular take a look at the quick start:
http://incubator.apache.org/whirr/quick-start-guide.html
which was generated from
http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
notice this was standard wiki markup (confluence wiki markup, same as
available from apache)
You can read more about mvn site plugin here:
http://maven.apache.org/guides/mini/guide-site.html
Notice that other formats are available, not just confluence markup, also
note that you can use different markup formats if you like in the same site
(although probably not a great idea, but in some cases might be handy, for
example whirr uses the confluence wiki, so we can pretty much copy/paste
source docs from wiki to our site (svn) if we like)
Re maven vs our current ant based build. It's probably a good idea for us to
move the build to maven at some point. We could initially move just the doc
generation, and then incrementally move functionality from build.xml to mvn
over a longer time period.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

[
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932809#action_12932809
]

Benjamin Reed commented on ZOOKEEPER-366:
-

i haven't had a chance to get back to this. we really need to convert all the
currentTimeMillis() to nanoTime(). we need to do a similar change in the C
client.

i don't think we can do a test for this.

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Fix For: 3.3.3, 3.4.0

Attachments: ZOOKEEPER-366.patch

the leader tracks session expirations by calculating when a session will
timeout and then periodically checking to see what needs to be timed out
based on the current time. this works great as long as the leaders clock
progresses at a steady pace. the problem comes when there are big (session
size) changes in clock, by ntp for example. if time gets adjusted forward,
all the sessions could timeout immediately. if time goes backward sessions
that should timeout may take a lot longer to actually expire.
this is really just a leader issue. the easiest way to deal with this is to
have the leader relinquish leadership if it detects a big jump forward in
time. when a new leader gets elected, it will recalculate timeouts of active
sessions.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library

2010-11-15 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932163#action_12932163
 ] 

Benjamin Reed commented on ZOOKEEPER-930:
-

looks good ivan. you should probably mention that you are moving to log4cxx for 
thread safety issues. the one minor thing: you messed up the indentation on a 
couple of lines. can you fix those?

 Hedwig c++ client uses a non thread safe logging library
 

 Key: ZOOKEEPER-930
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Affects Versions: 3.3.2
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Attachments: ZOOKEEPER-930.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-909:


Hadoop Flags: [Reviewed]

+1 looks good thomas! thanx!

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extract all NIO specific 
 code in the class ClientCnxnSocketNIO which extends ClientCnxnSocket.
 I've redone this patch from current trunk step by step now and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so this night.
 It would be nice, if we could apply this patch as soon as possible to trunk. 
 This allows us to continue to work on the netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should be only a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you should 
 encounter to thomas at koch point ro. Thx!
 Update: Until now, I've collected 8 successful builds in a row!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-922:


Status: Open  (was: Patch Available)

the problem with your corner case is that you can end up with a leader who 
thinks it is still the leader, but zookeeper thinks the leader is dead and 
allows another leader to take over.

there may be a way to do this reliably, but we need to vet the design first.

 enable faster timeout of sessions in case of unexpected socket disconnect
 -

 Key: ZOOKEEPER-922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-922.patch


 In the case when a client connection is closed due to socket error instead of 
 the client calling close explicitly, it would be nice to enable the session 
 associated with that client to time out faster than the negotiated session 
 timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
 discovery provider to remove ephemeral nodes for crashed clients quickly, 
 while allowing for a longer heartbeat-based timeout for java clients that 
 need to do long stop-the-world GC. 
 I propose doing this by setting the timeout associated with the crashed 
 session to minSessionTimeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: What happens to a follower if leader hangs?

2010-11-10 Thread Benjamin Reed

have you been able to make this happen? the behavior you are suggesting 
is exactly what should be happening. When we sync with the leader we set 
the socket timeout: sock.setSoTimeout(self.tickTime * self.syncLimit);


if the leader hangs, we should get a timeout and disconnect from the leader.

ben


On 11/10/2010 11:57 AM, Vishal Kher wrote:

Yes, thats what I was planning to do. At the follower, start FLE if the
follower does not receive a ping for  (syncLimit * tickTime).


On Wed, Nov 10, 2010 at 2:48 PM, Mahadev Konarmaha...@yahoo-inc.comwrote:


Hi Vishal,
  There are periodic pings sent from the leader to the followers.

Take a look at Leader.java:

syncedSet.add(self.getId());
synchronized (learners) {
for (LearnerHandler f : learners) {
if (f.synced()) {
syncedCount++;
syncedSet.add(f.getSid());
}
f.ping();
}
}


This code sends periodic pings to the followers to make sure they are
running fine. We should keep track of these pings and see if we havent seen
a ping packet from the leader for a long time and give up following the
leader in case we havent heard from him for a long time. This is definitely
worth fixing since we pride ourselves in being a highly available and
reliable service.

Please feel free to open a jira and work on it.
3.4 would be a good target for this.

Thanks
mahadev

On 11/10/10 12:26 PM, Vishal Khervishalm...@gmail.com  wrote:


Hi,

In Follower.followLeader() after syncing with the leader, the follower

does:

 while (self.isRunning()) {
 readPacket(qp);
 processPacket(qp);
 }

It looks like it relies on socket timeout expiry to figure out if the
connection with the leader has gone down.  So a follower *with no

cilents*

may never notice a faulty leader if a Leader has a software hang, but the
TCP connections with the peers are still valid. Since it has not cilents,

it

won't hearbeat with the Leader. If majority of followers are not

connected

to any clients, then even if other followers attempt to elect a new

leader

after detecting that the leader is unresponsive.

Please correct me if I am wrong. If I am not mistaken, should we add code

at

the follower to monitor the heartbeat messages that it receives from the
leader and take action if it misses heartbeats for time  (syncLimit *
tickTime)? This certainly is a hypothetical case, however, I think it is
worth a fix.

Thanks.
-Vishal

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930205#action_12930205
 ] 

Benjamin Reed commented on ZOOKEEPER-925:
-

i'm totally interested in moving to maven site! i really really want to get 
away from forrest and make it a bit easier to write doc. can we also get away 
from checking in generated doc?

 Consider maven site generation to replace our forrest site and documentation 
 generation
 ---

 Key: ZOOKEEPER-925
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
 Project: Zookeeper
  Issue Type: Wish
  Components: documentation
Reporter: Patrick Hunt

 See WHIRR-19 for some background.
 In whirr we looked at a number of site/doc generation facilities. In the end 
 Maven site generation plugin turned out to be by far the best option. You can 
 see our nascent site here (no attempt at styling,etc so far):
 http://incubator.apache.org/whirr/
 In particular take a look at the quick start:
 http://incubator.apache.org/whirr/quick-start-guide.html
 which was generated from
 http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
 notice this was standard wiki markup (confluence wiki markup, same as 
 available from apache)
 You can read more about mvn site plugin here:
 http://maven.apache.org/guides/mini/guide-site.html
 Notice that other formats are available, not just confluence markup, also 
 note that you can use different markup formats if you like in the same site 
 (although probably not a great idea, but in some cases might be handy, for 
 example whirr uses the confluence wiki, so we can pretty much copy/paste 
 source docs from wiki to our site (svn) if we like)
 Re maven vs our current ant based build. It's probably a good idea for us to 
 move the build to maven at some point. We could initially move just the doc 
 generation, and then incrementally move functionality from build.xml to mvn 
 over a longer time period.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation

[
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930221#action_12930221
]

Benjamin Reed commented on ZOOKEEPER-925:
-

just to be clear. we should check in the source for the docs. i'm just saying
that we check only check in the source for the docs, not the generated pdfs and
web pages.

Consider maven site generation to replace our forrest site and documentation
generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930517#action_12930517
 ] 

Benjamin Reed commented on ZOOKEEPER-925:
-

this is pretty cool! we can generate pdfs by using doxia converter to go from 
confluence to latex.

 Consider maven site generation to replace our forrest site and documentation 
 generation
 ---

 Key: ZOOKEEPER-925
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
 Project: Zookeeper
  Issue Type: Wish
  Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Attachments: ZOOKEEPER-925.patch


 See WHIRR-19 for some background.
 In whirr we looked at a number of site/doc generation facilities. In the end 
 Maven site generation plugin turned out to be by far the best option. You can 
 see our nascent site here (no attempt at styling,etc so far):
 http://incubator.apache.org/whirr/
 In particular take a look at the quick start:
 http://incubator.apache.org/whirr/quick-start-guide.html
 which was generated from
 http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
 notice this was standard wiki markup (confluence wiki markup, same as 
 available from apache)
 You can read more about mvn site plugin here:
 http://maven.apache.org/guides/mini/guide-site.html
 Notice that other formats are available, not just confluence markup, also 
 note that you can use different markup formats if you like in the same site 
 (although probably not a great idea, but in some cases might be handy, for 
 example whirr uses the confluence wiki, so we can pretty much copy/paste 
 source docs from wiki to our site (svn) if we like)
 Re maven vs our current ant based build. It's probably a good idea for us to 
 move the build to maven at some point. We could initially move just the doc 
 generation, and then incrementally move functionality from build.xml to mvn 
 over a longer time period.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930526#action_12930526
 ] 

Benjamin Reed commented on ZOOKEEPER-925:
-

since maven generates the doc without requiring preinstalled tools. i don't 
think it is onerous at all to just check in the sources and require users to 
compile the doc if they are using trunk.

 Consider maven site generation to replace our forrest site and documentation 
 generation
 ---

 Key: ZOOKEEPER-925
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
 Project: Zookeeper
  Issue Type: Wish
  Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Attachments: ZOOKEEPER-925.patch


 See WHIRR-19 for some background.
 In whirr we looked at a number of site/doc generation facilities. In the end 
 Maven site generation plugin turned out to be by far the best option. You can 
 see our nascent site here (no attempt at styling,etc so far):
 http://incubator.apache.org/whirr/
 In particular take a look at the quick start:
 http://incubator.apache.org/whirr/quick-start-guide.html
 which was generated from
 http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
 notice this was standard wiki markup (confluence wiki markup, same as 
 available from apache)
 You can read more about mvn site plugin here:
 http://maven.apache.org/guides/mini/guide-site.html
 Notice that other formats are available, not just confluence markup, also 
 note that you can use different markup formats if you like in the same site 
 (although probably not a great idea, but in some cases might be handy, for 
 example whirr uses the confluence wiki, so we can pretty much copy/paste 
 source docs from wiki to our site (svn) if we like)
 Re maven vs our current ant based build. It's probably a good idea for us to 
 move the build to maven at some point. We could initially move just the doc 
 generation, and then incrementally move functionality from build.xml to mvn 
 over a longer time period.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-08 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929683#action_12929683
 ] 

Benjamin Reed commented on ZOOKEEPER-922:
-

how do you deal with the following race condition:

1) the client is connected to follower1
2) the client has problems talking to follower1, so it closes the connection
3) the client connects to follower2
4) follower1 detects the closed connection and sets the connection timeout to 
min
5) the client is idle for min timeout and the leader expires the connection

the race condition is steps 3) and 4). if follower1 doesn't detect the dead 
connection fast enough, it can improperly set the timeout.

 enable faster timeout of sessions in case of unexpected socket disconnect
 -

 Key: ZOOKEEPER-922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-922.patch


 In the case when a client connection is closed due to socket error instead of 
 the client calling close explicitly, it would be nice to enable the session 
 associated with that client to time out faster than the negotiated session 
 timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
 discovery provider to remove ephemeral nodes for crashed clients quickly, 
 while allowing for a longer heartbeat-based timeout for java clients that 
 need to do long stop-the-world GC. 
 I propose doing this by setting the timeout associated with the crashed 
 session to minSessionTimeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-862) Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size. Make these a server config parameter instead.


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-862:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 looks good thanx Erwin

it looks like this was accidentally committed in r1031051

 Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size.  
 Make these a server config parameter instead.
 -

 Key: ZOOKEEPER-862
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-862
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-hedwig
Reporter: Erwin Tam
Assignee: Erwin Tam
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-862.patch


 Hedwig code right now when using Bookkeeper as the persistence store is 
 hardcoding the number of bookie servers in the ensemble and quorum size.  
 This is used the first time a ledger is created.  This should be exposed as a 
 server configuration parameter instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-916:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 1031453.


 Problem receiving messages from subscribed channels in c++ client 
 --

 Key: ZOOKEEPER-916
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Attachments: ZOOKEEPER-916.patch


 We see this bug with receiving messages from a subscribed channel.  This 
 problem seems to happen with larger messages.  The flow is to first read at 
 least 4 bytes from the socket channel. Extract the first 4 bytes to get the 
 message size.  If we've read enough data into the buffer already, we're done 
 so invoke the messageReadCallbackHandler passing the channel and message 
 size.  If not, then do an async read for at least the remaining amount of 
 bytes in the message from the socket channel.  When done, invoke the 
 messageReadCallbackHandler.
 The problem seems that when the second async read is done, the same 
 sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. 
  The result is that we then try to read the first 4 bytes again from the 
 buffer.  This will get a random message size and screw things up.  I'm not 
 sure if it's an incorrect use of the boost asio async_read function or we're 
 doing the boost bind to the callback function incorrectly.
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
 size: 512 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, 
 currently in buffer 508 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from 
 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
 size: 599 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, 
 currently in buffer 595 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 
 134286765 from channel(0x80b7a18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-916:


Hadoop Flags: [Reviewed]

+1 thanx for the fix ivan!

 Problem receiving messages from subscribed channels in c++ client 
 --

 Key: ZOOKEEPER-916
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Attachments: ZOOKEEPER-916.patch


 We see this bug with receiving messages from a subscribed channel.  This 
 problem seems to happen with larger messages.  The flow is to first read at 
 least 4 bytes from the socket channel. Extract the first 4 bytes to get the 
 message size.  If we've read enough data into the buffer already, we're done 
 so invoke the messageReadCallbackHandler passing the channel and message 
 size.  If not, then do an async read for at least the remaining amount of 
 bytes in the message from the socket channel.  When done, invoke the 
 messageReadCallbackHandler.
 The problem seems that when the second async read is done, the same 
 sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. 
  The result is that we then try to read the first 4 bytes again from the 
 buffer.  This will get a random message size and screw things up.  I'm not 
 sure if it's an incorrect use of the boost asio async_read function or we're 
 doing the boost bind to the callback function incorrectly.
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
 size: 512 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, 
 currently in buffer 508 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from 
 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
 size: 599 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, 
 currently in buffer 595 channel(0x80b7a18)
 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
 DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 
 134286765 from channel(0x80b7a18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-909:


Status: Open  (was: Patch Available)

once a couple of small changes are made to this patch, we should be good to go.

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extract all NIO specific 
 code in the class ClientCnxnSocketNIO which extends ClientCnxnSocket.
 I've redone this patch from current trunk step by step now and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so this night.
 It would be nice, if we could apply this patch as soon as possible to trunk. 
 This allows us to continue to work on the netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should be only a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you should 
 encounter to thomas at koch point ro. Thx!
 Update: Until now, I've collected 8 successful builds in a row!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-11-04 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-907.
-

Resolution: Fixed

Committed revision 1031051.
Committed revision 1031064.


 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests

2010-11-04 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-884:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 thanx flavio

Committed revision 1031433.


 Remove LedgerSequence references from BookKeeper documentation and comments 
 in tests 
 -

 Key: ZOOKEEPER-884
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-884.patch


 We no longer use LedgerSequence, so we need to remove references in 
 documentation and comments sprinkled throughout the code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-11-03 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-907:


Hadoop Flags: [Reviewed]

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-11-01 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927045#action_12927045
 ] 

Benjamin Reed commented on ZOOKEEPER-909:
-

the patch looks good. are you proposing that we commit it? or are you still 
working on it? i don't mind pushing off the javadoc for a bit if you think 
things might change. (although it would be nice to get that class more firmed 
up before we commit really...) we should get the property doc in before we 
commit since that will not change.

One other nit, if you are willing: calling the ClientCxnSocket socket and 
using getSocket is a bit confusing since ClientCnxnSocket does not extend 
socket. It's a bit more verbose, but more clear if you call the member and 
method clientCxnSocket and getClientCnxnSocket.

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extract all NIO specific 
 code in the class ClientCnxnSocketNIO which extends ClientCnxnSocket.
 I've redone this patch from current trunk step by step now and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so this night.
 It would be nice, if we could apply this patch as soon as possible to trunk. 
 This allows us to continue to work on the netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should be only a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you should 
 encounter to thomas at koch point ro. Thx!
 Update: Until now, I've collected 8 successful builds in a row!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-29 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926404#action_12926404
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

may i propose accepting this patch without a test case? (we can see that it 
fixes the problem.) that way we can get 3.3.2 out. once ZOOKEEPER-915 goes it 
the tests should cover this issue.

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-28 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925976#action_12925976
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

Ah, I see the problem. There are actually two problems: 1) when sync() get's an 
error it is not propagated back to the caller. 2) this problem.

They problem is that 1) is preventing us from writing a test case. We need to 
fix 1) and then we can write the test for 2).

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (ZOOKEEPER-915) Errors that happen during sync() processing at the leader do not get propagated back to the client.

2010-10-28 Thread Benjamin Reed (JIRA)

Errors that happen during sync() processing at the leader do not get propagated 
back to the client.
---

 Key: ZOOKEEPER-915
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-915
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed


If an error in sync() processing happens at the leader (SESSION_MOVED for 
example), they are not propagated back to the client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-27 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925540#action_12925540
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

ah got it. ok i was able to reproduce it: the client connects to the follower, 
issues a sync, the error message shows up in the log of the leader. so there is 
an additional bug here -- why is the client not getting the session moved error.

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [VOTE] ZooKeeper as TLP?

2010-10-25 Thread Benjamin Reed


+1

On 10/22/2010 02:42 PM, Patrick Hunt wrote:

Please vote as to whether you think ZooKeeper should become a
top-level Apache project, as discussed previously on this list. I've
included below a draft board resolution.

Do folks support sending this request on to the Hadoop PMC?

Patrick



 X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best
interests of the Foundation and consistent with the
Foundation's purpose to establish a Project Management
Committee charged with the creation and maintenance of
open-source software related to distributed system coordination
for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management
Committee (PMC), to be known as the Apache ZooKeeper Project,
be and hereby is established pursuant to Bylaws of the
Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is
responsible for the creation and maintenance of software
related to distributed system coordination; and be it further

RESOLVED, that the office of Vice President, Apache ZooKeeper be
and hereby is created, the person holding such office to
serve at the direction of the Board of Directors as the chair
of the Apache ZooKeeper Project, and to have primary responsibility
for management of the projects within the scope of
responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and
hereby are appointed to serve as the initial members of the
Apache ZooKeeper Project:

  * Patrick Huntph...@apache.org
  * Flavio Junqueiraf...@apache.org
  * Mahadev Konarmaha...@apache.org
  * Benjamin Reedbr...@apache.org
  * Henry Robinsonhe...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt
be appointed to the office of Vice President, Apache ZooKeeper, to
serve in accordance with and subject to the direction of the
Board of Directors and the Bylaws of the Foundation until
death, resignation, retirement, removal or disqualification,
or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
tasked with the creation of a set of bylaws intended to
encourage open development and increased participation in the
Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby
is tasked with the migration and rationalization of the Apache
Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache
Hadoop ZooKeeper sub-project encumbered upon the
Apache Hadoop Project are hereafter discharged.

[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-22 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923895#action_12923895
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

sync doesn't cause any additional traffic over the atomic broadcast. it just 
makes sure that the all of the in-process transactions have be sent to the 
follower. when that error happens, the error will be sent back to the follower 
ordered after all of the completed transactions. so rather than being able to 
see the result of all requests initiated before the sync, the follower will see 
all requests completed before the sync. that is why i referred to it as a 
partial sync.

i'm really having problems trying to reproduce this error. can you describe 
more how it happened? i would like to have an end-to-end test rather than the 
test of a particular implementation so that this error doesn't pop up if the 
implementation changes. looking at the code it seems like it should happen 
everytime the sync request is sent to a follower, but that doesn't seem to be 
the case.

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn

2010-10-22 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923905#action_12923905
 ] 

Benjamin Reed commented on ZOOKEEPER-909:
-

this is looking really nice. i'm not done reviewing, but i did want to note 
that you need to add the zookeeper.clientCxnSocket property to the doc. You 
should also javadoc that variable. 

 Extract NIO specific code from ClientCnxn
 -

 Key: ZOOKEEPER-909
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: Thomas Koch
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, 
 ZOOKEEPER-909.patch


 This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus 
 everything Netty related. This means this patch only extract all NIO specific 
 code in the class ClientCnxnSocketNIO which extends ClientCnxnSocket.
 I've redone this patch from current trunk step by step now and couldn't find 
 any logical error. I've already done a couple of successful test runs and 
 will continue to do so this night.
 It would be nice, if we could apply this patch as soon as possible to trunk. 
 This allows us to continue to work on the netty integration without blocking 
 the ClientCnxn class. Adding Netty after this patch should be only a matter 
 of adding the ClientCnxnSocketNetty class with the appropriate test cases.
 You could help me by reviewing the patch and by running it on whatever test 
 server you have available. Please send me any complete failure log you should 
 encounter to thomas at koch point ro. Thx!
 Update: Until now, I've collected 8 successful builds in a row!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: Restarting discussion on ZooKeeper as a TLP

2010-10-21 Thread Benjamin Reed

 i think we want to be responsible for the creation and maintenance of
software related to distributed system coordination.

ben

On 10/21/2010 01:43 PM, Mahadev Konar wrote:
NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie
be appointed to the office of Vice President, Apache ZooKeeper, to

 I think you meant Patrick Hunt ?  :)

 Other than that it looks good.

 Thanks
 mahadev

 On 10/21/10 1:28 PM, Patrick Hunt ph...@apache.org wrote:

 Ack, I missed Henry in the list, sorry! In my defense I copied this:
 http://hadoop.apache.org/zookeeper/credits.html

 one more try (same as before except for adding henry to the pmc):
 

 X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best
interests of the Foundation and consistent with the
Foundation's purpose to establish a Project Management
Committee charged with the creation and maintenance of
open-source software related to data serialization
for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management
Committee (PMC), to be known as the Apache ZooKeeper Project,
be and hereby is established pursuant to Bylaws of the
Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is
responsible for the creation and maintenance of software
related to data serialization; and be it further

RESOLVED, that the office of Vice President, Apache ZooKeeper be
and hereby is created, the person holding such office to
serve at the direction of the Board of Directors as the chair
of the Apache ZooKeeper Project, and to have primary responsibility
for management of the projects within the scope of
responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and
hereby are appointed to serve as the initial members of the
Apache ZooKeeper Project:

  * Patrick Hunt ph...@apache.org
  * Flavio Junqueira f...@apache.org
  * Mahadev Konarmaha...@apache.org
  * Benjamin Reedbr...@apache.org
  * Henry Robinson   he...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie
be appointed to the office of Vice President, Apache ZooKeeper, to
serve in accordance with and subject to the direction of the
Board of Directors and the Bylaws of the Foundation until
death, resignation, retirement, removal or disqualification,
or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is
tasked with the creation of a set of bylaws intended to
encourage open development and increased participation in the
Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby
is tasked with the migration and rationalization of the Apache
Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache
Hadoop ZooKeeper sub-project encumbered upon the
Apache Hadoop Project are hereafter discharged.

 On Thu, Oct 21, 2010 at 10:44 AM, Henry Robinson he...@cloudera.com wrote:

 Looks good, please do call a vote.

 On 21 October 2010 09:29, Patrick Hunt ph...@apache.org wrote:

 Here's a draft board resolution (not a vote, just discussion). It lists
 all
 current committers (except as noted in the next paragraph) as the initial
 members of the project management committee (PMC) and myself as the
 initial
 chair.

 Notice that I have left Andrew off the PMC as he has not been active with
 the project for over two years. I believe we should continue to include
 him
 on the committer roles subsequent to moving to tlp, however as he has not
 been an active member of the community for such a long period we would
 not
 include him on the PMC at this time. If others feel differently let me
 know,
 I'm willing to include him if the people feel differently.

 LMK if this looks good to you and I'll call for an official vote on this
 list (then we'll be ready to call a vote on the hadoop pmc).

 Regards,

 Patrick

 

 X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best
interests of the Foundation and consistent with the
Foundation's purpose to establish a Project Management
Committee charged with the creation and maintenance of
open-source software related to data serialization
for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management
Committee (PMC), to be known as the Apache ZooKeeper Project

Re: What's the magic behind lenBuffer and incomingBuffer?

2010-10-21 Thread Benjamin Reed

 look in readLength(). incomingBuffer is set to a newly allocated 
ByteBuffer.


ben

On 10/21/2010 07:52 AM, Thomas Koch wrote:

Hi,

inside ClientCnxn.SendThread we have

 final ByteBuffer lenBuffer = ByteBuffer.allocateDirect(4);
 ByteBuffer incomingBuffer = lenBuffer;

So incomingBuffer and lenBuffer do refer to the same object. There are several
other places where lenBuffer is again assigned to incomingBuffer.

Now inside the doIO() method we got

 if (incomingBuffer == lenBuffer) {
 recvCount++;
 readLength();
 } else if (!initialized) {

incomingBuffer is never assigned anything else then lenBuffer, lenBuffer stays
the same all the time. So as far as my knowledge of java reaches (which may
not be too far) incomingBuffer == lenBuffer _always_ evaluates to true. Isn't
that true?

So effectively we've got dead code in the elseif and else branches, didn't we?

Best regards,

Thomas Koch, http://www.koch.ro

[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages

2010-10-20 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923200#action_12923200
 ] 

Benjamin Reed commented on ZOOKEEPER-907:
-

yes, this will fail the sync. it will not get passed through the pipeline. it 
will give you a partial sync though :)

 Spurious KeeperErrorCode = Session moved messages
 ---

 Key: ZOOKEEPER-907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-907.patch


 The sync request does not set the session owner in Request.
 As a result, the leader keeps printing:
 2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405] - 
 Got user-level KeeperException when processing sessionid:0x298d3b1fa9 
 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error 
 Path:null Error:KeeperErrorCode = Session moved

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-835) Refactoring Zookeeper Client Code

2010-10-19 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922813#action_12922813
 ] 

Benjamin Reed commented on ZOOKEEPER-835:
-

how do you see any of these things as related to ZOOKEEPER-22?

 Refactoring Zookeeper Client Code
 -

 Key: ZOOKEEPER-835
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-835
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.3.1
Reporter: Patrick Datko
Assignee: Thomas Koch

 Thomas Koch asked me to fill individual issues for the points raised in his 
 mail to zookeeper-dev:
 [Mail of Thomas Koch| 
 http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3c20100845.17507.tho...@koch.ro%3e
  ]
 He published several issues, which are present in the current zookeeper 
 client, so a refactoring of the code would be an facility for other 
 developers working with zookeeper.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-15 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921412#action_12921412
]

Benjamin Reed commented on ZOOKEEPER-885:
-

we are having problems reproducing this. can you give a bit more details on the
machines you are using? what are the cpu and memory size? also, what is the
throughput of dd if=/dev/zero of=/dev/mapper/nimbula-test? is there just one
disk, where nimbula-test is a partition on that disk and you have another
partition for the snapshots and logs?

even if you don't have swap space, code pages can be discarded and loaded on
demand, so that could be a potential problem. what does /proc/meminfo look like?

Zookeeper drops connections under moderate IO load
--

Key: ZOOKEEPER-885
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.2.2, 3.3.1
Environment: Debian (Lenny)
1Gb RAM
swap disabled
100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz,
WatcherTest.java, zklogs.tar.gz

A zookeeper server under minimum load, with a number of clients watching
exactly one node will fail to maintain the connection when the machine is
subjected to moderate IO load.
In a specific test example we had three zookeeper servers running on
dedicated machines with 45 clients connected, watching exactly one node. The
clients would disconnect after moderate load was added to each of the
zookeeper servers with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}
The {{dd}} command transferred data at a rate of about 4Mb/s.
The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
It seems strange that such a moderate load should cause instability in the
connection.
Very few other processes were running, the machines were setup to test the
connection instability we have experienced. Clients performed no other read
or mutation operations.
Although the documents state that minimal competing IO load should present on
the zookeeper server, it seems reasonable that moderate IO should not cause
problems in this case.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: What's the QA strategy of ZooKeeper?

2010-10-15 Thread Benjamin Reed


 i think we have a very different perspective on the quality issue:



I didn't want to say it that clear, but especially the new Netty code, both on 
client and server side is IMHO an example of new code in very bad shape. The
client code patch even changes the FindBugs configuration to exclude the new
code from the FindBugs checks.

great. fixing the code and refactoring before a patch goes in is the 
perfect time to do it! please give feedback and help make the patch 
better. there is a reason to exclude checks (which is why there is such 
excludes), but if we can avoid them we should. before a patch is applied 
is exactly the time to do cleanup

If your code is already in such a bad shape, that every change includes
considerable risk to break something, then you already are in trouble. With
every new feature (or bugfix!) you also risk to break something.
If you don't have the attitude of permanent refactoring to improve the code
quality, you will inevitably lower the maintainability of your code with every
new feature. New features will build on the dirty concepts already in the code
and therfor make it more expensive to ever clean things up.

cleaning up code to add a new feature is a great time to clean up the code.

Yes. Refactoring isn't easy, but necessary. Only over time you better
understand your domain and find better structures. Over time you introduce
features that let code grow so that it should better be split up in smaller
units that the human brain can still handle.

it is the but necessary that i disagree with. there is plenty of code 
that could be cleaned up and made to look a lot nicer, but we shouldn't 
touch it, unless we are fixing something else or adding a new feature. 
it's pretty lame to explain to someone that the bug that was introduced 
by a code change was motivated by a desire to make the code cleaner. any 
code change runs the risk of breakage, thus changing code simply for 
cleanliness is not worth the risk.


ben

Re: What's the QA strategy of ZooKeeper?

2010-10-15 Thread Benjamin Reed

 actually, the other way of doing the netty patch (since i'm scared of 
merges) would be to do a refactor cleanup patch with an eye toward 
netty, and then another patch to actually add netty. that would have 
been nice because the first patch would allow us to more easily make 
sure that NIO wasn't broken. and the second we could focus more on the 
netty addition.


ben

On 10/15/2010 03:07 PM, Patrick Hunt wrote:

On Fri, Oct 15, 2010 at 12:11 PM, Henry Robinsonhe...@cloudera.com  wrote:


The netty patch is a good test case for this approach. If we feel that
reworking the structure of the existing server cnxn code will make it
significantly easier to add a second implementation that adheres to the
same
interface, then I say that such a refactoring is worthwhile, but even then
only if it's straightforward to make the changes while convincing ourselves
that the behaviour of the new implementation is consistent with the old.

Thomas, do comment on the patch itself! That's the very best way to make
sure your concerns get heard and addressed.


Well really the _best_ way IMO is to both comment and submit a patch. ;-)

And this is just what Thomas is doing, so kudos to him for the effort!
Vishal is doing this as well for many of the issues he's found, so thanks to
him also. We do appreciate you guys jumping in to help. Lack of contributors
is one of the things we've been missing and addressing that opens the door
to some of these improvements being suggested.

Wrt the netty patch, the approach Ben and I took was to refactor
sufficiently to add support for NIO/Netty/... while minimizing breakage.
This is already a big patch, esp given that the code is not really as clean
to begin with (complex too). Perfect situation, no. But the intent was to
further clean things up once the original patch was reviewed/committed.
Trying to do a huge refactoring in one shot (one patch) is not a good idea
imo. Already these patches are too large. Perhaps lesson learned here is
that we should have just created a special branch from the get go, applied a
number of smaller patches to that branch, then eventually merged back into
the trunk once it was fully baked.


Patrick

Re: Restarting discussion on ZooKeeper as a TLP

2010-10-14 Thread Benjamin Reed


 +1

ben

On 10/14/2010 11:47 AM, Henry Robinson wrote:

+1,

I agree that we've addressed most outstanding concerns, we're ready for
TLP.

Henry

On 14 October 2010 13:29, Mahadev Konarmaha...@yahoo-inc.com  wrote:


+1 for moving to TLP.

Thanks for starting the vote Pat.

mahadev


On 10/13/10 2:10 PM, Patrick Huntph...@apache.org  wrote:


In March of this year we discussed a request from the Apache Board, and
Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:

Original discussion
http://markmail.org/thread/42cobkpzlgotcbin

I originally voted against this move, my primary concern being that we

were

not ready to move to tlp status given our small contributor base and
limited contributor diversity. However I'd now like to revisit that
discussion/decision. Since that time the team has been working hard to
attract new contributors, and we've seen significant new contributions

come

in. There has also been feedback from board/pmc addressing many of these
concerns (both on the list and in private). I am now less concerned about
this issue and don't see it as a blocker for us to move to TLP status.

A second concern was that by becoming a TLP the project would lose it's
connection with Hadoop, a big source of new users for us. I've been

assured

(and you can see with the other projects that have moved to tlp status;
pig/hive/hbase/etc...) that this connection will be maintained. The

Hadoop

ZooKeeper tab for example will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially
operating as a TLP within the Hadoop PMC. Most of the other PMC members

have

little or no experience with ZooKeeper and this makes it difficult for

them

to monitor and advise us. By moving to TLP status we'll be able to govern
ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with
your thoughts and any issues. I will call a vote in a few days, once
discussion settles.

Regards,

Patrick

Re: What's the QA strategy of ZooKeeper?

2010-10-14 Thread Benjamin Reed

 code quality is important, and there are things we should keep in 
mind, but in general i really don't like the idea of risking code 
breakage because of a gratuitous code cleanup. we should be watching out 
for these things when patches get submitted or when new things go in.


i think this is inline with what pat was saying. just to expand a bit. 
in my opinion clean up refactorings have the following problems:


1) you risk breaking things in production for a potential future 
maintenance advantage.
2) there is always subjectivity: quality code for one code quality 
zealot is often seen as a bad design by another code quality zealot. 
unless there is an objective reason to do it, don't.
3) you may cleanup the wrong way. you may restructure to make the 
current code clean and then end up rewriting and refactoring again to 
change the logic.


i think we can mitigate 1) by only doing it when necessary. as a 
corollary we can mitigate 2) and 3) by only doing refactoring/cleanups 
when motivated by some new change: fix a bug, increased performance, new 
feature, etc.


ben

On 10/13/2010 06:18 AM, Thomas Koch wrote:

Hi,

after filling 13 refactoring issues against the Java Client code[1], I started
to dig into the server site code to understand the last issues with the Netty
stuff.
I feel bad. It's this feeling of I don't wanna hurt you, but ZooKeeper
is quite an important piece of the Hadoop ecosystem containing some of the
most complicated pieces of code. And it'll only get more complex with more
features.

I'd propose to have a word about quality assurance. Is there already a
strategy to ensure the ongoing maintainability of ZK? Is there a code style
guide, a list of Dos-And-Donts (where I'd like to add some points)?

Should PMD be added to Hudson? What is the level of FindBugs? Should it be
raised?

Some of the points I'd like to add to a style guide:

- Don't write methods longer then 20-40 lines of code

- Are you sure you want to use inner classes?

- If there is a new operator in a method? Could the method maybe already
receive the object as a parameter?

- Are you sure you want to use system properties? They are like global
variables and the IDE does not know about them

- Are you sure you want to extend a class? Often an aggregation is more
elegant.

- Don't nest ifs and loops deeper then 2 or 3 levels. If you do so, you should
better break your code into more methods.

- Use Enums or constants instead of plain status integers

- please document your intentions in code comments. You don't need to comment
the what? but the why?.

Do you agree with me, that there is a need for better code quality in
ZooKeeper? If so, it's not really scalable if a manic like me fights like Don
Quichotte to clean up the code. All developers would need to establish a sense
for clean code and constantly improve the code.

[1] https://issues.apache.org/jira/browse/ZOOKEEPER-835

Best regards,

Thomas Koch, http://www.koch.ro

[jira] Updated: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice

2010-10-14 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-881:


Hadoop Flags: [Reviewed]

+1 nice catch!

 ZooKeeperServer.loadData loads database twice
 -

 Key: ZOOKEEPER-881
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Trivial
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-881.patch


 zkDb.loadDataBase() is called twice at the beginning of loadData().  It 
 shouldn't have any negative affects, but is unnecessary.   A patch should be 
 trivial.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice

2010-10-14 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921233#action_12921233
 ] 

Benjamin Reed commented on ZOOKEEPER-881:
-

Committed revision 1022824.


 ZooKeeperServer.loadData loads database twice
 -

 Key: ZOOKEEPER-881
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Trivial
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-881.patch


 zkDb.loadDataBase() is called twice at the beginning of loadData().  It 
 shouldn't have any negative affects, but is unnecessary.   A patch should be 
 trivial.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-864) Hedwig C++ client improvements

2010-10-11 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-864:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

thanx ivan!
Committed revision 1021463.


 Hedwig C++ client improvements
 --

 Key: ZOOKEEPER-864
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 3.4.0

 Attachments: warnings.txt, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, 
 ZOOKEEPER-864.diff, ZOOKEEPER-864.diff


 I changed the socket code to use boost asio. Now the client only creates one 
 thread, and all operations are non-blocking. 
 Tests are now automated, just run make check.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-886) Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected

2010-10-11 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-886:


Hadoop Flags: [Reviewed]

+1 good catch erwin!

 Hedwig Server stays in disconnected state when connection to ZK dies but 
 gets reconnected
 ---

 Key: ZOOKEEPER-886
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-886
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Reporter: Erwin Tam
Assignee: Erwin Tam
 Attachments: ZOOKEEPER-886.patch


 The Hedwig Server is connected to ZooKeeper.  In the ZkTopicManager, it 
 registers a watcher so that if it ever gets disconnected from ZK, it will 
 temporarily fail all incoming requests since the Hedwig server does not know 
 for sure if it is still the master for the topics.  When the ZK client gets 
 reconnected, the logic currently is wrong and it does not unset the suspended 
 flag.  Thus once it gets disconnected, it will stay in the suspended state 
 forever, thereby making the Hedwig server hub dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-886) Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected

2010-10-11 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-886:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 1021501.


 Hedwig Server stays in disconnected state when connection to ZK dies but 
 gets reconnected
 ---

 Key: ZOOKEEPER-886
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-886
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-hedwig
Reporter: Erwin Tam
Assignee: Erwin Tam
 Attachments: ZOOKEEPER-886.patch


 The Hedwig Server is connected to ZooKeeper.  In the ZkTopicManager, it 
 registers a watcher so that if it ever gets disconnected from ZK, it will 
 temporarily fail all incoming requests since the Hedwig server does not know 
 for sure if it is still the master for the topics.  When the ZK client gets 
 reconnected, the logic currently is wrong and it does not unset the suspended 
 flag.  Thus once it gets disconnected, it will stay in the suspended state 
 forever, thereby making the Hedwig server hub dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-10-06 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-822:


Hadoop Flags: [Reviewed]

+1 looks good. ready to commit.

 Leader election taking a long time  to complete
 ---

 Key: ZOOKEEPER-822
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, 
 test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, 
 ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, 
 ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, 
 ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, 
 ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1


 Created a 3 node cluster.
 1 Fail the ZK leader
 2. Let leader election finish. Restart the leader and let it join the 
 3. Repeat 
 After a few rounds leader election takes anywhere 25- 60 seconds to finish. 
 Note- we didn't have any ZK clients and no new znodes were created.
 zoo.cfg is shown below:
 #Mon Jul 19 12:15:10 UTC 2010
 server.1=192.168.4.12\:2888\:3888
 server.0=192.168.4.11\:2888\:3888
 clientPort=2181
 dataDir=/var/zookeeper
 syncLimit=2
 server.2=192.168.4.13\:2888\:3888
 initLimit=5
 tickTime=2000
 I have attached logs from two nodes that took a long time to form the cluster 
 after failing the leader. The leader was down anyways so logs from that node 
 shouldn't matter.
 Look for START HERE. Logs after that point should be of our interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-09-28 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915796#action_12915796
]

Benjamin Reed commented on ZOOKEEPER-822:
-

looks good overall flavio. just a quick questions: i notice that operations on
senderWorkerMap in initiateConnection are not synchronized. senderWorkerMap is
concurrent, but there could be a race between the get, put, and vsw.finish if
initiateConnection is called concurrently for the same sid. right?

also you need to add a blurb to the config doc for the timeout system variable,
which should be zookeeper.cnxtimeout so that it can be set from the
configuration file.

Leader election taking a long time to complete
---

Key: ZOOKEEPER-822
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
Project: Zookeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
Fix For: 3.3.2, 3.4.0

Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log,
test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz,
ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch,
ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch,
ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch,
ZOOKEEPER-822.patch_v1

Created a 3 node cluster.
1 Fail the ZK leader
2. Let leader election finish. Restart the leader and let it join the
3. Repeat
After a few rounds leader election takes anywhere 25- 60 seconds to finish.
Note- we didn't have any ZK clients and no new znodes were created.
zoo.cfg is shown below:
#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000
I have attached logs from two nodes that took a long time to form the cluster
after failing the leader. The leader was down anyways so logs from that node
shouldn't matter.
Look for START HERE. Logs after that point should be of our interest.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure

2010-09-28 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915799#action_12915799
 ] 

Benjamin Reed commented on ZOOKEEPER-820:
-

+1 this looks good to me. did you try it on cygwin?

 update c unit tests to ensure zombie java server processes don't cause 
 failure
 

 Key: ZOOKEEPER-820
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-820-1.patch, ZOOKEEPER-820.patch


 When the c unit tests are run sometimes the server doesn't shutdown at the 
 end of the test, this causes subsequent tests (hudson esp) to fail.
 1) we should try harder to make the server shut down at the end of the test, 
 I suspect this is related to test failing/cleanup
 2) before the tests are run we should see if the old server is still running 
 and try to shut it down

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-09-28 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915849#action_12915849
 ] 

Benjamin Reed commented on ZOOKEEPER-880:
-

is there an easy way to reproduce this?

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
 TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz


 We're seeing an issue where one server in the ensemble has a steady growing 
 number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
 out of native threads, and at the same time we see a lot of exceptions in the 
 logs.  This is on 3.2.2 and our config looks like:
 {noformat}
 tickTime=3000
 dataDir=/somewhere_thats_not_tmp
 clientPort=2181
 initLimit=10
 syncLimit=5
 server.0=sv4borg9:2888:3888
 server.1=sv4borg10:2888:3888
 server.2=sv4borg11:2888:3888
 server.3=sv4borg12:2888:3888
 server.4=sv4borg13:2888:3888
 {noformat}
 The issue is on the first server. I'm going to attach threads dumps and logs 
 in moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid

2010-09-17 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910643#action_12910643
]

Benjamin Reed commented on ZOOKEEPER-869:
-

this is a good observation diogo, but i think you may be characterizing it
improperly. the problem is that when we do a leadership we increment the epoch
and propose a new leader, so all other processes will be much lower than the
leader. when a follower connects we figure out how far behind the follower is
by comparing the lastProposed zxids and taking the difference. we should really
be using the recent history to do the comparison.

as a side note, if we were to chose not to take the maximum zxid during
recovery, we need to make sure that we still cover all committed messages.

Support for election of leader with arbitrary zxid
--

Key: ZOOKEEPER-869
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
Project: Zookeeper
Issue Type: New Feature
Reporter: Diogo
Priority: Minor

Currently, the leader election algorithm implemented guarantees that the
leader has the maximum zxid of the ensemble. The state synchronization after
the election was built based on this assumption. However, other leader
elections algorithms might elect leaders with arbitrary zxid.
To support other leader election algorithms, the state synchronization should
allow the leader to have an arbitrary zxid.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads

2010-09-17 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-831:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 998200.

thanx for the fix flavio and ivan for the reviews!

 BookKeeper: Throttling improved for reads
 -

 Key: ZOOKEEPER-831
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, 
 ZOOKEEPER-831.patch, ZOOKEEPER-831.patch


 Reads and writes in BookKeeper are asymmetric: a write request writes one 
 entry, whereas a read request may read multiple requests. The current 
 implementation of throttling only counts the number of read requests instead 
 of counting the number of entries being read. Consequently, a few read 
 requests reading a large number of entries each will spawn a large number of 
 read-entry requests. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-09-15 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-846:


Hadoop Flags: [Reviewed]

+1 looks good pat! it's nice that the checking and setting of closing is in the 
same routine. i agreed about skipping the test case.

 zookeeper client doesn't shut down cleanly on the close call
 

 Key: ZOOKEEPER-846
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.2
Reporter: Ted Yu
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: rs-13.stack, ZOOKEEPER-846.patch


 Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
 Regionserver
 process was shutting down and seemed to hang.
 Here is the bottom of region server log:
 http://pastebin.com/YYawJ4jA
 zookeeper-3.2.2 is used.
 Here is relevant portion from jstack - I attempted to attach jstack twice in 
 my email to d...@hbase.apache.org but failed:
 DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 
 in Object.wait() [0x43755000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x2aaab76633c0 (a 
 org.apache.zookeeper.ClientCnxn$Packet)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
 - locked 0x2aaab76633c0 (a 
 org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
 at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
 - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
 at java.lang.Thread.run(Thread.java:619)
 main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting 
 on condition [0x413f3000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aaabf6e9150 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-24 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901949#action_12901949
]

Benjamin Reed commented on ZOOKEEPER-366:
-

holger you are correct. nanoTime is the way to go. i'll prepare a fix. one
problem with it is that the fix will be impossible to test.

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Attachments: ZOOKEEPER-366.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-20 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Benjamin Reed updated ZOOKEEPER-366:

Attachment: ZOOKEEPER-366.patch

this patch smooths out the effect of a radical time change by always sleeping
at least 1/2 tickTime. this means that if we really needed to do a big jump
forward, it will take up 1/2 of the jump to converge on the real time. because
clients ping for idle times of 1/3 the timeout, there should be few sessions
that expire. we could reduce that number, but take even longer to converge if
we always sleep at least 3/4 of the tickTime.

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Attachments: ZOOKEEPER-366.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-20 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900824#action_12900824
 ] 

Benjamin Reed commented on ZOOKEEPER-366:
-

anyone have an idea of how to test this? i need to mock 
System.currentTimeMillis().

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-19 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed reassigned ZOOKEEPER-366:
---

Assignee: Benjamin Reed

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed

 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-19 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900511#action_12900511
]

Benjamin Reed commented on ZOOKEEPER-366:
-

after discussion this on the list, we realized that we can detect a big jump in
time change in the session expiration thread. since we expire a bucket of
sessions each tick, if we run into the situation where we are going to expire
more than one bucket in a row, we know we have jumped forward in time. we can
smooth the jump by requiring at least a 1/2 ticktime wait between each
bucket.

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Reporter: Benjamin Reed

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming

2010-08-17 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-795:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 986470. in branch 3.3


 eventThread isn't shutdown after a connection session expired event coming
 

 Key: ZOOKEEPER-795
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
 Environment: ubuntu 10.04
Reporter: mathieu barcikowski
Assignee: Sergey Doroshenko
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, 
 ZOOKEEPER-795.patch


 Hi,
 I notice a problem with the eventThread located in ClientCnxn.java file.
 The eventThread isn't shutdown after a connection session expired event 
 coming (i.e. never receive EventOfDeath).
 When a session timeout occurs and the session is marked as expired, the 
 connexion is fully closed (socket, SendThread...) expect for the eventThread.
 As a result, if i create a new zookeeper object and connect through it, I got 
 a zombi thread which will never be kill (as for the previous zookeeper 
 object, the state is already close, calling close again don't do anything).
 So everytime I will create a new zookeeper connection after a expired 
 session, I will have a one more zombi EventThread.
 How to reproduce :
 - Start a zookeeper client connection in debug mode
 - Pause the jvm enough time to the expired event occur
 - Watch for example with jvisualvm the list of threads, the sendThread is 
 succesfully killed, but the EventThread go to wait state for a infinity of 
 time
 - if you reopen a new zookeeper connection, and do again the previous steps, 
 another EventThread will be present in infinite wait state

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-733) use netty to handle client connections

2010-08-16 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899101#action_12899101
 ] 

Benjamin Reed commented on ZOOKEEPER-733:
-

we should commit the patch as is. trying to add features to it and maintain the 
patch fresh is too unwieldy!

 use netty to handle client connections
 --

 Key: ZOOKEEPER-733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: accessive.jar, flowctl.zip, moved.zip, 
 QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch


 we currently have our own asynchronous NIO socket engine to be able to handle 
 lots of clients with a single thread. over time the engine has become more 
 complicated. we would also like the engine to use multiple threads on 
 machines with lots of cores. plus, we would like to be able to support things 
 like SSL. if we switch to netty, we can simplify our code and get the 
 previously mentioned benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Benjamin Reed (JIRA)

remove duplicate code from netty and nio ServerCnxn classes
---

 Key: ZOOKEEPER-845
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed


the code for handling the 4-letter words is duplicated between the nio and 
netty versions of ServerCnxn. this makes maintenance problematic. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897880#action_12897880
 ] 

Benjamin Reed commented on ZOOKEEPER-845:
-

perhaps we could extract the actual processing logic from the threading model.

 remove duplicate code from netty and nio ServerCnxn classes
 ---

 Key: ZOOKEEPER-845
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
 Fix For: 3.4.0


 the code for handling the 4-letter words is duplicated between the nio and 
 netty versions of ServerCnxn. this makes maintenance problematic. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system

2010-08-11 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897581#action_12897581
 ] 

Benjamin Reed commented on ZOOKEEPER-775:
-

i believe the NOTICE file is consistent with: 
http://apache.org/legal/src-headers.html#header-existingcopyright

 A large scale pub/sub system
 

 Key: ZOOKEEPER-775
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.4.0

 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, 
 ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, 
 ZOOKEEPER-775_3.patch


 we have developed a large scale pub/sub system based on ZooKeeper and 
 BookKeeper.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-338) zk hosts should be resolved periodically for loadbalancing amongst zk servers.

2010-08-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-338:


Component/s: java client
 (was: c client)

it is an issue for both the c and java clients.

 zk hosts should be resolved periodically for loadbalancing amongst zk servers.
 --

 Key: ZOOKEEPER-338
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-338
 Project: Zookeeper
  Issue Type: New Feature
  Components: c client, java client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Mahadev konar

 The list of host names passed to ZK init method is resolved only once. Had a 
 corresponding DNS entry been changed, it
 would not be refreshed by the ZK library,effectively preventing from proper 
 load balancing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-338) zk hosts should be resolved periodically for loadbalancing amongst zk servers.

2010-08-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-338:


Component/s: c client

 zk hosts should be resolved periodically for loadbalancing amongst zk servers.
 --

 Key: ZOOKEEPER-338
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-338
 Project: Zookeeper
  Issue Type: New Feature
  Components: c client, java client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Mahadev konar

 The list of host names passed to ZK init method is resolved only once. Had a 
 corresponding DNS entry been changed, it
 would not be refreshed by the ZK library,effectively preventing from proper 
 load balancing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed

2010-08-10 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896948#action_12896948
 ] 

Benjamin Reed commented on ZOOKEEPER-794:
-

alexis, i'm missing the problem you are pointing out. is it an issue with the 
ordering of the callbacks?

i'm also wondering about your _3 patch. it is much smaller than the others. is 
it to be applied to trunk, or is it relative to a different patch?

 Callbacks are not invoked when the client is closed
 ---

 Key: ZOOKEEPER-794
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, 
 ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch


 I noticed that ZooKeeper has different behaviors when calling synchronous or 
 asynchronous actions on a closed ZooKeeper client.
 Actually a synchronous call will throw a session expired exception while an 
 asynchronous call will do nothing. No exception, no callback invocation.
 Actually, even if the EventThread receives the Packet with the session 
 expired err code, the packet is never processed since the thread has been 
 killed by the ventOfDeath. So the call back is not invoked.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-829) Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions

2010-07-29 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893910#action_12893910
 ] 

Benjamin Reed commented on ZOOKEEPER-829:
-

should we kill the session immediately or wait until the sessionTimeout. 
killing it immediate seems like it is violating a contract.

 Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions
 -

 Key: ZOOKEEPER-829
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-829
 Project: Zookeeper
  Issue Type: New Feature
  Components: server
Reporter: Todd Lipcon

 For some use cases in HBase (HBASE-1316 in particular) we'd like the ability 
 to forcible expire someone else's ZK session. Patrick and I discussed on IRC 
 and came up with an idea of creating nodes in /zookeeper/sessions/session 
 id that can be read in order to get basic stats about a session, and written 
 in order to manipulate one. The manipulation we need in HBase is the ability 
 to write a command like kill, but others might be useful as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming

2010-07-28 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-795:


Status: Patch Available  (was: Open)

 eventThread isn't shutdown after a connection session expired event coming
 

 Key: ZOOKEEPER-795
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
 Environment: ubuntu 10.04
Reporter: mathieu barcikowski
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, 
 ZOOKEEPER-795.patch


 Hi,
 I notice a problem with the eventThread located in ClientCnxn.java file.
 The eventThread isn't shutdown after a connection session expired event 
 coming (i.e. never receive EventOfDeath).
 When a session timeout occurs and the session is marked as expired, the 
 connexion is fully closed (socket, SendThread...) expect for the eventThread.
 As a result, if i create a new zookeeper object and connect through it, I got 
 a zombi thread which will never be kill (as for the previous zookeeper 
 object, the state is already close, calling close again don't do anything).
 So everytime I will create a new zookeeper connection after a expired 
 session, I will have a one more zombi EventThread.
 How to reproduce :
 - Start a zookeeper client connection in debug mode
 - Pause the jvm enough time to the expired event occur
 - Watch for example with jvisualvm the list of threads, the sendThread is 
 succesfully killed, but the EventThread go to wait state for a infinity of 
 time
 - if you reopen a new zookeeper connection, and do again the previous steps, 
 another EventThread will be present in infinite wait state

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming

2010-07-28 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-795:


Attachment: ZOOKEEPER-795.patch

i've added a test. (added to the existing session expiration test, so it 
shouldn't add any running time to the tests)

 eventThread isn't shutdown after a connection session expired event coming
 

 Key: ZOOKEEPER-795
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
 Environment: ubuntu 10.04
Reporter: mathieu barcikowski
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, 
 ZOOKEEPER-795.patch


 Hi,
 I notice a problem with the eventThread located in ClientCnxn.java file.
 The eventThread isn't shutdown after a connection session expired event 
 coming (i.e. never receive EventOfDeath).
 When a session timeout occurs and the session is marked as expired, the 
 connexion is fully closed (socket, SendThread...) expect for the eventThread.
 As a result, if i create a new zookeeper object and connect through it, I got 
 a zombi thread which will never be kill (as for the previous zookeeper 
 object, the state is already close, calling close again don't do anything).
 So everytime I will create a new zookeeper connection after a expired 
 session, I will have a one more zombi EventThread.
 How to reproduce :
 - Start a zookeeper client connection in debug mode
 - Pause the jvm enough time to the expired event occur
 - Watch for example with jvisualvm the list of threads, the sendThread is 
 succesfully killed, but the EventThread go to wait state for a infinity of 
 time
 - if you reopen a new zookeeper connection, and do again the previous steps, 
 another EventThread will be present in infinite wait state

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-790:



+1 excellent work you guys. i also like QuorumUtil sergey! thanx for 
implementing it.

 Last processed zxid set prematurely while establishing leadership
 -

 Key: ZOOKEEPER-790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, 
 ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, 
 ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch


 The leader code is setting the last processed zxid to the first of the new 
 epoch even before connecting to a quorum of followers. Because the leader 
 code sets this value before connecting to a quorum of followers 
 (Leader.java:281) and the follower code throws an IOException 
 (Follower.java:73) if the leader epoch is smaller, we have that when the 
 false leader drops leadership and becomes a follower, it finds a smaller 
 epoch and kills itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892860#action_12892860
 ] 

Benjamin Reed commented on ZOOKEEPER-775:
-

can we do the forrest doc as a separate patch? it's already quite large as it 
is.

 A large scale pub/sub system
 

 Key: ZOOKEEPER-775
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.4.0

 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, 
 ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch


 we have developed a large scale pub/sub system based on ZooKeeper and 
 BookKeeper.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-733:


Status: Open  (was: Patch Available)

 use netty to handle client connections
 --

 Key: ZOOKEEPER-733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: accessive.jar, flowctl.zip, moved.zip, 
 QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch


 we currently have our own asynchronous NIO socket engine to be able to handle 
 lots of clients with a single thread. over time the engine has become more 
 complicated. we would also like the engine to use multiple threads on 
 machines with lots of cores. plus, we would like to be able to support things 
 like SSL. if we switch to netty, we can simplify our code and get the 
 previously mentioned benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-733:


Status: Patch Available  (was: Open)

 use netty to handle client connections
 --

 Key: ZOOKEEPER-733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: accessive.jar, flowctl.zip, moved.zip, 
 QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch


 we currently have our own asynchronous NIO socket engine to be able to handle 
 lots of clients with a single thread. over time the engine has become more 
 complicated. we would also like the engine to use multiple threads on 
 machines with lots of cores. plus, we would like to be able to support things 
 like SSL. if we switch to netty, we can simplify our code and get the 
 previously mentioned benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-733) use netty to handle client connections


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893025#action_12893025
 ] 

Benjamin Reed commented on ZOOKEEPER-733:
-

i ran this on 40 machines simulating 900 clients. the benchmark went well 
without problems. the results don't show any real significant performance 
improvements (or degradations).

 use netty to handle client connections
 --

 Key: ZOOKEEPER-733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: accessive.jar, flowctl.zip, moved.zip, 
 QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
 ZOOKEEPER-733.patch


 we currently have our own asynchronous NIO socket engine to be able to handle 
 lots of clients with a single thread. over time the engine has become more 
 complicated. we would also like the engine to use multiple threads on 
 machines with lots of cores. plus, we would like to be able to support things 
 like SSL. if we switch to netty, we can simplify our code and get the 
 previously mentioned benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-23 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-790:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 966960.
Committed revision 966984.

 Last processed zxid set prematurely while establishing leadership
 -

 Key: ZOOKEEPER-790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.1
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2


 The leader code is setting the last processed zxid to the first of the new 
 epoch even before connecting to a quorum of followers. Because the leader 
 code sets this value before connecting to a quorum of followers 
 (Leader.java:281) and the follower code throws an IOException 
 (Follower.java:73) if the leader epoch is smaller, we have that when the 
 false leader drops leadership and becomes a follower, it finds a smaller 
 epoch and kills itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-22 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891210#action_12891210
 ] 

Benjamin Reed commented on ZOOKEEPER-790:
-

looks great flavio! the only nit i have is that the test case assumes that s1 
is not the leader. you might want to check that.

 Last processed zxid set prematurely while establishing leadership
 -

 Key: ZOOKEEPER-790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.1
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.travis.log.bz2


 The leader code is setting the last processed zxid to the first of the new 
 epoch even before connecting to a quorum of followers. Because the leader 
 code sets this value before connecting to a quorum of followers 
 (Leader.java:281) and the follower code throws an IOException 
 (Follower.java:73) if the leader epoch is smaller, we have that when the 
 false leader drops leadership and becomes a follower, it finds a smaller 
 epoch and kills itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-22 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-790:


Hadoop Flags: [Reviewed]

+1 great job flavio! thanx for your help travis and vishal.

 Last processed zxid set prematurely while establishing leadership
 -

 Key: ZOOKEEPER-790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.1
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
 ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2


 The leader code is setting the last processed zxid to the first of the new 
 epoch even before connecting to a quorum of followers. Because the leader 
 code sets this value before connecting to a quorum of followers 
 (Leader.java:281) and the follower code throws an IOException 
 (Follower.java:73) if the leader epoch is smaller, we have that when the 
 false leader drops leadership and becomes a follower, it finds a smaller 
 epoch and kills itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-806) Cluster management with Zookeeper - Norbert


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886864#action_12886864
 ] 

Benjamin Reed commented on ZOOKEEPER-806:
-

this looks really cool. is there a collaboration model you were thinking of? 
(btw, have you guys thought of presenting this at the hadoop summit or similar 
venue?)

 Cluster management with Zookeeper - Norbert
 ---

 Key: ZOOKEEPER-806
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-806
 Project: Zookeeper
  Issue Type: New Feature
Reporter: John Wang

 Hello, we have built a cluster management layer on top of Zookeeper here at 
 the SNA team at LinkedIn: 
 http://sna-projects.com/norbert/
 We were wondering ways for collaboration as this is a very useful application 
 of zookeeper.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (ZOOKEEPER-807) bookkeeper does not put enough meta-data in to do recovery properly

bookkeeper does not put enough meta-data in to do recovery properly
---

 Key: ZOOKEEPER-807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-807
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed


somewhere, probably zookeeper, we need to keep track of the the information 
about keys used for access and for mac validation as well as the digest type 
for entries. we can't write a general recovery tool without it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-712) Bookie recovery


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-712:


Hadoop Flags: [Reviewed]

+1 looks good. thanx erwin!

 Bookie recovery
 ---

 Key: ZOOKEEPER-712
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-712
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Erwin Tam
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-712.patch


 Recover the ledger fragments of a bookie once it crashes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-719) Add throttling to BookKeeper client


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-719:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 962693.

 Add throttling to BookKeeper client
 ---

 Key: ZOOKEEPER-719
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, 
 ZOOKEEPER-719.patch, ZOOKEEPER-719.patch


 Add throttling to client to control the rate of operations to bookies. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-712) Bookie recovery


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-712:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 962697.


 Bookie recovery
 ---

 Key: ZOOKEEPER-712
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-712
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Erwin Tam
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-712.patch


 Recover the ledger fragments of a bookie once it crashes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-794:


Status: Open  (was: Patch Available)

-1 we need to get a test in. also the fix has a race condition. the boolean 
flag may changed after it is checked and before the request is queued.

 Callbacks are not invoked when the client is closed
 ---

 Key: ZOOKEEPER-794
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt


 I noticed that ZooKeeper has different behaviors when calling synchronous or 
 asynchronous actions on a closed ZooKeeper client.
 Actually a synchronous call will throw a session expired exception while an 
 asynchronous call will do nothing. No exception, no callback invocation.
 Actually, even if the EventThread receives the Packet with the session 
 expired err code, the packet is never processed since the thread has been 
 killed by the ventOfDeath. So the call back is not invoked.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-794:


Status: Patch Available  (was: Open)

 Callbacks are not invoked when the client is closed
 ---

 Key: ZOOKEEPER-794
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, 
 ZOOKEEPER-794_2.patch


 I noticed that ZooKeeper has different behaviors when calling synchronous or 
 asynchronous actions on a closed ZooKeeper client.
 Actually a synchronous call will throw a session expired exception while an 
 asynchronous call will do nothing. No exception, no callback invocation.
 Actually, even if the EventThread receives the Packet with the session 
 expired err code, the packet is never processed since the thread has been 
 killed by the ventOfDeath. So the call back is not invoked.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed