[jira] Commented: (ZOOKEEPER-849) Provide Path class
[ https://issues.apache.org/jira/browse/ZOOKEEPER-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934929#action_12934929 ]

Benjamin Reed commented on ZOOKEEPER-849:

How do I see the patches?

Provide Path class
------------------
Key: ZOOKEEPER-849
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-849
Project: Zookeeper
Issue Type: Sub-task
Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
Fix For: 3.4.0

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
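The ticket gives no specification for the class, but a minimal sketch of what a Path value type for the java client might look like follows. The class shape, the `child` helper, and the validation rules are all assumptions for illustration, not taken from the actual patch:

```java
// Hypothetical sketch of a Path value class: validate once in the
// constructor so callers can pass a known-good znode path around
// instead of a raw String.
class Path {
    private final String path;

    Path(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            throw new IllegalArgumentException("znode paths must start with /");
        }
        if (path.length() > 1 && path.endsWith("/")) {
            throw new IllegalArgumentException("znode paths must not end with /");
        }
        this.path = path;
    }

    // Build a child path; the root "/" needs no extra separator.
    Path child(String name) {
        return new Path(path.equals("/") ? "/" + name : path + "/" + name);
    }

    @Override
    public String toString() {
        return path;
    }
}
```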
[jira] Commented: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934946#action_12934946 ]

Benjamin Reed commented on ZOOKEEPER-836:

It seems like overkill to have a class just to parse a hostlist. Wouldn't you want to put that parsing in the class that actually manages the list? We should not be passing around a list of resolved addresses, since those addresses and the list itself can change (this is what I mentioned earlier); instead, HostSet should take care of resolving and managing the list of resolved addresses. I guess we can do that as a separate patch.

hostlist as string
------------------
Key: ZOOKEEPER-836
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
Project: Zookeeper
Issue Type: Sub-task
Components: java client
Affects Versions: 3.3.1
Reporter: Patrick Datko
Assignee: Thomas Koch
Attachments: ZOOKEEPER-836.patch

The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class HostSet. HostSet could then be instantiated e.g. with a comma-separated string.
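As a rough illustration of the idea in the issue description, and of the point that HostSet itself should own address management, here is a hedged sketch. The class and method names are assumptions (the actual patch may differ), and it deliberately keeps addresses unresolved, in line with the comment that resolved addresses can change:

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: parsing moves out of the ClientCnxn constructor
// into a small collaborator that ClientCnxn receives fully built.
class HostSet {
    private final List<InetSocketAddress> addresses = new ArrayList<>();

    // Accepts a comma-separated list such as "host1:2181,host2:2181".
    // The default port of 2181 is an assumption for this sketch.
    HostSet(String hostList) {
        for (String entry : hostList.split(",")) {
            String host = entry.trim();
            int colon = host.lastIndexOf(':');
            if (colon >= 0) {
                addresses.add(InetSocketAddress.createUnresolved(
                        host.substring(0, colon),
                        Integer.parseInt(host.substring(colon + 1))));
            } else {
                addresses.add(InetSocketAddress.createUnresolved(host, 2181));
            }
        }
    }

    List<InetSocketAddress> getAddresses() {
        return Collections.unmodifiableList(addresses);
    }
}
```

Keeping the addresses unresolved (createUnresolved) means DNS resolution can happen, and be refreshed, at connect time rather than being frozen at parse time.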
[jira] Commented: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934718#action_12934718 ]

Benjamin Reed commented on ZOOKEEPER-836:

Why don't we at least call it HostSet, so that we don't have to change the name later?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933557#action_12933557 ]

Benjamin Reed commented on ZOOKEEPER-922:

Camille, I also think disabling moving sessions is not a good idea or very useful, but it seems to be the only way to have sensible semantics. May I suggest that we take this discussion a bit higher? I think there are fundamental assumptions you are making that I'm questioning. Can you write up a high-level design and state your assumptions? I can't quite see how the math works out between the client-server timeouts, the connect timeouts, and the lower session timeout. I'm also not clear on how much you are relying on a connection reset for failure detection.

enable faster timeout of sessions in case of unexpected socket disconnect
-------------------------------------------------------------------------
Key: ZOOKEEPER-922
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
Fix For: 3.4.0
Attachments: ZOOKEEPER-922.patch

In the case when a client connection is closed due to socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing for a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
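The proposal in the issue description can be sketched as follows. The class, method names, and the 4000 ms minimum are illustrative assumptions for this sketch, not ZooKeeper's actual server code:

```java
// Sketch of the ZOOKEEPER-922 proposal: when a connection dies without an
// explicit close(), shrink that session's timeout to the ensemble minimum
// so the session expirer reaps it sooner. (A clean close() would instead
// end the session immediately.)
class FastExpire {
    static final int MIN_SESSION_TIMEOUT_MS = 4000; // assumed ensemble minimum

    static class Session {
        int timeoutMillis; // negotiated session timeout
        Session(int timeoutMillis) { this.timeoutMillis = timeoutMillis; }
    }

    // Called by the server when a client connection drops uncleanly.
    static void onUncleanDisconnect(Session s) {
        // Never lengthen a timeout that is already below the minimum.
        s.timeoutMillis = Math.min(s.timeoutMillis, MIN_SESSION_TIMEOUT_MS);
    }
}
```

Note that this is exactly the step the race condition discussed later in the thread targets: the shrink happens on the follower's view of the connection, which may lag behind the client reconnecting elsewhere.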
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933072#action_12933072 ]

Benjamin Reed commented on ZOOKEEPER-925:

Yeah, I tried the doxia converter with various formats and strategies. The problem with db2rst is that even if I get it to rst, how do I get it to confluence? I looked at search/replace, but it turns out we use quite a few tags that are a bit complicated, so there isn't an easy way to do it. Perhaps it would be easy with XSL, but I don't know XSL.

Consider maven site generation to replace our forrest site and documentation generation
---------------------------------------------------------------------------------------
Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch

See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end the Maven site generation plugin turned out to be by far the best option.

You can see our nascent site here (no attempt at styling etc. so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start, http://incubator.apache.org/whirr/quick-start-guide.html, which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence. Notice this is standard wiki markup (confluence wiki markup, same as available from Apache).

You can read more about the mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Note that other formats are available, not just confluence markup, and that you can mix markup formats within the same site. That's probably not a great idea in general, but it can be handy in some cases; for example, whirr uses the confluence wiki, so we can pretty much copy/paste source docs from the wiki to our site (svn) if we like.

Re maven vs. our current ant-based build: it's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.
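For context, a site built from confluence markup like whirr's is typically wired up roughly as below. This is a hedged sketch of a pom.xml fragment (plugin versions elided), not the actual whirr or ZooKeeper configuration; pages would live under src/site/confluence/*.confluence:

```xml
<!-- Sketch: maven-site-plugin plus the Doxia confluence module, so that
     src/site/confluence/*.confluence pages get rendered into the site.
     Versions are omitted; this is illustrative, not a tested config. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-site-plugin</artifactId>
      <dependencies>
        <dependency>
          <groupId>org.apache.maven.doxia</groupId>
          <artifactId>doxia-module-confluence</artifactId>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn site` renders the pages into target/site.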
[jira] Updated: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-930:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed revision 1035727.

Hedwig c++ client uses a non thread safe logging library
--------------------------------------------------------
Key: ZOOKEEPER-930
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930
Project: Zookeeper
Issue Type: Bug
Components: contrib-hedwig
Affects Versions: 3.3.2
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Attachments: ZOOKEEPER-930.patch, ZOOKEEPER-930.patch
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932583#action_12932583 ]

Benjamin Reed commented on ZOOKEEPER-930:

Thanx Ivan!
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932592#action_12932592 ]

Benjamin Reed commented on ZOOKEEPER-925:

+1 for confluence. It would be great to target 1) for when we move to TLP.
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932639#action_12932639 ]

Benjamin Reed commented on ZOOKEEPER-922:

If we had a foolproof way to tell that a client is down, we could do this fast expire. The methods you are proposing are not foolproof and will lead to problems exactly when you most want them not to. The timeout interactions you are talking about are problematic; it's really hard to get them right. One way I can see this working is to not allow clients to reconnect to other servers: in that case a socket reset would indicate an expired session. Is this acceptable to you?
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932799#action_12932799 ]

Benjamin Reed commented on ZOOKEEPER-925:

I cannot figure out how to convert forrest to anything. Actually, I can't figure out how we have forrest working at all! After burning the afternoon trying to figure out how to convert forrest to confluence, I'm officially declaring defeat. It should be an easy thing for an XML/XSL master, but that is not me. The most promising thing appears to be the doxia converter, which will go from a bunch of formats to a bunch more formats, including from docbook or xdoc to confluence. Unfortunately, forrest seems close to both of those, but not close enough...
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932809#action_12932809 ]

Benjamin Reed commented on ZOOKEEPER-366:

I haven't had a chance to get back to this. We really need to convert all the currentTimeMillis() calls to nanoTime(). We need to do a similar change in the C client. I don't think we can write a test for this.

Session timeout detection can go wrong if the leader system time changes
------------------------------------------------------------------------
Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Fix For: 3.3.3, 3.4.0
Attachments: ZOOKEEPER-366.patch

The leader tracks session expirations by calculating when a session will time out and then periodically checking to see what needs to be timed out based on the current time. This works great as long as the leader's clock progresses at a steady pace. The problem comes when there are big (session-sized) changes in the clock, by ntp for example. If time gets adjusted forward, all the sessions could time out immediately. If time goes backward, sessions that should time out may take a lot longer to actually expire. This is really just a leader issue. The easiest way to deal with this is to have the leader relinquish leadership if it detects a big jump forward in time. When a new leader gets elected, it will recalculate the timeouts of active sessions.
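The currentTimeMillis() → nanoTime() fix matters because nanoTime() is monotonic: it measures elapsed time and never jumps when NTP steps the wall clock. A hedged, self-contained sketch of deadline tracking on the monotonic clock (not ZooKeeper's actual expiry code):

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a session deadline anchored to System.nanoTime()
// instead of System.currentTimeMillis(), so a clock step cannot expire
// sessions early or keep dead ones alive.
class SessionDeadline {
    private final long expiresAtNanos;

    SessionDeadline(long timeoutMillis) {
        // nanoTime() is monotonic; adding the timeout gives a deadline
        // that is immune to wall-clock adjustments.
        this.expiresAtNanos = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    boolean isExpired() {
        // Compare via subtraction so the check stays correct even if
        // nanoTime() wraps around Long.MAX_VALUE.
        return System.nanoTime() - expiresAtNanos >= 0;
    }
}
```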
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932163#action_12932163 ]

Benjamin Reed commented on ZOOKEEPER-930:

Looks good Ivan. You should probably mention that you are moving to log4cxx for thread-safety issues. One minor thing: you messed up the indentation on a couple of lines. Can you fix those?
[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-909:

Hadoop Flags: [Reviewed]

+1 looks good Thomas! Thanx!

Extract NIO specific code from ClientCnxn
-----------------------------------------
Key: ZOOKEEPER-909
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909
Project: Zookeeper
Issue Type: Sub-task
Components: java client
Reporter: Thomas Koch
Assignee: Thomas Koch
Fix For: 3.4.0
Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch

This patch is mostly the same patch as my last one for ZOOKEEPER-823, minus everything Netty related. This means this patch only extracts all NIO specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step now and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight.

It would be nice if we could apply this patch to trunk as soon as possible. That allows us to continue to work on the Netty integration without blocking the ClientCnxn class. Adding Netty after this patch should only be a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send any complete failure log you encounter to thomas at koch point ro. Thx!

Update: until now, I've collected 8 successful builds in a row!
[jira] Updated: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-922:

Status: Open (was: Patch Available)

The problem with your corner case is that you can end up with a leader who thinks it is still the leader, but zookeeper thinks the leader is dead and allows another leader to take over. There may be a way to do this reliably, but we need to vet the design first.
Re: What happens to a follower if leader hangs?
Have you been able to make this happen? The behavior you are suggesting is exactly what should be happening. When we sync with the leader we set the socket timeout:

    sock.setSoTimeout(self.tickTime * self.syncLimit);

If the leader hangs, we should get a timeout and disconnect from the leader.

ben

On 11/10/2010 11:57 AM, Vishal Kher wrote:

Yes, that's what I was planning to do. At the follower, start FLE if the follower does not receive a ping for (syncLimit * tickTime).

On Wed, Nov 10, 2010 at 2:48 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

Hi Vishal,

There are periodic pings sent from the leader to the followers. Take a look at Leader.java:

    syncedSet.add(self.getId());
    synchronized (learners) {
        for (LearnerHandler f : learners) {
            if (f.synced()) {
                syncedCount++;
                syncedSet.add(f.getSid());
            }
            f.ping();
        }
    }

This code sends periodic pings to the followers to make sure they are running fine. We should keep track of these pings, see if we haven't seen a ping packet from the leader for a long time, and give up following the leader in that case. This is definitely worth fixing, since we pride ourselves on being a highly available and reliable service. Please feel free to open a jira and work on it. 3.4 would be a good target for this.

Thanks
mahadev

On 11/10/10 12:26 PM, Vishal Kher <vishalm...@gmail.com> wrote:

Hi,

In Follower.followLeader(), after syncing with the leader, the follower does:

    while (self.isRunning()) {
        readPacket(qp);
        processPacket(qp);
    }

It looks like it relies on socket timeout expiry to figure out whether the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the leader. And if a majority of followers are not connected to any clients, a new leader cannot be elected even if the remaining followers detect that the leader is unresponsive.

Please correct me if I am wrong. If I am not mistaken, should we add code at the follower to monitor the heartbeat messages that it receives from the leader, and take action if it misses heartbeats for (syncLimit * tickTime)? This certainly is a hypothetical case; however, I think it is worth a fix.

Thanks.
-Vishal
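The fix the thread converges on, tracking ping arrivals on the follower side instead of relying on the socket timeout alone, can be sketched like this. The class and method names are made up for illustration and are not ZooKeeper's actual API:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the follower records when each leader PING arrives
// and gives up on the leader (triggering leader election) once no ping has
// been seen for syncLimit * tickTime.
class PingMonitor {
    private final long limitNanos;          // syncLimit * tickTime, in nanos
    private volatile long lastPingNanos;

    PingMonitor(int tickTimeMillis, int syncLimit) {
        this.limitNanos = TimeUnit.MILLISECONDS
                .toNanos((long) tickTimeMillis * syncLimit);
        this.lastPingNanos = System.nanoTime(); // monotonic, NTP-safe
    }

    // Called whenever a PING packet arrives from the leader.
    void onPing() {
        lastPingNanos = System.nanoTime();
    }

    // Checked in the follower's read loop; true means abandon the leader
    // and start leader election (FLE).
    boolean leaderConsideredDead() {
        return System.nanoTime() - lastPingNanos > limitNanos;
    }
}
```

Using the monotonic clock here sidesteps the wall-clock-jump problem discussed in ZOOKEEPER-366.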
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930205#action_12930205 ]

Benjamin Reed commented on ZOOKEEPER-925:

I'm totally interested in moving to maven site! I really, really want to get away from forrest and make it a bit easier to write doc. Can we also get away from checking in generated doc?
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930221#action_12930221 ]

Benjamin Reed commented on ZOOKEEPER-925:

Just to be clear: we should check in the source for the docs. I'm just saying that we only check in the source for the docs, not the generated PDFs and web pages.
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930517#action_12930517 ]

Benjamin Reed commented on ZOOKEEPER-925:

This is pretty cool! We can generate PDFs by using the doxia converter to go from confluence to LaTeX.
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930526#action_12930526 ]

Benjamin Reed commented on ZOOKEEPER-925:

Since maven generates the doc without requiring preinstalled tools, I don't think it is onerous at all to just check in the sources and require users to compile the doc if they are using trunk.
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929683#action_12929683 ] Benjamin Reed commented on ZOOKEEPER-922: - how do you deal with the following race condition: 1) the client is connected to follower1 2) the client has problems talking to follower1, so it closes the connection 3) the client connects to follower2 4) follower1 detects the closed connection and sets the connection timeout to min 5) the client is idle for min timeout and the leader expires the connection the race condition is steps 3) and 4). if follower1 doesn't detect the dead connection fast enough, it can improperly set the timeout. enable faster timeout of sessions in case of unexpected socket disconnect - Key: ZOOKEEPER-922 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 Attachments: ZOOKEEPER-922.patch In the case when a client connection is closed due to socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing for a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
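Benjamin's race can be sketched as a toy model. The code below is purely illustrative (SessionRaceSketch, onSocketClosed, and the owner field are all invented for this sketch, not ZooKeeper internals); it shows why step 4 must check which server currently owns the session before shortening the timeout to the minimum:

```java
// Illustrative model of the ZOOKEEPER-922 race; all names are invented.
import java.util.concurrent.atomic.AtomicReference;

public class SessionRaceSketch {
    static final int NEGOTIATED = 30_000, MIN = 2_000;

    static class Session {
        volatile int timeoutMs = NEGOTIATED;
        final AtomicReference<String> owner = new AtomicReference<>("follower1");
    }

    // follower1 notices the dead socket and shortens the timeout. Guarding on
    // the current owner is what prevents the race: if the client has already
    // re-registered through follower2, the shortened timeout is skipped.
    static void onSocketClosed(Session s, String follower, boolean guardOnOwner) {
        if (!guardOnOwner || s.owner.get().equals(follower)) {
            s.timeoutMs = MIN;
        }
    }

    // Step 3: the client reconnects through another follower.
    static void reconnect(Session s, String newFollower) {
        s.owner.set(newFollower);
        s.timeoutMs = NEGOTIATED;
    }

    public static void main(String[] args) {
        // Bad ordering, no guard: reconnect (step 3) lands before follower1's
        // close detection (step 4), so a live session gets the min timeout.
        Session s1 = new Session();
        reconnect(s1, "follower2");
        onSocketClosed(s1, "follower1", false);
        System.out.println("unguarded timeout after race: " + s1.timeoutMs); // 2000

        // Same ordering with the ownership guard: timeout stays negotiated.
        Session s2 = new Session();
        reconnect(s2, "follower2");
        onSocketClosed(s2, "follower1", true);
        System.out.println("guarded timeout after race: " + s2.timeoutMs); // 30000
    }
}
```

The model only captures the ordering hazard; whether ZooKeeper tracks ownership this way is exactly what the comment is probing.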
[jira] Updated: (ZOOKEEPER-862) Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size. Make these a server config parameter instead.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-862: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 looks good, thanx Erwin. it looks like this was accidentally committed in r1031051 Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size. Make these a server config parameter instead. - Key: ZOOKEEPER-862 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-862 Project: Zookeeper Issue Type: Improvement Components: contrib-hedwig Reporter: Erwin Tam Assignee: Erwin Tam Fix For: 3.4.0 Attachments: ZOOKEEPER-862.patch When using Bookkeeper as the persistence store, the Hedwig code currently hardcodes the number of bookie servers in the ensemble and the quorum size. This is used the first time a ledger is created. It should be exposed as a server configuration parameter instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-916: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 1031453. Problem receiving messages from subscribed channels in c++ client -- Key: ZOOKEEPER-916 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-916.patch We see this bug when receiving messages from a subscribed channel. The problem seems to happen with larger messages. The flow is to first read at least 4 bytes from the socket channel. Extract the first 4 bytes to get the message size. If we've read enough data into the buffer already, we're done, so invoke the messageReadCallbackHandler, passing the channel and message size. If not, do an async read for at least the remaining bytes of the message from the socket channel. When done, invoke the messageReadCallbackHandler. The problem seems to be that when the second async read is done, the same sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. The result is that we then try to read the first 4 bytes again from the buffer. This gets a random message size and screws things up. I'm not sure if it's an incorrect use of the boost asio async_read function or if we're doing the boost bind to the callback function incorrectly.
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 512 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, currently in buffer 508 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 599 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, currently in buffer 595 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 134286765 from channel(0x80b7a18) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
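The failure mode in the log can be reproduced in miniature. The sketch below is not the Hedwig code (the handler names merely echo those in the report, and it is written in Java rather than the C++ client's boost asio); it shows the two-phase framed read and what happens when the size handler is mistakenly re-invoked on payload bytes, yielding an absurd length like the 134287360 above:

```java
// Illustrative two-phase framed-read sketch; invented names, not Hedwig code.
import java.nio.ByteBuffer;

public class FramedReadSketch {
    // Phase 1: the size handler reads the 4-byte length prefix.
    static int sizeHandler(ByteBuffer buf) {
        return buf.getInt();
    }

    // Phase 2: the message handler consumes exactly `size` payload bytes.
    static byte[] messageHandler(ByteBuffer buf, int size) {
        byte[] msg = new byte[size];
        buf.get(msg);
        return msg;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes();
        ByteBuffer wire = ByteBuffer.allocate(4 + payload.length);
        wire.putInt(payload.length).put(payload).flip();

        int size = sizeHandler(wire);        // correct: 5
        // Correct continuation: hand off to the message handler.
        byte[] msg = messageHandler(wire.duplicate(), size);
        System.out.println(new String(msg)); // prints hello

        // The bug class from the log: invoking the size handler again treats
        // the first 4 payload bytes ("hell") as a length -> a huge bogus size.
        int bogus = sizeHandler(wire);
        System.out.println(bogus);           // prints 1751477356, clearly garbage
    }
}
```

Which callback boost asio actually dispatches after the second async_read is the open question in the report; the sketch only demonstrates why dispatching the wrong one produces sizes in the hundreds of millions.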
[jira] Updated: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-916: Hadoop Flags: [Reviewed] +1 thanx for the fix ivan! Problem receiving messages from subscribed channels in c++ client -- Key: ZOOKEEPER-916 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-916.patch (issue description and debug log as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-909: Status: Open (was: Patch Available) once a couple of small changes are made to this patch, we should be good to go. Extract NIO specific code from ClientCnxn - Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch This patch is mostly the same as my last one for ZOOKEEPER-823, minus everything Netty related. This means this patch only extracts all NIO specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight. It would be nice if we could apply this patch to trunk as soon as possible. This allows us to continue to work on the netty integration without blocking on the ClientCnxn class. Adding Netty after this patch should be only a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send me any complete failure log you encounter to thomas at koch point ro. Thx! Update: Until now, I've collected 8 successful builds in a row! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed resolved ZOOKEEPER-907. - Resolution: Fixed Committed revision 1031051. Committed revision 1031064. Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-884: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 thanx flavio Committed revision 1031433. Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-907: Hadoop Flags: [Reviewed] Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927045#action_12927045 ] Benjamin Reed commented on ZOOKEEPER-909: - the patch looks good. are you proposing that we commit it? or are you still working on it? i don't mind pushing off the javadoc for a bit if you think things might change. (although it would be nice to get that class more firmed up before we commit.) we should get the property doc in before we commit since that will not change. One other nit, if you are willing: calling the ClientCnxnSocket member socket and using getSocket is a bit confusing, since ClientCnxnSocket does not extend Socket. It's a bit more verbose, but clearer, to call the member and method clientCnxnSocket and getClientCnxnSocket. Extract NIO specific code from ClientCnxn - Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926404#action_12926404 ] Benjamin Reed commented on ZOOKEEPER-907: - may i propose accepting this patch without a test case? (we can see that it fixes the problem.) that way we can get 3.3.2 out. once ZOOKEEPER-915 goes in, the tests should cover this issue. Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925976#action_12925976 ] Benjamin Reed commented on ZOOKEEPER-907: - Ah, I see the problem. There are actually two problems: 1) when sync() gets an error it is not propagated back to the caller, and 2) this problem. The trouble is that 1) is preventing us from writing a test case. We need to fix 1) and then we can write the test for 2). Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-915) Errors that happen during sync() processing at the leader do not get propagated back to the client.
Errors that happen during sync() processing at the leader do not get propagated back to the client. --- Key: ZOOKEEPER-915 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-915 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed If an error in sync() processing happens at the leader (SESSION_MOVED for example), they are not propagated back to the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925540#action_12925540 ] Benjamin Reed commented on ZOOKEEPER-907: - ah got it. ok i was able to reproduce it: the client connects to the follower, issues a sync, and the error message shows up in the log of the leader. so there is an additional bug here: why is the client not getting the session moved error? Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [VOTE] ZooKeeper as TLP?
+1 On 10/22/2010 02:42 PM, Patrick Hunt wrote: Please vote as to whether you think ZooKeeper should become a top-level Apache project, as discussed previously on this list. I've included below a draft board resolution. Do folks support sending this request on to the Hadoop PMC? Patrick X. Establish the Apache ZooKeeper Project WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to distributed system coordination for distribution at no charge to the public. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to distributed system coordination; and be it further RESOLVED, that the office of Vice President, Apache ZooKeeper be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project: * Patrick Hunt ph...@apache.org * Flavio Junqueira f...@apache.org * Mahadev Konar maha...@apache.org * Benjamin Reed br...@apache.org * Henry Robinson he...@apache.org NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement,
removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923895#action_12923895 ] Benjamin Reed commented on ZOOKEEPER-907: - sync doesn't cause any additional traffic over the atomic broadcast. it just makes sure that all of the in-process transactions have been sent to the follower. when that error happens, the error will be sent back to the follower ordered after all of the completed transactions. so rather than seeing the result of all requests initiated before the sync, the follower will see all requests completed before the sync. that is why i referred to it as a partial sync. i'm really having problems trying to reproduce this error. can you describe in more detail how it happened? i would like to have an end-to-end test rather than a test of a particular implementation, so that this error doesn't pop up again if the implementation changes. looking at the code it seems like it should happen every time the sync request is sent to a follower, but that doesn't seem to be the case. Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923905#action_12923905 ] Benjamin Reed commented on ZOOKEEPER-909: - this is looking really nice. i'm not done reviewing, but i did want to note that you need to add the zookeeper.clientCnxnSocket property to the doc. You should also javadoc that variable. Extract NIO specific code from ClientCnxn - Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch (issue description as above) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Restarting discussion on ZooKeeper as a TLP
i think we want to be responsible for the creation and maintenance of software related to distributed system coordination. ben On 10/21/2010 01:43 PM, Mahadev Konar wrote: NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie be appointed to the office of Vice President, Apache ZooKeeper, to I think you meant Patrick Hunt ? :) Other than that it looks good. Thanks mahadev On 10/21/10 1:28 PM, Patrick Hunt ph...@apache.org wrote: Ack, I missed Henry in the list, sorry! In my defense I copied this: http://hadoop.apache.org/zookeeper/credits.html one more try (same as before except for adding henry to the pmc): X. Establish the Apache ZooKeeper Project WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to data serialization for distribution at no charge to the public. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to data serialization; and be it further RESOLVED, that the office of Vice President, Apache ZooKeeper be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project: * Patrick Hunt ph...@apache.org * Flavio Junqueira f...@apache.org * Mahadev Konarmaha...@apache.org * Benjamin Reedbr...@apache.org * 
Henry Robinson he...@apache.org NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Massie be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged. On Thu, Oct 21, 2010 at 10:44 AM, Henry Robinson he...@cloudera.com wrote: Looks good, please do call a vote. On 21 October 2010 09:29, Patrick Hunt ph...@apache.org wrote: Here's a draft board resolution (not a vote, just discussion). It lists all current committers (except as noted in the next paragraph) as the initial members of the project management committee (PMC) and myself as the initial chair. Notice that I have left Andrew off the PMC as he has not been active with the project for over two years. I believe we should continue to include him on the committer roles subsequent to moving to tlp, however as he has not been an active member of the community for such a long period we would not include him on the PMC at this time. If others feel differently let me know, I'm willing to include him if the people feel differently. LMK if this looks good to you and I'll call for an official vote on this list (then we'll be ready to call a vote on the hadoop pmc). Regards, Patrick X. 
Establish the Apache ZooKeeper Project WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to data serialization for distribution at no charge to the public. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache ZooKeeper Project
Re: What's the magic behind lenBuffer and incomingBuffer?
look in readLength(). incomingBuffer is set to a newly allocated ByteBuffer. ben On 10/21/2010 07:52 AM, Thomas Koch wrote: Hi, inside ClientCnxn.SendThread we have final ByteBuffer lenBuffer = ByteBuffer.allocateDirect(4); ByteBuffer incomingBuffer = lenBuffer; So incomingBuffer and lenBuffer refer to the same object. There are several other places where lenBuffer is again assigned to incomingBuffer. Now inside the doIO() method we have if (incomingBuffer == lenBuffer) { recvCount++; readLength(); } else if (!initialized) { incomingBuffer is never assigned anything other than lenBuffer, and lenBuffer stays the same all the time. So as far as my knowledge of java reaches (which may not be too far), incomingBuffer == lenBuffer _always_ evaluates to true. Isn't that true? So effectively we've got dead code in the else-if and else branches, don't we? Best regards, Thomas Koch, http://www.koch.ro
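Ben's answer can be seen in a small model of the pattern. This is a simplified illustration, not the actual ClientCnxn code (readLength() here only mimics its reassignment of incomingBuffer, and uses allocate rather than allocateDirect): the identity check distinguishes the "reading the 4-byte length" phase from the "reading the payload" phase, so the else branches are not dead.

```java
// Minimal model of the lenBuffer/incomingBuffer phase switch; simplified,
// names follow ClientCnxn's pattern but this is not the actual class.
import java.nio.ByteBuffer;

public class LenBufferSketch {
    final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    ByteBuffer incomingBuffer = lenBuffer;   // start in the "read length" phase

    // Mirrors readLength(): parse the 4-byte prefix, then point incomingBuffer
    // at a NEW buffer sized for the payload. This reassignment is why
    // (incomingBuffer == lenBuffer) is not always true.
    void readLength() {
        lenBuffer.flip();
        int len = lenBuffer.getInt();
        incomingBuffer = ByteBuffer.allocate(len);
    }

    // After the payload is consumed, return to the length phase.
    void resetForNextFrame() {
        lenBuffer.clear();
        incomingBuffer = lenBuffer;
    }

    public static void main(String[] args) {
        LenBufferSketch s = new LenBufferSketch();
        System.out.println(s.incomingBuffer == s.lenBuffer); // true: length phase
        s.lenBuffer.putInt(599);                             // pretend 4 bytes arrived
        s.readLength();
        System.out.println(s.incomingBuffer == s.lenBuffer); // false: payload phase
        System.out.println(s.incomingBuffer.capacity());     // 599
        s.resetForNextFrame();
        System.out.println(s.incomingBuffer == s.lenBuffer); // true again
    }
}
```

So the identity comparison is effectively a cheap state flag: same object means "expecting a length", different object means "expecting a payload".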
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923200#action_12923200 ] Benjamin Reed commented on ZOOKEEPER-907: - yes, this will fail the sync. it will not get passed through the pipeline. it will give you a partial sync though :) Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-835) Refactoring Zookeeper Client Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922813#action_12922813 ] Benjamin Reed commented on ZOOKEEPER-835: - how do you see any of these things as related to ZOOKEEPER-22? Refactoring Zookeeper Client Code - Key: ZOOKEEPER-835 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-835 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Patrick Datko Assignee: Thomas Koch Thomas Koch asked me to file individual issues for the points raised in his mail to zookeeper-dev: [Mail of Thomas Koch| http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3c20100845.17507.tho...@koch.ro%3e ] He described several issues that are present in the current zookeeper client, so a refactoring of the code would be a benefit to other developers working with zookeeper.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921412#action_12921412 ] Benjamin Reed commented on ZOOKEEPER-885: - we are having problems reproducing this. can you give a bit more details on the machines you are using? what are the cpu and memory size? also, what is the throughput of dd if=/dev/zero of=/dev/mapper/nimbula-test? is there just one disk, where nimbula-test is a partition on that disk and you have another partition for the snapshots and logs? even if you don't have swap space, code pages can be discarded and loaded on demand, so that could be a potential problem. what does /proc/meminfo look like? Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. 
Very few other processes were running; the machines were set up specifically to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the ZooKeeper server, it seems reasonable that moderate IO should not cause problems in this case.
Re: What's the QA strategy of ZooKeeper?
i think we have a very different perspective on the quality issue: I didn't want to say it that clearly, but especially the new Netty code, both on client and server side, is IMHO an example of new code in very bad shape. The client code patch even changes the FindBugs configuration to exclude the new code from the FindBugs checks. great. fixing the code and refactoring before a patch goes in is the perfect time to do it! please give feedback and help make the patch better. there is a reason to exclude checks (which is why such excludes exist), but if we can avoid them we should. before a patch is applied is exactly the time to do cleanup. If your code is already in such bad shape that every change includes considerable risk of breaking something, then you already are in trouble. With every new feature (or bugfix!) you also risk breaking something. If you don't have the attitude of permanent refactoring to improve the code quality, you will inevitably lower the maintainability of your code with every new feature. New features will build on the dirty concepts already in the code and therefore make it more expensive to ever clean things up. cleaning up code to add a new feature is a great time to clean up the code. Yes. Refactoring isn't easy, but necessary. Only over time do you better understand your domain and find better structures. Over time you introduce features that let code grow, so that it should better be split up into smaller units that the human brain can still handle. it is the "but necessary" that i disagree with. there is plenty of code that could be cleaned up and made to look a lot nicer, but we shouldn't touch it unless we are fixing something else or adding a new feature. it's pretty lame to explain to someone that the bug that was introduced by a code change was motivated by a desire to make the code cleaner. any code change runs the risk of breakage, thus changing code simply for cleanliness is not worth the risk. ben
Re: What's the QA strategy of ZooKeeper?
actually, the other way of doing the netty patch (since i'm scared of merges) would be to do a refactor cleanup patch with an eye toward netty, and then another patch to actually add netty. that would have been nice because the first patch would allow us to more easily make sure that NIO wasn't broken. and the second we could focus more on the netty addition. ben On 10/15/2010 03:07 PM, Patrick Hunt wrote: On Fri, Oct 15, 2010 at 12:11 PM, Henry Robinsonhe...@cloudera.com wrote: The netty patch is a good test case for this approach. If we feel that reworking the structure of the existing server cnxn code will make it significantly easier to add a second implementation that adheres to the same interface, then I say that such a refactoring is worthwhile, but even then only if it's straightforward to make the changes while convincing ourselves that the behaviour of the new implementation is consistent with the old. Thomas, do comment on the patch itself! That's the very best way to make sure your concerns get heard and addressed. Well really the _best_ way IMO is to both comment and submit a patch. ;-) And this is just what Thomas is doing, so kudos to him for the effort! Vishal is doing this as well for many of the issues he's found, so thanks to him also. We do appreciate you guys jumping in to help. Lack of contributors is one of the things we've been missing and addressing that opens the door to some of these improvements being suggested. Wrt the netty patch, the approach Ben and I took was to refactor sufficiently to add support for NIO/Netty/... while minimizing breakage. This is already a big patch, esp given that the code is not really as clean to begin with (complex too). Perfect situation, no. But the intent was to further clean things up once the original patch was reviewed/committed. Trying to do a huge refactoring in one shot (one patch) is not a good idea imo. Already these patches are too large. 
Perhaps lesson learned here is that we should have just created a special branch from the get go, applied a number of smaller patches to that branch, then eventually merged back into the trunk once it was fully baked. Patrick
Re: Restarting discussion on ZooKeeper as a TLP
+1 ben On 10/14/2010 11:47 AM, Henry Robinson wrote: +1, I agree that we've addressed most outstanding concerns, we're ready for TLP. Henry On 14 October 2010 13:29, Mahadev Konarmaha...@yahoo-inc.com wrote: +1 for moving to TLP. Thanks for starting the vote Pat. mahadev On 10/13/10 2:10 PM, Patrick Huntph...@apache.org wrote: In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop: Original discussion http://markmail.org/thread/42cobkpzlgotcbin I originally voted against this move, my primary concern being that we were not ready to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status. A second concern was that by becoming a TLP the project would lose it's connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage. Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction. I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles. Regards, Patrick
Re: What's the QA strategy of ZooKeeper?
code quality is important, and there are things we should keep in mind, but in general i really don't like the idea of risking code breakage because of a gratuitous code cleanup. we should be watching out for these things when patches get submitted or when new things go in. i think this is in line with what pat was saying. just to expand a bit. in my opinion clean-up refactorings have the following problems: 1) you risk breaking things in production for a potential future maintenance advantage. 2) there is always subjectivity: quality code for one code quality zealot is often seen as a bad design by another code quality zealot. unless there is an objective reason to do it, don't. 3) you may clean up the wrong way. you may restructure to make the current code clean and then end up rewriting and refactoring again to change the logic. i think we can mitigate 1) by only doing it when necessary. as a corollary we can mitigate 2) and 3) by only doing refactoring/cleanups when motivated by some new change: fix a bug, increased performance, new feature, etc. ben On 10/13/2010 06:18 AM, Thomas Koch wrote: Hi, after filing 13 refactoring issues against the Java Client code[1], I started to dig into the server side code to understand the last issues with the Netty stuff. I feel bad. It's this feeling of I don't wanna hurt you, but ZooKeeper is quite an important piece of the Hadoop ecosystem containing some of the most complicated pieces of code. And it'll only get more complex with more features. I'd propose to have a word about quality assurance. Is there already a strategy to ensure the ongoing maintainability of ZK? Is there a code style guide, a list of Dos-and-Don'ts (where I'd like to add some points)? Should PMD be added to Hudson? What is the level of FindBugs? Should it be raised? Some of the points I'd like to add to a style guide: - Don't write methods longer than 20-40 lines of code - Are you sure you want to use inner classes? 
- If there is a new operator in a method, could the method instead receive the object as a parameter? - Are you sure you want to use system properties? They are like global variables and the IDE does not know about them - Are you sure you want to extend a class? Often an aggregation is more elegant. - Don't nest ifs and loops deeper than 2 or 3 levels. If you do, you should break your code into more methods. - Use Enums or constants instead of plain status integers - please document your intentions in code comments. You don't need to comment the what, but the why. Do you agree that there is a need for better code quality in ZooKeeper? If so, it's not really scalable if a maniac like me fights like Don Quixote to clean up the code. All developers would need to establish a sense for clean code and constantly improve the code. [1] https://issues.apache.org/jira/browse/ZOOKEEPER-835 Best regards, Thomas Koch, http://www.koch.ro
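The "use Enums instead of plain status integers" point in the list above is easy to show concretely. A small illustrative sketch (the names here are invented for the example, not taken from the ZooKeeper codebase):

```java
// Before: opaque integers carry no type safety, and nothing stops
// a caller from passing 42 as a "state".
//   static final int DISCONNECTED = 0, CONNECTING = 1, CONNECTED = 2;

class StatusExample {
    enum ConnState {
        DISCONNECTED, CONNECTING, CONNECTED;

        // Behavior can live next to the states it concerns.
        boolean isAlive() { return this != DISCONNECTED; }
    }

    public static void main(String[] args) {
        ConnState s = ConnState.CONNECTING;
        // The compiler rejects any value outside the enum, and a switch
        // over ConnState can be checked for exhaustiveness by tooling.
        System.out.println(s + " alive=" + s.isAlive());
    }
}
```

The same argument applies to FindBugs/PMD mentioned in the mail: an enum turns a whole class of "magic number" warnings into compile-time errors.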
[jira] Updated: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-881: Hadoop Flags: [Reviewed] +1 nice catch! ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial.
[jira] Commented: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921233#action_12921233 ] Benjamin Reed commented on ZOOKEEPER-881: - Committed revision 1022824. ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial.
[jira] Updated: (ZOOKEEPER-864) Hedwig C++ client improvements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-864: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) thanx ivan! Committed revision 1021463. Hedwig C++ client improvements -- Key: ZOOKEEPER-864 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864 Project: Zookeeper Issue Type: Improvement Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.4.0 Attachments: warnings.txt, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff I changed the socket code to use boost asio. Now the client only creates one thread, and all operations are non-blocking. Tests are now automated, just run make check. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-886) Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-886: Hadoop Flags: [Reviewed] +1 good catch erwin! Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected --- Key: ZOOKEEPER-886 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-886 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Reporter: Erwin Tam Assignee: Erwin Tam Attachments: ZOOKEEPER-886.patch The Hedwig Server is connected to ZooKeeper. In the ZkTopicManager, it registers a watcher so that if it ever gets disconnected from ZK, it will temporarily fail all incoming requests since the Hedwig server does not know for sure if it is still the master for the topics. When the ZK client gets reconnected, the logic currently is wrong and it does not unset the suspended flag. Thus once it gets disconnected, it will stay in the suspended state forever, thereby making the Hedwig server hub dead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-886) Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-886: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 1021501. Hedwig Server stays in disconnected state when connection to ZK dies but gets reconnected --- Key: ZOOKEEPER-886 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-886 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Reporter: Erwin Tam Assignee: Erwin Tam Attachments: ZOOKEEPER-886.patch The Hedwig Server is connected to ZooKeeper. In the ZkTopicManager, it registers a watcher so that if it ever gets disconnected from ZK, it will temporarily fail all incoming requests since the Hedwig server does not know for sure if it is still the master for the topics. When the ZK client gets reconnected, the logic currently is wrong and it does not unset the suspended flag. Thus once it gets disconnected, it will stay in the suspended state forever, thereby making the Hedwig server hub dead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-822: Hadoop Flags: [Reviewed] +1 looks good. ready to commit. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915796#action_12915796 ] Benjamin Reed commented on ZOOKEEPER-822: - looks good overall flavio. just a quick questions: i notice that operations on senderWorkerMap in initiateConnection are not synchronized. senderWorkerMap is concurrent, but there could be a race between the get, put, and vsw.finish if initiateConnection is called concurrently for the same sid. right? also you need to add a blurb to the config doc for the timeout system variable, which should be zookeeper.cnxtimeout so that it can be set from the configuration file. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. 
Look for START HERE. Logs after that point should be of our interest.
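The race Ben raises on senderWorkerMap in the comment above is the classic check-then-act problem: a ConcurrentHashMap's individual operations are atomic, but a get followed by a put is not, so two threads initiating a connection for the same sid can interleave between the two calls. A hedged sketch of closing that window by relying on the map's atomic put (simplified, invented names — not the actual QuorumCnxManager code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SenderRegistry {
    static class SendWorker {
        void finish() { /* tear down the old connection */ }
    }

    final ConcurrentMap<Long, SendWorker> senderWorkerMap = new ConcurrentHashMap<Long, SendWorker>();

    // Racy: two threads can both see the same old worker (or both see null)
    // between the get and the put, so one freshly installed worker can be
    // silently overwritten and leaked.
    void connectRacy(long sid) {
        SendWorker old = senderWorkerMap.get(sid);
        if (old != null) old.finish();
        senderWorkerMap.put(sid, new SendWorker());
    }

    // Atomic per sid: put() atomically swaps in the new worker and hands
    // back the previous one, so each replaced worker is finished exactly once.
    void connectAtomic(long sid) {
        SendWorker old = senderWorkerMap.put(sid, new SendWorker());
        if (old != null) old.finish();
    }
}
```

Where "first writer wins" is wanted instead of "last writer wins", putIfAbsent() gives the same single-call atomicity with the opposite policy.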
[jira] Commented: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915799#action_12915799 ] Benjamin Reed commented on ZOOKEEPER-820: - +1 this looks good to me. did you try it on cygwin? update c unit tests to ensure zombie java server processes don't cause failure Key: ZOOKEEPER-820 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Patrick Hunt Assignee: Michi Mutsuzaki Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-820-1.patch, ZOOKEEPER-820.patch When the c unit tests are run sometimes the server doesn't shutdown at the end of the test, this causes subsequent tests (hudson esp) to fail. 1) we should try harder to make the server shut down at the end of the test, I suspect this is related to test failing/cleanup 2) before the tests are run we should see if the old server is still running and try to shut it down -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915849#action_12915849 ] Benjamin Reed commented on ZOOKEEPER-880: - is there an easy way to reproduce this? QuorumCnxManager$SendWorker grows without bounds Key: ZOOKEEPER-880 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 Project: Zookeeper Issue Type: Bug Affects Versions: 3.2.2 Reporter: Jean-Daniel Cryans Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz We're seeing an issue where one server in the ensemble has a steady growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like: {noformat} tickTime=3000 dataDir=/somewhere_thats_not_tmp clientPort=2181 initLimit=10 syncLimit=5 server.0=sv4borg9:2888:3888 server.1=sv4borg10:2888:3888 server.2=sv4borg11:2888:3888 server.3=sv4borg12:2888:3888 server.4=sv4borg13:2888:3888 {noformat} The issue is on the first server. I'm going to attach threads dumps and logs in moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910643#action_12910643 ] Benjamin Reed commented on ZOOKEEPER-869: - this is a good observation diogo, but i think you may be characterizing it improperly. the problem is that when we do a leadership change we increment the epoch and propose a new leader, so all other processes will be much lower than the leader. when a follower connects we figure out how far behind the follower is by comparing the lastProposed zxids and taking the difference. we should really be using the recent history to do the comparison. as a side note, if we were to choose not to take the maximum zxid during recovery, we need to make sure that we still cover all committed messages. Support for election of leader with arbitrary zxid -- Key: ZOOKEEPER-869 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869 Project: Zookeeper Issue Type: New Feature Reporter: Diogo Priority: Minor Currently, the leader election algorithm implemented guarantees that the leader has the maximum zxid of the ensemble. The state synchronization after the election was built based on this assumption. However, other leader election algorithms might elect leaders with arbitrary zxid. To support other leader election algorithms, the state synchronization should allow the leader to have an arbitrary zxid.
[jira] Updated: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-831: Status: Resolved (was: Patch Available) Resolution: Fixed Committed revision 998200. thanx for the fix flavio and ivan for the reviews! BookKeeper: Throttling improved for reads - Key: ZOOKEEPER-831 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, ZOOKEEPER-831.patch Reads and writes in BookKeeper are asymmetric: a write request writes one entry, whereas a read request may read multiple entries. The current implementation of throttling only counts the number of read requests instead of counting the number of entries being read. Consequently, a few read requests reading a large number of entries each will spawn a large number of read-entry requests.
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-846: Hadoop Flags: [Reviewed] +1 looks good pat! it's nice that the checking and setting of closing is in the same routine. i agreed about skipping the test case. zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Assignee: Patrick Hunt Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: rs-13.stack, ZOOKEEPER-846.patch Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 
tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901949#action_12901949 ] Benjamin Reed commented on ZOOKEEPER-366: - holger you are correct. nanoTime is the way to go. i'll prepare a fix. one problem with it is that the fix will be impossible to test. Session timeout detection can go wrong if the leader system time changes Key: ZOOKEEPER-366 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Assignee: Benjamin Reed Attachments: ZOOKEEPER-366.patch the leader tracks session expirations by calculating when a session will timeout and then periodically checking to see what needs to be timed out based on the current time. this works great as long as the leaders clock progresses at a steady pace. the problem comes when there are big (session size) changes in clock, by ntp for example. if time gets adjusted forward, all the sessions could timeout immediately. if time goes backward sessions that should timeout may take a lot longer to actually expire. this is really just a leader issue. the easiest way to deal with this is to have the leader relinquish leadership if it detects a big jump forward in time. when a new leader gets elected, it will recalculate timeouts of active sessions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
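Why "nanoTime is the way to go" here: System.currentTimeMillis() tracks the wall clock, which NTP or an operator can step forward or backward, while System.nanoTime() is an elapsed-time source that keeps advancing steadily regardless of clock adjustments. A small sketch of the distinction (an illustration of the principle, not the eventual ZOOKEEPER-366 patch):

```java
// Measures elapsed time with the monotonic clock, so session-expiry math
// is unaffected by wall-clock jumps.
class ElapsedTimer {
    private final long startNanos = System.nanoTime();

    /** Elapsed milliseconds since construction; immune to wall-clock steps. */
    long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1000000L;
    }

    public static void main(String[] args) throws InterruptedException {
        ElapsedTimer t = new ElapsedTimer();
        Thread.sleep(50);
        // Even if NTP resets the system clock during the sleep,
        // elapsedMillis() still reports roughly 50.
        System.out.println(t.elapsedMillis());
    }
}
```

The trade-off is that nanoTime() values are only meaningful as differences within one JVM; they cannot be compared across processes or persisted, which matters for anything a leader hands over during election.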
[jira] Updated: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-366: Attachment: ZOOKEEPER-366.patch
This patch smooths out the effect of a radical time change by always sleeping at least 1/2 tickTime. This means that if we really needed to do a big jump forward, it will take up to 1/2 of the jump to converge on the real time. Because clients ping after idle periods of 1/3 the timeout, there should be few sessions that expire. We could reduce that number, but it would take even longer to converge, if we always slept at least 3/4 of the tickTime.
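The smoothing described in this patch can be illustrated with a small helper. This is a sketch of the idea with hypothetical names, not the patch itself: the wait before expiring the next bucket of sessions is clamped to at least half a tick, so a forward clock jump is absorbed over many ticks instead of expiring everything at once.

```java
class ExpirySmoothing {
    // Hypothetical helper: how long the expiry thread should sleep before
    // processing the bucket due at nextExpiryTime, given the current clock.
    static long smoothedWait(long nextExpiryTime, long now, long tickTime) {
        // After a forward clock jump, many buckets are already "due"
        // (nextExpiryTime - now <= 0). Clamping to tickTime / 2 expires
        // them one per half-tick; since clients ping after timeout / 3 of
        // idleness, most sessions get refreshed before their bucket comes up.
        return Math.max(nextExpiryTime - now, tickTime / 2);
    }
}
```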
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900824#action_12900824 ] Benjamin Reed commented on ZOOKEEPER-366:
Does anyone have an idea of how to test this? I need to mock System.currentTimeMillis().
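One common answer to this testability question is clock injection. The sketch below is hypothetical (not necessarily what was done in ZooKeeper): all time reads go through an injectable supplier, so production passes System::currentTimeMillis and a test substitutes a steerable fake.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: route time reads through a LongSupplier so tests can
// simulate clock jumps without touching the system clock.
class SessionClock {
    private final LongSupplier millis;

    // Production wiring would be: new SessionClock(System::currentTimeMillis)
    SessionClock(LongSupplier millis) { this.millis = millis; }

    long now() { return millis.getAsLong(); }
}
```

A test can then back the clock with an AtomicLong and add an hour to it to simulate an NTP jump.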
[jira] Assigned: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed reassigned ZOOKEEPER-366: Assignee: Benjamin Reed
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900511#action_12900511 ] Benjamin Reed commented on ZOOKEEPER-366:
After discussing this on the list, we realized that we can detect a big jump forward in the session expiration thread. Since we expire a bucket of sessions each tick, if we run into the situation where we are going to expire more than one bucket in a row, we know we have jumped forward in time. We can smooth the jump by requiring at least a 1/2 tickTime wait between each bucket.
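The per-tick bucketing mentioned here can be sketched as follows (hypothetical helper name; the real code lives in ZooKeeper's session tracker): each session's expiration time is rounded up to the next tick boundary, so the expiry thread handles one bucket per tick, and several buckets coming due at once is the signature of a forward jump.

```java
class SessionBuckets {
    // Hypothetical helper: round a raw expiration time up to the next tick
    // boundary. All sessions that land in the same bucket expire together,
    // so the expiry thread only needs to wake once per tick.
    static long bucketFor(long expireAt, long tickTime) {
        return ((expireAt / tickTime) + 1) * tickTime;
    }
}
```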
[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-795: Status: Resolved (was: Patch Available) Resolution: Fixed Committed revision 986470 in branch 3.3.
eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Assignee: Sergey Doroshenko Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, ZOOKEEPER-795.patch
Hi, I noticed a problem with the EventThread located in ClientCnxn.java. The EventThread isn't shut down after a session-expired event arrives (i.e. it never receives the EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread, ...) except for the EventThread. As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread which will never be killed (for the previous ZooKeeper object, the state is already closed, so calling close again doesn't do anything). So every time I create a new ZooKeeper connection after an expired session, I have one more zombie EventThread. How to reproduce: - Start a ZooKeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread stays in WAITING state indefinitely - If you reopen a new ZooKeeper connection and repeat the previous steps, another EventThread will be present in infinite wait state
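The shape of the leak and its fix can be illustrated with a stripped-down event loop. This is a hypothetical sketch, not ZooKeeper's actual ClientCnxn code: the thread blocks on a queue and only exits when it takes a dedicated "event of death" sentinel, so every close path, including session expiry, must enqueue that sentinel or the thread leaks.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the fix shape: the event thread loops on a blocking
// queue and exits only when it dequeues EVENT_OF_DEATH. The bug described in
// ZOOKEEPER-795 corresponds to an expiry path that never enqueues it.
class EventLoop extends Thread {
    private static final Object EVENT_OF_DEATH = new Object();
    private final LinkedBlockingQueue<Object> events = new LinkedBlockingQueue<>();

    void submit(Object event) { events.add(event); }

    // Must be called on EVERY close path, including session expiration.
    void shutdown() { events.add(EVENT_OF_DEATH); }

    // Wait up to the given time for the loop to exit; true if it did.
    boolean awaitExit(long millis) {
        try {
            join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }

    @Override public void run() {
        try {
            Object e;
            while ((e = events.take()) != EVENT_OF_DEATH) {
                // dispatch e to watchers/callbacks here
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```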
[jira] Commented: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899101#action_12899101 ] Benjamin Reed commented on ZOOKEEPER-733:
We should commit the patch as is; trying to add features to it while keeping the patch fresh is too unwieldy!
use netty to handle client connections Key: ZOOKEEPER-733 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Benjamin Reed Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: accessive.jar, flowctl.zip, moved.zip, QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch
We currently have our own asynchronous NIO socket engine to be able to handle lots of clients with a single thread. Over time the engine has become more complicated. We would also like the engine to use multiple threads on machines with lots of cores, and we would like to be able to support things like SSL. If we switch to Netty, we can simplify our code and get the previously mentioned benefits.
[jira] Created: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
remove duplicate code from netty and nio ServerCnxn classes Key: ZOOKEEPER-845 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Benjamin Reed
The code for handling the 4-letter words is duplicated between the NIO and Netty versions of ServerCnxn; this makes maintenance problematic.
[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897880#action_12897880 ] Benjamin Reed commented on ZOOKEEPER-845:
Perhaps we could extract the actual processing logic from the threading model.
Key: ZOOKEEPER-845 Fix For: 3.4.0
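The extraction suggested here could look roughly like this. This is a hypothetical shape, not the eventual patch: the 4-letter-word handling becomes one transport-agnostic function that both the NIO and Netty connection classes call, so only the I/O differs between them. The "ruok"/"imok" exchange is ZooKeeper's real liveness check; the other responses below are placeholders.

```java
// Hypothetical sketch: 4-letter-word commands handled in one shared class,
// so NIOServerCnxn and NettyServerCnxn only do transport work.
class FourLetterWords {
    static String execute(String cmd) {
        switch (cmd) {
            case "ruok":
                return "imok";  // real ZooKeeper liveness response
            case "stat":
                return "stat output would be assembled here";  // placeholder
            default:
                return cmd + " is not a recognized command";   // placeholder
        }
    }
}
```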
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897581#action_12897581 ] Benjamin Reed commented on ZOOKEEPER-775:
I believe the NOTICE file is consistent with: http://apache.org/legal/src-headers.html#header-existingcopyright
A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch
We have developed a large-scale pub/sub system based on ZooKeeper and BookKeeper.
[jira] Updated: (ZOOKEEPER-338) zk hosts should be resolved periodically for loadbalancing amongst zk servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-338: Component/s: java client (was: c client)
It is an issue for both the C and Java clients.
zk hosts should be resolved periodically for loadbalancing amongst zk servers. Key: ZOOKEEPER-338 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-338 Project: Zookeeper Issue Type: New Feature Components: c client, java client Affects Versions: 3.0.0, 3.0.1, 3.1.0 Reporter: Mahadev konar
The list of host names passed to the ZK init method is resolved only once. If a corresponding DNS entry is later changed, it is not refreshed by the ZK library, effectively preventing proper load balancing.
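The periodic re-resolution this issue asks for can be sketched as follows. This is a hypothetical illustration (class name made up, not the ZooKeeper client code): instead of caching the resolved InetAddresses once at init, the host list is resolved again on each (re)connect attempt, so DNS changes are picked up.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: re-resolve a host on every connect attempt rather
// than once at client construction, so updated DNS entries take effect.
class HostResolver {
    static List<InetSocketAddress> resolve(String host, int port) {
        List<InetSocketAddress> out = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                out.add(new InetSocketAddress(a, port));
            }
        } catch (UnknownHostException e) {
            // Unresolvable right now; the caller retries on the next attempt.
        }
        return out;
    }
}
```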
[jira] Updated: (ZOOKEEPER-338) zk hosts should be resolved periodically for loadbalancing amongst zk servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-338: Component/s: c client
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896948#action_12896948 ] Benjamin Reed commented on ZOOKEEPER-794:
Alexis, I'm missing the problem you are pointing out. Is it an issue with the ordering of the callbacks? I'm also wondering about your _3 patch; it is much smaller than the others. Is it to be applied to trunk, or is it relative to a different patch?
Callbacks are not invoked when the client is closed Key: ZOOKEEPER-794 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Alexis Midon Assignee: Alexis Midon Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch
I noticed that ZooKeeper behaves differently when calling synchronous or asynchronous actions on a closed ZooKeeper client: a synchronous call will throw a session-expired exception, while an asynchronous call will do nothing. No exception, no callback invocation. Even though the EventThread receives the Packet with the session-expired error code, the packet is never processed, since the thread has been killed by the EventOfDeath; so the callback is not invoked.
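The contract this bug argues for can be sketched in miniature. This is a hypothetical illustration with made-up types, not the actual ZooKeeper client classes: when the client shuts down, each pending asynchronous packet should still have its callback invoked with an error code, instead of being silently dropped along with the event thread. (-112 is the numeric value of ZooKeeper's SESSIONEXPIRED code, used here only for illustration.)

```java
// Hypothetical sketch of the desired close behavior: fail pending async
// packets with an error code instead of discarding them.
interface StatCallback {
    void processResult(int rc, String path, Object ctx);
}

class PendingPacket {
    final StatCallback cb;
    final String path;
    final Object ctx;

    PendingPacket(StatCallback cb, String path, Object ctx) {
        this.cb = cb; this.path = path; this.ctx = ctx;
    }

    // On client shutdown, invoke the callback with an error code so the
    // caller learns the request will never complete.
    void failWith(int errorCode) { cb.processResult(errorCode, path, ctx); }
}
```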
[jira] Commented: (ZOOKEEPER-829) Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893910#action_12893910 ] Benjamin Reed commented on ZOOKEEPER-829:
Should we kill the session immediately or wait until the sessionTimeout? Killing it immediately seems like it violates a contract.
Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions Key: ZOOKEEPER-829 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-829 Project: Zookeeper Issue Type: New Feature Components: server Reporter: Todd Lipcon
For some use cases in HBase (HBASE-1316 in particular) we'd like the ability to forcibly expire someone else's ZK session. Patrick and I discussed on IRC and came up with the idea of creating nodes under /zookeeper/sessions/<session id> that can be read to get basic stats about a session, and written to manipulate one. The manipulation we need in HBase is the ability to write a command like kill, but others might be useful as well.
[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-795: Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-795: Attachment: ZOOKEEPER-795.patch
I've added a test (added to the existing session expiration test, so it shouldn't add any running time to the tests).
[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-790:
+1, excellent work, you guys. I also like QuorumUtil, Sergey! Thanks for implementing it.
Last processed zxid set prematurely while establishing leadership Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch
The leader code sets the last processed zxid to the first of the new epoch even before connecting to a quorum of followers (Leader.java:281). Because the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892860#action_12892860 ] Benjamin Reed commented on ZOOKEEPER-775:
Can we do the Forrest doc as a separate patch? It's already quite large as it is.
[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-733: Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-733: Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893025#action_12893025 ] Benjamin Reed commented on ZOOKEEPER-733:
I ran this on 40 machines simulating 900 clients. The benchmark went well without problems; the results don't show any significant performance improvements (or degradations).
[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-790: Status: Resolved (was: Patch Available) Resolution: Fixed Committed revision 966960. Committed revision 966984.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891210#action_12891210 ] Benjamin Reed commented on ZOOKEEPER-790:
Looks great, Flavio! The only nit I have is that the test case assumes that s1 is not the leader; you might want to check that.
[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-790: Hadoop Flags: [Reviewed]
+1, great job, Flavio! Thanks for your help, Travis and Vishal.
[jira] Commented: (ZOOKEEPER-806) Cluster management with Zookeeper - Norbert
[ https://issues.apache.org/jira/browse/ZOOKEEPER-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886864#action_12886864 ] Benjamin Reed commented on ZOOKEEPER-806: - this looks really cool. is there a collaboration model you were thinking of? (btw, have you guys thought of presenting this at the hadoop summit or similar venue?) Cluster management with Zookeeper - Norbert --- Key: ZOOKEEPER-806 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-806 Project: Zookeeper Issue Type: New Feature Reporter: John Wang Hello, we have built a cluster management layer on top of Zookeeper here at the SNA team at LinkedIn: http://sna-projects.com/norbert/ We were wondering ways for collaboration as this is a very useful application of zookeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-807) bookkeeper does not put enough meta-data in to do recovery properly
bookkeeper does not put enough meta-data in to do recovery properly --- Key: ZOOKEEPER-807 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-807 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Reporter: Benjamin Reed somewhere, probably zookeeper, we need to keep track of the information about keys used for access and for MAC validation, as well as the digest type for entries. we can't write a general recovery tool without it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-712) Bookie recovery
[ https://issues.apache.org/jira/browse/ZOOKEEPER-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-712: Hadoop Flags: [Reviewed] +1 looks good. thanx erwin! Bookie recovery --- Key: ZOOKEEPER-712 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-712 Project: Zookeeper Issue Type: New Feature Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Erwin Tam Fix For: 3.4.0 Attachments: ZOOKEEPER-712.patch Recover the ledger fragments of a bookie once it crashes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-719) Add throttling to BookKeeper client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-719: Status: Resolved (was: Patch Available) Resolution: Fixed Committed revision 962693. Add throttling to BookKeeper client --- Key: ZOOKEEPER-719 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.0 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, ZOOKEEPER-719.patch Add throttling to client to control the rate of operations to bookies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-712) Bookie recovery
[ https://issues.apache.org/jira/browse/ZOOKEEPER-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-712: Status: Resolved (was: Patch Available) Resolution: Fixed Committed revision 962697. Bookie recovery --- Key: ZOOKEEPER-712 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-712 Project: Zookeeper Issue Type: New Feature Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira Assignee: Erwin Tam Fix For: 3.4.0 Attachments: ZOOKEEPER-712.patch Recover the ledger fragments of a bookie once it crashes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-794: Status: Open (was: Patch Available) -1 we need to get a test in. also the fix has a race condition. the boolean flag may change after it is checked and before the request is queued. Callbacks are not invoked when the client is closed --- Key: ZOOKEEPER-794 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Alexis Midon Assignee: Alexis Midon Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client. A synchronous call will throw a session expired exception, while an asynchronous call will do nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed, since the thread has been killed by the eventOfDeath. So the callback is not invoked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-794: Status: Patch Available (was: Open) Callbacks are not invoked when the client is closed --- Key: ZOOKEEPER-794 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Alexis Midon Assignee: Alexis Midon Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client. A synchronous call will throw a session expired exception, while an asynchronous call will do nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed, since the thread has been killed by the eventOfDeath. So the callback is not invoked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-794: Attachment: ZOOKEEPER-794_2.patch i've added a test case and i think i've addressed the race condition. alexis, can you check it out? the only change to your code was to make wasKilled volatile and move where it was set. Callbacks are not invoked when the client is closed --- Key: ZOOKEEPER-794 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Alexis Midon Assignee: Alexis Midon Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client. A synchronous call will throw a session expired exception, while an asynchronous call will do nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed, since the thread has been killed by the eventOfDeath. So the callback is not invoked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
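For readers following the race discussed in this thread, here is a minimal sketch of the pattern being debated: the submitting thread checks a kill flag before queueing a callback, and the event thread sets the (volatile) flag first and then drains the queue, so a callback that slips in during the check-then-queue window still gets invoked. All names here (CallbackQueue, wasKilled, submit) are illustrative, not the actual ClientCnxn code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the check-then-queue race and its mitigation;
// not the actual ZooKeeper client implementation.
class CallbackQueue {
    // volatile so the submitting thread sees the kill flag promptly
    private volatile boolean wasKilled = false;
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();

    // Submitting side: if the event thread is already dead, run the
    // callback inline (e.g. with a session-expired result) instead of
    // queueing it where it would never be processed.
    void submit(Runnable cb) {
        if (wasKilled) {
            cb.run();
        } else {
            queue.add(cb);
        }
    }

    // Event-thread side: set the flag first, then drain whatever
    // slipped in between the check and the add above, so no callback
    // is silently dropped.
    void kill() {
        wasKilled = true;
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
        }
    }

    int pending() { return queue.size(); }
}
```

The ordering matters: setting the flag before draining means any submitter that saw wasKilled == false has either already added its callback (drained here) or will see wasKilled == true and run it inline.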
[jira] Commented: (ZOOKEEPER-719) Add throttling to BookKeeper client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886019#action_12886019 ] Benjamin Reed commented on ZOOKEEPER-719: - +1 looks good Add throttling to BookKeeper client --- Key: ZOOKEEPER-719 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.0 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, ZOOKEEPER-719.patch, ZOOKEEPER-719.patch Add throttling to client to control the rate of operations to bookies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-719) Add throttling to BookKeeper client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878850#action_12878850 ] Benjamin Reed commented on ZOOKEEPER-719: - i think using a system property is still the easiest, but i'm fine with the set/get if you want to do it. you just need to make it thread safe. Add throttling to BookKeeper client --- Key: ZOOKEEPER-719 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.0 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch Add throttling to client to control the rate of operations to bookies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877131#action_12877131 ] Benjamin Reed commented on ZOOKEEPER-767: - 1) just to make sure we are talking about the same thing. this is the code i'm referring to: {noformat} // Check that we don't already have a lock... if (currentExclusiveLock != null && !isExpired(currentExclusiveLock)) { // We have the exclusive lock! Remove newly made lock file and just // return. zooKeeper.delete(writeLock, -1); return currentExclusiveLock; } {noformat} 2) no, i'm talking about when you go to get the shared lock, you first check to see if you have a shared lock. shouldn't you check for both shared and exclusive? 3) the problem is that connection loss and session expiration are different. with connection loss you will get an exception, but your session can recover and you can keep using it. for session expired you are right, the EPHEMERAL will go away. in the connection loss scenario you have a situation where you may acquire a lock but not know it. with regard to the question of the current lock implementation in the repository: i'm trying to understand the differences between that implementation and yours. both follow the same recipe, right? if the current lock implementation implemented shared locks, would you have used that one? or is there something more fundamental? Submitting Demo/Recipe Shared / Exclusive Lock Code --- Key: ZOOKEEPER-767 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767 Project: Zookeeper Issue Type: Improvement Components: recipes Affects Versions: 3.3.0 Reporter: Sam Baskinger Assignee: Sam Baskinger Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch Networked Insights would like to share-back some code for shared/exclusive locking that we are using in our labs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
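The connection-loss hazard raised in point 3 (you may acquire a lock without knowing it) is commonly handled by embedding a unique id in the sequential node name, so that after a ConnectionLoss the client can scan the children for a create that actually succeeded server-side. A minimal sketch, against a hypothetical Zk interface rather than the real ZooKeeper API:

```java
import java.util.List;

// Hypothetical sketch of the connection-loss hazard discussed above:
// a create may succeed on the server while the client sees ConnectionLoss,
// so on retry the client first looks for a node it may already own.
// The Zk interface and all names here are illustrative.
class LockRetrySketch {
    interface Zk {
        String create(String prefix);          // sequential create; returns full path
        List<String> getChildren(String parent);
    }
    static class ConnectionLossException extends RuntimeException {}

    // uniqueId (e.g. derived from the session id) is embedded in the node
    // name so this client can recognize its own znode among the children.
    static String createOrRecover(Zk zk, String parent, String uniqueId) {
        try {
            return zk.create(parent + "/" + uniqueId + "-lock-");
        } catch (ConnectionLossException e) {
            // the create may have succeeded server-side: scan for our id
            for (String child : zk.getChildren(parent)) {
                if (child.startsWith(uniqueId + "-lock-")) {
                    return parent + "/" + child;
                }
            }
            throw e; // genuinely did not create; caller retries
        }
    }
}
```

Session expiration needs no such recovery: the EPHEMERAL node is gone and the lock attempt simply restarts with a new session.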
Re: Fwd: RE: WORKSHOP ORGANIZER ZooKeeper
i will be there. i would be glad to do a talk on bookkeeper/hedwig. ben On 06/09/2010 11:10 AM, Patrick Hunt wrote: Fellow contributors, Yahoo is hosting a contributor workshop the day after the Hadoop Summit. The purpose of the workshops is to collectively discuss challenges, concerns and future ideas around ZooKeeper technologies. This will be held at Yahoo's Sunnyvale campus, all ZooKeeper contributors are welcome to attend. I was thinking of doing a half day with something like the following agenda: * intros * presentations ** phunt - status of zk, roadmap ** addl presentations * discussion, qa with committers/contributors, etc... If you would like to present please respond to this email with details. It's probably a good idea to keep the presentations down to 20-30 minutes max. I'd like to get back to Yahoo with an agenda soon so let me know asap. Patrick Original Message Subject: RE: WORKSHOP ORGANIZER ZooKeeper Date: Wed, 9 Jun 2010 10:52:27 -0700 From: Dekel Tankelde...@yahoo-inc.com To: Patrick Huntph...@apache.org Hi Patrick. We are setting up 3 meetings on the contributors meetup page for June 30th. It will be in the classrooms in building E on the SNV campus. I have the agenda for the core and Pig meetings, can you send me the zookeeper one as well (lunch will be available next to the classroom around 12). You can make it a half day or a full day, up to you. I'll set up the meeting with the classroom location once I hear back. I will also provide a conference number. -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Friday, June 04, 2010 11:53 AM To: hadoopcontribu...@yahoo-inc.com Cc: Owen O'Malley Subject: Re: WORKSHOP ORGANIZER ZooKeeper Are there any more details on this that you could share? I'd like to ramp up discussion on this in our contributor community, but I think that I should provide some detail as part of this. For example if I could confirm the location and time that we have available for our meeting. 
Will we have a conference bridge available to us? Conference phone? Thanks, Patrick On 03/29/2010 04:27 PM, Patrick Hunt wrote: The ZooKeeper team would like to have it's own workshop on the 30th. In our case we probably only need 2 hrs or so on that day. I will be the coordinator for our event, please let me know what I need to do. Thanks, Patrick
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-775: Status: Patch Available (was: Open) A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-767: Status: Open (was: Patch Available) Submitting Demo/Recipe Shared / Exclusive Lock Code --- Key: ZOOKEEPER-767 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767 Project: Zookeeper Issue Type: Improvement Components: recipes Affects Versions: 3.3.0 Reporter: Sam Baskinger Assignee: Sam Baskinger Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch Networked Insights would like to share-back some code for shared/exclusive locking that we are using in our labs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875227#action_12875227 ] Benjamin Reed commented on ZOOKEEPER-785: - +1 i think we should log the message as a warning rather than error since we completely recover from the situation. we may also want to log a warning for 2 servers to indicate that failures will not be tolerated. (feel free to ignore both comments and commit the patch :) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line --- Key: ZOOKEEPER-785 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Tested in linux with a new jvm Reporter: Alex Newman Assignee: Patrick Hunt Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-785.patch The following config causes an infinite loop [zoo.cfg] tickTime=2000 dataDir=/var/zookeeper/ clientPort=2181 initLimit=10 syncLimit=5 server.0=localhost:2888:3888 Output: 2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum peer 2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to port 0.0.0.0/0.0.0.0:2181 2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000 2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set to -1 2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set to -1 2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10 2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot /var/zookeeper/version-2/snapshot.c 2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My election bind port: 3888 2010-06-01 16:20:32,554 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING 2010-06-01 16:20:32,556 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. 
My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,558 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 1, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,560 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception java.lang.NullPointerException at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621) 2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING 2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 2, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,561 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception java.lang.NullPointerException at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621) 2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING 2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 12, 3, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,562 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception java.lang.NullPointerException Things like HBase require that the zookeeper servers be listed in the zoo.cfg. 
This is a bug on their part, but zookeeper shouldn't null-pointer in a loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
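The two-server warning ben suggests above follows from majority-quorum arithmetic: an ensemble of n servers needs floor(n/2)+1 to agree, so it tolerates n minus that many failures, which is zero for both 1 and 2 servers. A small sketch of such a check (all names illustrative, not the actual QuorumPeerConfig code):

```java
// Hypothetical sketch of a quorum-config sanity check like the one
// suggested in the ZOOKEEPER-785 discussion; not actual ZooKeeper code.
class QuorumConfigCheck {
    // simple majority quorum
    static int quorumSize(int servers) { return servers / 2 + 1; }

    // how many server failures the ensemble can survive
    static int failuresTolerated(int servers) { return servers - quorumSize(servers); }

    // warn (rather than error) when the config is legal but fragile
    static String check(int servers) {
        int f = failuresTolerated(servers);
        if (f == 0) {
            return "WARN: " + servers + "-server ensemble tolerates no failures";
        }
        return "OK: tolerates " + f + " failure(s)";
    }
}
```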
Re: enhance zookeeper lock function in cocurrent condition
if i understand your situation correctly, you have a lock that may have more than 100,000 processes contending for it. since this can cause a problem for getChildren, you want a way to get the server to do the check for you without returning everything. the isFirst method would return true if you are first (sorted in utf8 order?) in the list of children, and you can set a watch on that condition. what do the path and type arguments do? ben On 06/03/2010 03:20 AM, Joe Zou wrote: Hi All: Using zookeeper to build a distributed lock is a main feature. We currently implement the lock function as in the code below:

public void lock() throws InterruptedException {
    do {
        if (path == null) {
            path = zk.create(lockPrefix, null, acl, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
        List<String> children = zk.getChildren(parentPath);
        if (isFirst(children, path)) {
            return;
        } else {
            final CountDownLatch latch = new CountDownLatch(1);
            String nearestChild = findLastBefore(children, path);
            if (zk.exists(nearestChildPath, new Watcher() {
                    public void process(WatchedEvent event) { latch.countDown(); }
                }) != null) {
                latch.await();
            } else {
                // acquired the lock successfully
                return;
            }
        }
    } while (true);
}

In a highly concurrent case, the lock node may have a large number of ephemeral children, so getChildren may produce a packet exceeding the limit (4MB by default), and this also causes a performance issue. To avoid the issue, I plan to add a new interface, isFirst, to zookeeper. I don't know if it is useful as a common usage, but I do think it should help a little bit in the concurrent situation. Below is a snippet of the code change, and the attachment is the full list of it.

public void lock() throws InterruptedException {
    do {
        if (path == null) {
            path = zk.create(lockPrefix, null, acl, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
        final CountDownLatch latch = new CountDownLatch(1);
        if (!zk.isFirst(parentPath, path, type, new Watcher() {
                public void process(WatchedEvent event) { latch.countDown(); }
            })) {
            latch.await();
        } else {
            // acquired the lock successfully
            return;
        }
    } while (true);
}

As we know, only the first node can acquire the lock, so when the parent node removes a child node, the server needs to trigger the watcher to notify the new first node. The second lock requirement is: in our current project, each save needs to acquire multiple locks. In a distributed environment, this can easily cause deadlock or lock starvation. So we need a state lock: the lock node keeps multiple states to judge whether a node may acquire the lock or not. Example: Client1: lock(id1,id2,id3) -> znode 01 Client2: lock(id2,id3) -> znode 02 Client3: lock(id4) -> znode 03 Client2 needs to wait until client1 unlocks, but client3 can acquire the lock at once. This judgment logic lives in the zookeeper server. We add a LockState interface:

public interface LockState {
    String PATH_SEPARATOR = "/";
    String PATH_DELIMIT = "|";
    boolean isConflict(LockState state);
    byte[] getBytes();
}

Any new lock strategy can be added by implementing the interface. Attached is my code diff from 3.2.2 and some cases using the lock. Best Regards Joe Zou
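For reference, the client-side semantics that the proposed isFirst would move server-side amount to checking whether your znode has the lowest sequence suffix among the children; the server-side version just avoids shipping the full (possibly multi-megabyte) child list to the client. A sketch with illustrative names, relying on the fact that zero-padded sequential node names sort correctly as strings:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch of isFirst semantics, not a real ZooKeeper API:
// the lock holder is the child with the lowest sequence suffix.
class IsFirstSketch {
    static boolean isFirst(List<String> children, String myNode) {
        // ZooKeeper's sequential suffixes are zero-padded (%010d),
        // so lexicographic order matches numeric order
        String min = Collections.min(children);
        return min.equals(myNode);
    }
}
```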
[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-733: Attachment: flowctl.zip here is my cut at flowctl with netty. flow control seems to be happening, but it doesn't seem to fix the problem. use netty to handle client connections -- Key: ZOOKEEPER-733 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733 Project: Zookeeper Issue Type: Improvement Reporter: Benjamin Reed Attachments: accessive.jar, flowctl.zip, moved.zip, QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch we currently have our own asynchronous NIO socket engine to be able to handle lots of clients with a single thread. over time the engine has become more complicated. we would also like the engine to use multiple threads on machines with lots of cores. plus, we would like to be able to support things like SSL. if we switch to netty, we can simplify our code and get the previously mentioned benefits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874137#action_12874137 ] Benjamin Reed commented on ZOOKEEPER-775: - i would like to fix the build once we have it in the subversion repository. should i just remove the README? i'm not sure it is worth expanding since it would duplicate text in the docs directory. i'll fix the scripts and the dos2unix. with respect to the headers, i notice that configs, docs, and Makefiles don't have the license header in the zk repository, which leaves: ./pom.xml ./client/pom.xml ./protocol/pom.xml ./protocol/src/main/protobuf/PubSubProtocol.proto ./scripts/analyze.py ./scripts/hw.bash ./scripts/quote ./server/pom.xml is it okay if i just do those? A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-775: Attachment: ZOOKEEPER-775_3.patch libs_2.zip updated to address phunts comments. A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-719) Add throttling to BookKeeper client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873167#action_12873167 ] Benjamin Reed commented on ZOOKEEPER-719: - there are a couple of problems: 1) you seem to have a stray opCounterSem in PerClientBookieClient. you define it, but you never use it. 2) i think it might be better to use a system property to set the throttling rather than allow it to be dynamically changed. it simplifies the code. setThrottle is especially problematic since you are catching InterruptedException and it isn't thread safe. Add throttling to BookKeeper client --- Key: ZOOKEEPER-719 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-719 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.0 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-719.patch, ZOOKEEPER-719.patch Add throttling to client to control the rate of operations to bookies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.