ZooKeeper approved by Apache Board as TLP!

2010-11-22 Thread Patrick Hunt
We are now officially an Apache TLP! http://bit.ly/9czN2x As part of the process for moving out from under Hadoop and into full TLP status we need to work through the following: http://incubator.apache.org/guides/graduation.html#new-project-hand-over If you are involved with the project, esp on th

Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
ulimit for the >> process, >> so be sure to have enough ensemble nodes to spread those connections across >> that this won't happen. I think maybe there's a JIRA out to deal with this >> issue, not sure what the status is. >> >> C >> >> -

Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
wn. > >> here's a similar test setup I used: > > Thanks Patrick - it's really nice to have those numbers and test harness > basis. > > We're still in architecture mode so some of the details are still in flux, > but I think this gives us an idea. >

Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
Camille, that's a very good question. Largest cluster I've heard about is 10k sessions. Jeremy - largest I've ever tested was a 3 server cluster with ~500 sessions. Each session created 10k znodes (100bytes each znode) and set 5 watches on each. So 5 million znodes and 25million watches. I then ha
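
The totals quoted in that test follow directly from the per-session figures. A quick sketch of the arithmetic, assuming exactly 500 sessions (the message says "~500"); the variable names are illustrative only, and the payload figure counts raw znode data, not server-side overhead:

```python
# Back-of-the-envelope totals for the load test described above.
# Assumption: exactly 500 sessions (the message says "~500").
sessions = 500
znodes_per_session = 10_000
bytes_per_znode = 100
watches_per_znode = 5

total_znodes = sessions * znodes_per_session
total_watches = total_znodes * watches_per_znode
# Raw znode payload only; per-znode and per-watch server overhead is not modelled.
payload_bytes = total_znodes * bytes_per_znode

print(total_znodes, total_watches, payload_bytes)
```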

Re: Watcher examples

2010-11-15 Thread Patrick Hunt
It would be great to have more examples as part of the release artifact. Would you mind creating a JIRA/patch for this? http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute I'm thinking that we could have a src/contrib/examples or src/examples ... what do you guys think? (mahadev?) Patrick

[ANNOUNCE] Apache ZooKeeper 3.3.2

2010-11-12 Thread Patrick Hunt
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.3.2. ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you

Re: Key factors for production readiness of Hedwig

2010-11-10 Thread Patrick Hunt
On Wed, Nov 10, 2010 at 10:58 AM, Erwin Tam wrote: > 1. Ops tools including monitoring and administration. Command port (4 letter words) for monitoring has worked extremely well for zk. Whatever you do put the command port on a separate port, and make it a full fledged feature rather than a hack

Re: Verifying Changes

2010-11-10 Thread Patrick Hunt
Perhaps something similar to what Ben detailed here? (rendezvous) http://developer.yahoo.com/blogs/hadoop/posts/2009/05/using_zookeeper_to_tame_system/ Change the key, add child znode(s) that's deleted by the "notified" client(s) once it's read the changed value. Some details need to be worked out

[Discussion] Some proposed logging (log4j) JIRAs

2010-11-09 Thread Patrick Hunt
I wanted to highlight a couple recent JIRAs that may have impact on users (api consumers AND admins of the service) in the 3.4 timeframe. If you want to weigh in please comment on the respective jira: 1) proposal to move to slf4j (remove/replace log4j) https://issues.apache.org/jira/browse/ZOOKEEP

Re: Running cluster behind load balancer

2010-11-04 Thread Patrick Hunt
e that list before it is >> used, so in reality, using a single DNS hostname resolving to all the server >> addresses will probably work just as well as most DNS-based load balancers. >> >> ben >> >> On 11/04/2010 08:26 AM, Patrick Hunt wrote: >>> Hi Chang

Re: JUnit tests do not produce logs if the JVM crashes

2010-11-04 Thread Patrick Hunt
In addition to what Mahadev suggested you can also change the log4j.properties to log to a file rather than the CONSOLE. Although that just redirects the logs, if there is some output to stdout/stderr then junit buffering is still in play. Patrick On Thu, Nov 4, 2010 at 8:15 AM, Mahadev Konar wr

Re: Running cluster behind load balancer

2010-11-04 Thread Patrick Hunt
Hi Chang, thanks for the insights, if you have a few minutes would you mind updating the FAQ with some of this detail? http://wiki.apache.org/hadoop/ZooKeeper/FAQ Thanks! Patrick On Thu, Nov 4, 2010 at 6:27 AM, Chang Song wrote: > > Sorry. I made a mistake on retry timeout in load balancer sect

Re: question about watcher

2010-11-03 Thread Patrick Hunt
2, 3, etc...). Patrick On Wed, Nov 3, 2010 at 1:13 AM, Qian Ye wrote: > thanks Patrick, I want to know all watches set by all clients. > I would open a jira and write some design think about it later. > > On Tue, Nov 2, 2010 at 11:53 PM, Patrick Hunt wrote: > >> Hi Qian Ye,

Re: question about watcher

2010-11-02 Thread Patrick Hunt
Hi Qian Ye, yes you should open a JIRA for this. If you want to work on a patch we could advise you. One thing not clear to me, are you interested in just the watches set by the particular client, or all watches set by all clients? The first should be relatively easy to get, the second would be mor

Re: Setting the heap size

2010-11-01 Thread Patrick Hunt
> > On Thu, Oct 28, 2010 at 6:13 PM, Patrick Hunt wrote: >> Tim, one other thing you might want to be aware of: >> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_supervision >> >> Patrick >> >> On Thu, Oct 28, 2010 at 9:11 AM, Pat

Re: Getting a "node exists" code on a sequence create

2010-11-01 Thread Patrick Hunt
Hi Jeremy, this sounds like a bug to me, I don't think you should be getting the nodeexists when the sequence flag is set. Looking at the code briefly we use the parent's "cversion" (incremented each time the child list is changed, added/removed). Did you see this error each time you called creat
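
As a rough illustration of the naming scheme mentioned above: a sequential znode's name is the requested path plus a counter derived from the parent's "cversion". The sketch below assumes the counter is rendered as a zero-padded 10-digit decimal suffix (as the ZooKeeper documentation describes); `sequential_name` is a hypothetical helper, not a ZooKeeper API.

```python
def sequential_name(prefix: str, parent_cversion: int) -> str:
    """Build the name a sequential create would produce (illustrative only).

    Assumption: the suffix is the parent's cversion formatted as a
    zero-padded 10-digit decimal number.
    """
    return "%s%010d" % (prefix, parent_cversion)


# Hypothetical path; each change to the parent's child list bumps cversion,
# so consecutive creates get increasing suffixes.
print(sequential_name("/locks/lock-", 7))
```

Because the counter changes whenever the child list changes (additions and removals), the suffixes are increasing but not necessarily consecutive.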

Re: Setting the heap size

2010-10-28 Thread Patrick Hunt
Tim, one other thing you might want to be aware of: http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_supervision Patrick On Thu, Oct 28, 2010 at 9:11 AM, Patrick Hunt wrote: > On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson > wrote: >> We are setting up a sma

Re: Setting the heap size

2010-10-28 Thread Patrick Hunt
On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson wrote: > We are setting up a small Hadoop 13 node cluster running 1 HDFS > master, 9 region servers for HBase and 3 map reduce nodes, and are just > installing zookeeper to perform the HBase coordination and to manage a > few simple process locks for o

Re: Stale value for read request

2010-10-25 Thread Patrick Hunt
On Sat, Oct 23, 2010 at 9:03 PM, jingguo yao wrote: > Read requests are handled locally at each Zookeeper server. So it is > possible for a read request to return a stale value even though a more > recent update to the same znode has been committed. Does this statement > still hold if the Zookeep

Re: Reading znodes directly from snapshot and log files

2010-10-25 Thread Patrick Hunt
Sounds like a useful utility, the closest that I know of is this: http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/server/LogFormatter.html but it just dumps the txn log. Seems like it would be cool to be able to open a "shell" on the datadir and query it (separate from runn

Re: Retrying sequential znode creation

2010-10-25 Thread Patrick Hunt
r, given they would still have to code for this corner case. Patrick > On Wed, Oct 20, 2010 at 10:42 AM, Patrick Hunt wrote: > > > Hi Ted, Mahadev is in the best position to comment (he looked at it last) > > but iirc when we started looking into implementing this we immediately

Re: zxid integer overflow

2010-10-20 Thread Patrick Hunt
I'm not aware of sustained 1k/sec, Ben might know how long the 20k/sec test runs for (and for how long that rate is sustained). You'd definitely want to tune the GC, GC related pauses would be the biggest obstacle for this (assuming you are using a dedicated log device for the transaction logs). P

Re: Digest user ACL check failing

2010-10-20 Thread Patrick Hunt
Sounds like it might be a bug, was this just for the root or for any znode? Please file a JIRA, thanks. Patrick On Tue, Oct 19, 2010 at 1:01 PM, Fournier, Camille F. [Tech] < camille.fourn...@gs.com> wrote: > The ZK documentation says: > New in 3.2: Enables a ZooKeeper ensemble administrator to

Re: Retrying sequential znode creation

2010-10-20 Thread Patrick Hunt
, it didn't sound like > it was going to be that hard. > > On Wed, Oct 13, 2010 at 12:08 PM, Patrick Hunt wrote: > > > 22 would help with this issue > > https://issues.apache.org/jira/browse/ZOOKEEPER-22 > > however there are some real hurdles to implementing 22 successfully. > > >

Re: Unusual exception

2010-10-20 Thread Patrick Hunt
EOS means that the client closed the connection (from the point of view of the server). The server then tries to cleanup by closing the socket explicitly, in some cases that results in debug messages you see subsequent. EndOfStreamException: Unable to read additional data from client sessionid 0x0

Re: Testing zookeeper outside the source distribution?

2010-10-18 Thread Patrick Hunt
You might check out a tool I built a while back to be used by operations teams deploying ZooKeeper: http://bit.ly/a6tGVJ It's really two tools actually, a smoketester and a latency tester, both of which are important to verify when deploying a new cluster. Patrick On Mon, Oct 18, 2010 at 9:50 AM,

Re: Retrying sequential znode creation

2010-10-13 Thread Patrick Hunt
On Wed, Oct 13, 2010 at 5:58 AM, Vishal K wrote: > > However, gets trickier because there is no explicit way (to my knowledge) > to > get CreateMode for a znode. As a result, we cannot tell whether a node is > sequential or not. > Sequentials are really just regular znodes with fancy naming appli

Re: Membership using ZK

2010-10-13 Thread Patrick Hunt
FYI: http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting On Tue, Oct 12, 2010 at 2:23 PM, Benjamin Reed wrote: > yes, your watcher objects will get the connectionloss event and eventually > the session expired event. > > ben > > > On 10/12/2010 10:57 AM, Avinash Lakshman wrote: > >> Would m

Re: What does this mean?

2010-10-13 Thread Patrick Hunt
On Mon, Oct 11, 2010 at 4:16 PM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: > tickTime = 2000, initLimit = 3000 and the data is around 11GB this is log + > snapshot. So if I need to add a new observer can I transfer state from the > ensemble manually before starting it? If so which file

Re: znode inconsistencies across ZooKeeper servers

2010-10-07 Thread Patrick Hunt
m properly. > cZxid = 0x105ef > ctime = Tue Oct 05 15:00:50 UTC 2010 > mZxid = 0x105ef > mtime = Tue Oct 05 15:00:50 UTC 2010 > pZxid = 0x105ef > cversion = 0 > dataVersion = 0 > aclVersion = 0 > ephemeralOwner = 0x2b7ce57ce4 > dataLength = 54 > numChildre

Re: snapshots

2010-10-07 Thread Patrick Hunt
Simplified: when a server comes back up it checks its local snaps/logs to reconstruct as much of the "current state" as possible. It then checks with the leader to see how far behind it is, at which point it either gets a diff or gets a full snapshot (from the leader) depending on how far behind i

Re: Changing configuration

2010-10-07 Thread Patrick Hunt
You probably want to do a "rolling restart", this is preferable over "restart the cluster" as the service will not go down. http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6 Patrick On Wed, Oct 6, 2010 at 9:49 PM, Avinash Lakshman wrote: > Suppose I

Re: znode inconsistencies across ZooKeeper servers

2010-10-06 Thread Patrick Hunt
Vishal the attachment seems to be getting removed by the list daemon (I don't have it), can you create a JIRA and attach? Also this is a good question for the ppl on zookeeper-user. (ccing) You are aware that ephemeral znodes are tied to the session? And that sessions only expire after the session

Re: Too many connections

2010-10-06 Thread Patrick Hunt
On Tue, Oct 5, 2010 at 10:23 AM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: > So shouldn't all servers in another DC just have one session? So even if I > have 50 observers in another DC that should be 50 sessions established > since > the IP doesn't change correct? Am I missing somethi

Re: Zookeeper on 60+Gb mem

2010-10-05 Thread Patrick Hunt
Tuning GC is going to be critical, otw all the sessions will timeout (and potentially expire) during GC pauses. Patrick On Tue, Oct 5, 2010 at 1:18 PM, Maarten Koopmans wrote: > Yes, and syncing after a crash will be interesting as well. Off note; I am > running it with a 6GB heap now, but it's

Re: Too many connections

2010-10-05 Thread Patrick Hunt
n is. > > Cheers > Avinash > > On Tue, Oct 5, 2010 at 9:27 AM, Patrick Hunt wrote: > > > See this configuration param in the docs "maxClientCnxns": > > > > > http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_a

Re: Too many connections

2010-10-05 Thread Patrick Hunt
See this configuration param in the docs "maxClientCnxns": http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_advancedConfiguration Patrick On Tue, Oct 5, 2010 at 8:10 AM, Avinash La

Re: ZK compatability

2010-09-30 Thread Patrick Hunt
> What about major releases going forward? Thanks, > > Jun > > On Mon, Sep 27, 2010 at 10:32 PM, Patrick Hunt wrote: > > > In general yes, minor and bug fix releases are fully backward compatible. > > > > Patrick > > > > > > On Sun, Sep 26, 2010

Re: Question about rest interface

2010-09-30 Thread Patrick Hunt
Hi Marc, you should check out the REST interface that's on the svn trunk, it includes new functionality and numerous fixes that might be interesting to you, this will be part of 3.4.0. CCing Andrei who worked on this as part of his GSOC project this summer. If you look at this file: src/contrib/res

Re: c client "0" state?

2010-09-27 Thread Patrick Hunt
Seems like a bug to me. Please enter a JIRA (if you haven't already). Thanks, Patrick On Fri, Sep 17, 2010 at 9:10 AM, Michael Xu wrote: > Hi everyone > > in the c client api: > > Is it normal for zoo_state() to return zero (not one of the valid state > consts) when it is handling socket error

Re: processResults

2010-09-27 Thread Patrick Hunt
I believe what the author is trying to say is that if the getdata were to fail (such as the example you give) the watch set as part of the original call will fire, and this will notify the client that the node was deleted. (call to process(event)) Patrick On Mon, Sep 27, 2010 at 6:56 PM, Milind P

Re: ZK compatability

2010-09-27 Thread Patrick Hunt
In general yes, minor and bug fix releases are fully backward compatible. Patrick On Sun, Sep 26, 2010 at 9:11 PM, Jun Rao wrote: > Hi, > > Does ZK support (and plan to support in the future) backward compatibility > (so that a new client can talk to an old server and vice versa)? > > Thanks >

Re: zkfuse

2010-09-27 Thread Patrick Hunt
Sounds like you have an old version of autoconf, try upgrading, see similar issue here: http://www.mail-archive.com/thrift-u...@incubator.apache.org/msg00673.html Patrick 2010/9/24 俊贤 > Hi mahadev, > > My os is Linux l

Re: possible bug in zookeeper ?

2010-09-16 Thread Patrick Hunt
PeerConfig: > > "...class SolrZkServerProps extends QuorumPeerConfig {" > > And because > SolrZkServerProps reference the clientPort field in its super class - > > it cant compile once you change the jar and eliminate this field... > > > yatir >

Re: possible bug in zookeeper ?

2010-09-15 Thread Patrick Hunt
On Wed, Sep 15, 2010 at 12:56 AM, Yatir Ben Shlomo wrote: > 2. Unfortunately I have already tried to switch to the new jar but it does > not seem to be backward compatible. > It seems that the QuorumPeerConfig class does not have the following field > protected int clientPort; > It was replaced by

Re: possible bug in zookeeper ?

2010-09-14 Thread Patrick Hunt
That is unusual. I don't recall anyone reporting a similar issue, and looking at the code I don't see any issues off hand. Can you try the following? 1) on that particular zk client machine resolve the hosts zook1/zook2/zook3, what ip addresses does this resolve to? (try dig) 2) try running the cl

Re: Spew after call to close

2010-09-08 Thread Patrick Hunt
No worries, let us know if something else pops up. Patrick On Tue, Sep 7, 2010 at 3:10 PM, Stack wrote: > Nevermind. I figured it. It was an hbase issue. We were leaking a > client reference. > Sorry for the noise, > St.Ack > > > On Sat, Sep 4, 2010 at 10:58 AM, Stack wrote: > > Thats right

Re: closing session on socket close vs waiting for timeout

2010-09-06 Thread Patrick Hunt
or expect. > > ben > > > On 09/01/2010 12:47 PM, Patrick Hunt wrote: > >> Ben, in this case the session would be tied directly to the connection, >> we'd explicitly deny session re-establishment for this session type (so >> 4 would fail). Would that address yo

Re: election recipe

2010-09-06 Thread Patrick Hunt
Hi Andrei, the answer may not be as simple as that. In the case of "passive leader" you might want to just wait till you're reconnected before taking any action. Connection loss indicates that you aren't currently connected to a server, it doesn't mean that you've lost leadership (if you get expire

Re: getting created child on NodeChildrenChanged event

2010-09-06 Thread Patrick Hunt
It is good to keep things simple, but we have seen some requests related to the client api for children use cases that seem reasonable. In particular the issue of handling large numbers of children efficiently is currently a problem (queue say). We've seen proposals on this before, just no one's f

Re: maximum myid?

2010-09-03 Thread Patrick Hunt
The server id (myid) should be a number btw 1 and 255. See item 5 here: http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkMulitServerSetup The lower 8 bits of the myid is used as the upper 8 bits in session id generation, you might end up with duplicate session ids if you u
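
A minimal sketch of why ids above 255 can collide, assuming session ids are 64-bit values whose top byte is taken from the low 8 bits of myid (the 64-bit width and the shift by 56 are assumptions chosen to match the description above; `session_id_high_byte` is a hypothetical helper, not ZooKeeper code):

```python
MASK_64 = (1 << 64) - 1  # assumption: session ids are 64-bit values


def session_id_high_byte(myid: int) -> int:
    """Return the top 8 bits a server with this myid would stamp on its session ids.

    Shifting left by 56 and truncating to 64 bits keeps only the low
    8 bits of myid, which is why larger ids wrap around.
    """
    return ((myid << 56) & MASK_64) >> 56


for myid in (1, 255, 256, 257):
    print(myid, session_id_high_byte(myid))
```

Under these assumptions, servers with myid 1 and 257 would stamp the same top byte on their session ids, so two servers could hand out overlapping ids.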

Re: closing session on socket close vs waiting for timeout

2010-09-01 Thread Patrick Hunt
lishment and always drop back to full session recreation in cases of network failure. TANSTAAFL. :-) But unless I'm mis-understanding your original request this solves your problem as originally stated. Patrick C -----Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent:

Re: closing session on socket close vs waiting for timeout

2010-09-01 Thread Patrick Hunt
Ben, in this case the session would be tied directly to the connection, we'd explicitly deny session re-establishment for this session type (so 4 would fail). Would that address your concern, others? Patrick On 09/01/2010 10:03 AM, Benjamin Reed wrote: i'm a bit skeptical that this is going t

Re: C++ client wrapper

2010-08-31 Thread Patrick Hunt
Check the zkfuse module in contrib. That's the only C++ wrapper I remember. Patrick On 08/31/2010 03:32 PM, Goran Zuzic wrote: Hi! I'm looking for a good zookeeper C++ client wrapper (something like Java) for educational purposes. Is there anything similar available? Thanks, Goran

Re: Zookeeper shell

2010-08-31 Thread Patrick Hunt
Depending on your classpath setup: java org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181 if jline jar is in your classpath (included in the zk release distribution) you'll get history, auto-complete and such. Patrick On 08/31/2010 03:08 PM, Michi Mutsuzaki wrote: Hello, I'm lookin

Re: closing session on socket close vs waiting for timeout

2010-08-31 Thread Patrick Hunt
That's what I read too... Sounds like having a session type that's bound to the connection lifetime would be useful. We might want to have an option to turn off ZK heartbeating and use tcp keep alive for those session types. Patrick On Tue, Aug 31, 2010 at 10:14 AM, Dave Wright wrote: > I thin

Re: Logs and in memory operations

2010-08-31 Thread Patrick Hunt
On Mon, Aug 30, 2010 at 1:11 PM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: > From my understanding when a znode is updated/created a write happens into > the local transaction logs and then some in-memory data structure is > updated > to serve the future reads. > Where in the source co

Re: Exception causing close of session

2010-08-30 Thread Patrick Hunt
> > On Thu, Aug 26, 2010 at 5:05 PM, Patrick Hunt wrote: > > > > Client has seen zxid 0xfa4 our last zxid is 0x42 > > > > Someone reset the zk server database without restarting the clients. As a > > result the client is "forward" in time relative t

Re: Receiving create events for self with synchronous create

2010-08-30 Thread Patrick Hunt
On line 64 are you ensuring that the ZooKeeper session is active before executing that sequence? zookeeper = new ZooKeeper(...) is async - it returns before you're actually connected to the server (you get notified of this in your watcher). If you execute this sequence quickly enough your zk.creat

Re: Zookeeper stops

2010-08-30 Thread Patrick Hunt
ry outside of /tmp for zookeeper persistence ? > > Thanks > > On Thu, Aug 19, 2010 at 1:42 PM, Patrick Hunt wrote: > > > No. You configure it in the server configuration file. > > > > Patrick > > > > > > On 08/19/2010 01:19 PM, Wim Jongman wrote:

Re: IllegalArgumentException excpetion : Path cannot be null

2010-08-30 Thread Patrick Hunt
The client (solr in this case) is passing a null path to the ZooKeeper.getChildren(path, ... ) call. java.lang.IllegalArgumentException: Path cannot be null at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45) at org.apache.zookeeper.ZooKeeper.getChildren(zookeepe

Re: Integration/Unit testing with Zookeeper

2010-08-30 Thread Patrick Hunt
I don't think we have enough context to help you out with this. Where are you creating/closing your ZooKeeper clients? I'd suggest that you turn on DEBUG level logging in your test and look at the output. Verify that you are actually closing the zookeeper client as you expect. Patrick On Wed, Au

Re: Closing a client fails

2010-08-30 Thread Patrick Hunt
Typically when you call "close" on the ZooKeeper object it will shut down both the threads (one is the "send" thread, the other is the "event" thread). Are you sure you're calling the close method? Perhaps you've found this issue? https://issues.apache.org/jira/browse/ZOOKEEPER-795 Patrick On Tu

Re: Weird ephemeral node issue

2010-08-30 Thread Patrick Hunt
Rather than the wiki, it would be great to get this into the docs. Would you mind creating a JIRA? https://issues.apache.org/jira/browse/ZOOKEEPER Thanks, Patrick On Tue, Aug 17, 2010 at 8:29 PM, Qing Yan wrote: > Thanks for the explanation! I sugg

Re: Exception causing close of session

2010-08-26 Thread Patrick Hunt
> Client has seen zxid 0xfa4 our last zxid is 0x42 Someone reset the zk server database without restarting the clients. As a result the client is "forward" in time relative to the cluster. Patrick On 08/26/2010 04:03 PM, Ted Yu wrote: Hi, zookeeper-3.2.2 is used out of HBase 0.20.5 Linux sj

Re: ZK monitoring

2010-08-19 Thread Patrick Hunt
Maybe we should have a contrib pkg for utilities such as this? I could see a python script that, given 1 server (might require addl 4letter words but this would be useful regardless), could collect such information from the cluster. Create a JIRA? Patrick On 08/17/2010 12:14 PM, Andrei Savu w

Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt
No. You configure it in the server configuration file. Patrick On 08/19/2010 01:19 PM, Wim Jongman wrote: Hi, But zk does default to /tmp? Regards, Wim On Thursday, August 19, 2010, Patrick Hunt wrote: +1 on that Ted. I frequently see this issue crop up as "I just rebooted my s

Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt
+1 on that Ted. I frequently see this issue crop up as "I just rebooted my server and lost all my data ..." -- many os's will cleanup tmp on reboot. :-) Patrick On 08/19/2010 07:43 AM, Ted Dunning wrote: Also, /tmp is not a great place to keep things that are intended for persistence. On Thu

Re: Session expiration caused by time change

2010-08-18 Thread Patrick Hunt
Do you expect the time to be "wrong" frequently? If ntp is running it should never get out of sync more than a small amount. As long as this is less than ~your timeout you should be fine. Patrick On 08/18/2010 01:04 AM, Qing Yan wrote: Hi, The testcase is fairly simple. We have a client

Re: A question about Watcher

2010-08-17 Thread Patrick Hunt
All servers keep a copy - so you can shutdown the zk service entirely (all servers) and restart it and the sessions are maintained. Patrick On 08/16/2010 06:34 PM, Qian Ye wrote: Thx Mahadev and Benjamin, it seems that I've got some misunderstanding about the client. I will check it out. Anot

Re: client failure detection in ZK

2010-08-17 Thread Patrick Hunt
used. Patrick On 08/17/2010 08:51 AM, Jun Rao wrote: Thanks. Also, suppose that I know the average network latency, what's the rule of thumb to set the value of session timeout? Jun On Mon, Aug 16, 2010 at 1:55 PM, Patrick Hunt <ph...@apache.org> wrote: The sessio

Re: client failure detection in ZK

2010-08-16 Thread Patrick Hunt
The session timeout is used for this: http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions Patrick On 08/16/2010 01:47 PM, Jun Rao wrote: Hi, What config parameters in ZK determine how soon a failed client is detected? Thanks, Jun

Re: How to handle "Node does not exist" error?

2010-08-16 Thread Patrick Hunt
Try using the logs, stat command or JMX to verify that each ZK server is indeed a leader/follower as expected. You should have one leader and n-1 followers. Verify that you don't have any "standalone" servers (this is the most frequent error I see - misconfiguration of a server such that it thi
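
One way to automate that check is sketched below. It sends the "stat" four-letter word to each server's client port and reads the "Mode:" line from the reply. The host list, the default port 2181, and all helper names are assumptions for illustration; ensembles that include observers would need the final check adjusted.

```python
import socket
from collections import Counter


def parse_mode(stat_output: str):
    """Extract the value of the 'Mode:' line from 'stat' output, or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def fetch_mode(host: str, port: int = 2181, timeout: float = 5.0):
    """Send 'stat' to one server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def ensemble_looks_healthy(modes) -> bool:
    """True when there is exactly one leader and every other server is a follower.

    A 'standalone' entry (a misconfigured server) makes this return False.
    """
    counts = Counter(modes)
    return counts["leader"] == 1 and counts["follower"] == len(modes) - 1
```

With hypothetical hostnames, usage would look like `ensemble_looks_healthy([fetch_mode(h) for h in ("zk1", "zk2", "zk3")])`.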

Re: zookeeper seems to hang

2010-08-12 Thread Patrick Hunt
Great bug report Ted, the stack trace in particular is very useful. It looks like a timing bug where the client is not shutting down cleanly on the close call. I reviewed the code in question but nothing pops out at me. Also the logs just show us shutting down, nothing else from zk in there.

Re: Backing up zk data files

2010-08-12 Thread Patrick Hunt
On 08/11/2010 06:49 PM, Adam Rosien wrote: http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperAdmin.html#sc_dataFileManagement says that one can copy the contents of the data directory and use it on another machine. The example states the other instance is not in the server list; what would

Re: Clarification on async calls in a cluster

2010-08-11 Thread Patrick Hunt
On 08/11/2010 03:25 PM, Jordan Zimmerman wrote: If I use an async version of a call in a cluster ("ensemble") what happens if the server I'm connected to goes down? Does ZK transparently resubmit the call to the next server in the cluster and call my async callback or is there something I need t

Re: Sequence Number Generation With Zookeeper

2010-08-10 Thread Patrick Hunt
Great! Basic details are here (create a jira, attach a patch, click "submit" and someone will review and help you get it into a state which we can commit). Probably you'd put your code into src/recipes or src/contrib (recipes sounds reasonable). http://wiki.apache.org/hadoop/ZooKeeper/HowToCo

Re: Too many "KeeperErrorCode = Session moved" messages

2010-08-07 Thread Patrick Hunt
I suspect this is a bug with the sync call and session moved (the code path for sync is a bit special). Please enter a JIRA for this. Thanks. Patrick On 08/05/2010 01:20 PM, Vishal K wrote: Hi All, I am seeing a lot of these messages in our application. I would like to know if I am doing some

Re: Using watcher for being notified of children addition/removal

2010-08-02 Thread Patrick Hunt
You may want to consider adding a distributed queue to your use of ZK. As was mentioned previously, watches don't notify you of every change, just that a change was made. For example multiple changes may be "visible" when you get the notification. A distributed queue would allow you to "log" e

Re: ZK recovery questions

2010-07-20 Thread Patrick Hunt
Not having a datadir is currently not possible - the servers expect to read/write snapshot and log files. In particular the leader needs to be able to stream updates, and in some cases the entire latest snapshot, to a follower. It does this by streaming data directly from the filesystem. Patri

Re: JMX error while starting ZooKeeper

2010-07-19 Thread Patrick Hunt
On 07/19/2010 05:04 PM, Rakesh Aggarwal wrote: javax.management.MBeanServer; was not found Sounds like you are missing rt.jar for some reason (contains that class). Try running "java -verbose -version" and see what jars are being picked up, I see a number of lines containing: ... /usr/lib

Re: Errors with Python bindings

2010-07-16 Thread Patrick Hunt
Hi Rich, the version string looks useful to have, thanks! Would you mind submitting this via jira? Do a "svn diff" (looks like you did already), create a jira and attach the diff, then click "submit" link on the jira. We'll review and work on getting it into a future release. http://wiki.apache

Re: total # of zknodes

2010-07-15 Thread Patrick Hunt
I've done some tests with ~600 clients creating 5 million znodes (size 100bytes iirc) and 25million watches. I was using 8gb of memory for this, however --- in this scenario it's critical that you tune the GC, in particular you need to turn on CMS and incremental GC options. Otw when the GC col

Re: Suggested way to simulate client session expiration in unit tests?

2010-07-06 Thread Patrick Hunt
If you want to simulate expiration use the example I sent. http://github.com/phunt/zkexamples Another option is to use a mock. Patrick On 07/06/2010 05:42 PM, Jeremy Davis wrote: Thanks! That seems to work, but it is approximately the same as zooKeeper.close() in that there is no SessionExp

Re: Suggested way to simulate client session expiration in unit tests?

2010-07-06 Thread Patrick Hunt
not sure if this still works but here's an example: http://github.com/phunt/zkexamples Patrick On 07/06/2010 10:32 AM, Mahadev Konar wrote: Hi Jeremy, zk.disconnect() is the right way to disconnect from the servers. For session expiration you just have to make sure that the client stays dis

Re: Zookeeper outage recap & questions

2010-07-01 Thread Patrick Hunt
Hi Travis, as Flavio suggested would be great to get the logs. A few questions: 1) how did you eventually recover, restart the zk servers? 2) was the cluster losing quorum during this time? leader re-election? 3) Any chance this could have been initially triggered by a long GC pause on one of

Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Patrick Hunt
On 06/30/2010 09:37 AM, Ted Dunning wrote: Which API are you talking about? C? I think that the difference between connection loss and session expiration might mess you up slightly in your disjunction here. On Wed, Jun 30, 2010 at 7:45 AM, Bryan Thompson wrote: I am wondering what guarantee

Re: Receive timed out error while starting zookeeper server

2010-06-27 Thread Patrick Hunt
On 06/26/2010 06:53 AM, Peeyush Kumar wrote: I have a 6 node cluster (5 slaves and 1 master). I am trying to You typically want an odd number given that zk works by majority (even is fine, but not optimal). So 5 would be great (7 is a bit of overkill). 3 is fine too, but 5 allows fo
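
The point about odd ensemble sizes comes down to majority arithmetic, sketched below (function names are illustrative). An even-sized ensemble needs a larger majority but tolerates no more failures than the next smaller odd size.

```python
def quorum_size(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still available."""
    return n - quorum_size(n)


for n in (3, 4, 5, 6, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

So a 6-server ensemble tolerates the same two failures as a 5-server one, while requiring four servers instead of three to agree.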

Re: Watchers & error handling

2010-06-25 Thread Patrick Hunt
doop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>Alexis This (3.1.1) is a pretty old version of the docs, I'd suggest that you look at the most recent before entering JIRAs: http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkWatches Regar

Re: Watchers & error handling

2010-06-25 Thread Patrick Hunt
On 06/12/2010 10:07 PM, Alexis Midon wrote: I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so far. Thanks for the nice work. Tests look good. So good that we can focus on exception/error handling and I got a couple of questions. #1. Regarding the use of the default wa

Re: What does ZOO_ASSOCIATING_STATE in the C client mean?

2010-06-25 Thread Patrick Hunt
On 06/24/2010 08:32 PM, Ying-Yi Liang wrote: I noticed there are a few inconsistence of session state values between the C client and the Java client. I tried to search for an explanation of the ZOO_ASSOCIATING_STATE (=2) declared in src/c/include/zookeeper.h in the source comments and on the we

Re: What's the problem with nio on FreeBSD?

2010-06-22 Thread Patrick Hunt
I believe this was the issue we found originally, however in general java didn't seem to work well (generally unsupported, at least at the time) with freebsd. http://java-programmer.itgroups.info/1/10/thread-267393.html Patrick On 06/22/2010 03:37 AM, Alexander E. Patrakov wrote: Hello. On

Re: Reply: Starting zookeeper in replicated mode

2010-06-22 Thread Patrick Hunt
There are 3 ports that need to be opened 1) the client port (btw client and servers) 2/3) the quorum and election ports - only btw servers You are setting these three ports in your config file (clientport defaults to 2181 iirc, unless you override) Patrick On 06/22/2010 06:17 AM, Erik Test w

Re: Free Software Solution to continuously load a large number of feeds with several servers?

2010-06-18 Thread Patrick Hunt
I've seen a number of these built as proprietary solutions using ZooKeeper. It would be great to see something open sourced. HBase/ZK seems like a good fit. You might also consider ZooKeeper/BookKeeper. Patrick On 06/18/2010 11:01 AM, Thomas Koch wrote: http://stackoverflow.com/questions/3072
