Session timeout is 30 seconds.

On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
> What is your client timeout? It may be too low.
>
> Also see this section on handling recoverable errors:
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> Connection loss in particular needs special care since: "When a
> ZooKeeper client loses a connection to the ZooKeeper server there may
> be some requests in flight; we don't know where they were in their
> flight at the time of the connection loss."
>
> Patrick
>
> Satish Bhatti wrote:
>
>> I have recently started running on EC2 and am seeing quite a few
>> ConnectionLoss exceptions. Should I just catch these and retry? Since
>> I assume that eventually, if the shit truly hits the fan, I will get
>> a SessionExpired?
>>
>> Satish
>>
>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> We have used EC2 quite a bit for ZK.
>>>
>>> The basic lessons that I have learned include:
>>>
>>> a) EC2's biggest advantage after scaling and elasticity was
>>> conformity of configuration. Since you are bringing machines up and
>>> down all the time, they begin to act more like programs, and you
>>> wind up with boot scripts that give you a very predictable
>>> environment. Nice.
>>>
>>> b) The EC2 interconnect has a lot more going on than a dedicated
>>> VLAN. That can make the ZK servers appear a bit less connected. You
>>> have to plan for ConnectionLoss events.
>>>
>>> c) For highest reliability, I switched to large instances. On
>>> reflection, I think that was helpful, but less important than I
>>> thought at the time.
>>>
>>> d) Increasing and decreasing the cluster size is nearly painless and
>>> easily scriptable. To decrease, do a rolling update on the survivors
>>> to update their configuration, then take down the instance you want
>>> to lose. To increase, do a rolling update starting with the new
>>> instances to update the configuration to include all of the
>>> machines.
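[Editor's note: the retry-with-care advice above can be sketched in code. This is a minimal illustration, not the ZooKeeper client API: `ConnectionLossError`, `retry_on_connection_loss`, and `flaky_set_data` are hypothetical names standing in for the real client's connection-loss exception (`KeeperException.ConnectionLossException` in the Java client) and for whatever operation you are retrying.]

```python
import time

# Hypothetical stand-in for the client library's connection-loss error.
class ConnectionLossError(Exception):
    pass

def retry_on_connection_loss(op, max_retries=5, base_delay=0.1):
    """Retry an idempotent ZooKeeper operation with exponential backoff.

    Only safe for idempotent operations: after a ConnectionLoss the
    request may or may not have been applied on the server (it was "in
    flight"), so a blindly retried non-idempotent create() could fail
    with NodeExists even though the first attempt actually succeeded.
    """
    for attempt in range(max_retries):
        try:
            return op()
        except ConnectionLossError:
            # Back off before retrying: 0.1s, 0.2s, 0.4s, ...
            time.sleep(base_delay * (2 ** attempt))
    raise ConnectionLossError("gave up after %d retries" % max_retries)

# Simulated flaky operation: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_set_data():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionLossError()
    return "ok"

print(retry_on_connection_loss(flaky_set_data))  # prints "ok" on the 3rd attempt
```

Note that if retries keep failing past the session timeout, the session itself expires and SessionExpired is the signal to rebuild state, not retry.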
>>> The rolling update should bounce each ZK server with several
>>> seconds between bounces. Rescaling the cluster takes less than a
>>> minute, which makes it comparable to EC2 instance boot time (about
>>> 30 seconds for the Alestic Ubuntu image that we used, plus about 20
>>> seconds for additional configuration).
>>>
>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com> wrote:
>>>
>>>> Hello
>>>>
>>>> I want to set up a ZooKeeper ensemble on Amazon's EC2 service. In
>>>> my system, ZooKeeper is used to run a locking service and to
>>>> generate unique IDs. Currently, for testing purposes, I am only
>>>> running one instance. Now I need to set up an ensemble to protect
>>>> my system against crashes.
>>>>
>>>> The EC2 service has some differences from a normal server farm,
>>>> e.g. the data saved on the file system of an EC2 instance is lost
>>>> if the instance crashes. In the documentation of ZooKeeper, I have
>>>> read that ZooKeeper saves snapshots of the in-memory data to the
>>>> file system. Is that needed for recovery? Logically, it would be
>>>> much easier for me if this were not the case.
>>>>
>>>> Additionally, EC2 brings the advantage that servers can be switched
>>>> on and off dynamically, dependent on the load, traffic, etc. Can
>>>> this advantage be utilized for a ZooKeeper ensemble? Is it possible
>>>> to add a ZooKeeper server dynamically to an ensemble, e.g.
>>>> dependent on the in-memory load?
>>>>
>>>> David
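[Editor's note: Ted's rolling-update recipe (point d above) can be made concrete with a sketch of the per-server configuration it edits. Below is a minimal zoo.cfg for a three-server ensemble; the hostnames and paths are hypothetical. In the ZooKeeper releases current at the time of this thread (3.x before 3.5), membership changes required exactly this kind of config edit plus a rolling restart, since dynamic reconfiguration was not yet available. The `dataDir` also answers David's snapshot question: the snapshots and transaction logs stored there are how a server recovers its state on restart.]

```
# zoo.cfg -- minimal sketch for a 3-server ensemble (hostnames hypothetical)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data    # snapshots + transaction logs live here
clientPort=2181
# To grow or shrink the ensemble, edit this server list on every
# machine and restart them one at a time (the rolling "bounce"
# described above), leaving several seconds between restarts.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```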