Well, a bunch of the ConnectionLosses were for zookeeper.exists() calls. I'm pretty sure a dumb retry for those should suffice!
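
For illustration, a dumb retry for a read such as exists() might look like the sketch below. It is only a sketch under stated assumptions: `ExistsRetry`, `retryOnConnectionLoss`, and the two exception classes are hypothetical stand-ins so that it compiles without the ZooKeeper jar. With the real Java client, the wrapped call would be something like `zk.exists(path, false)` and the exception caught would be `KeeperException.ConnectionLossException`. It retries only because a read is idempotent; it is not meant for create or other writes, where the operation may already have gone through on the servers.

```java
import java.util.concurrent.Callable;

public class ExistsRetry {
    /** Hypothetical stand-in for KeeperException.ConnectionLossException. */
    static class ConnectionLossException extends Exception {}

    /** Hypothetical stand-in for session expiry; deliberately NOT retried. */
    static class SessionExpiredException extends Exception {}

    /**
     * Runs an idempotent read, retrying only on connection loss.
     * Any other exception (e.g. session expiry) propagates immediately.
     */
    static <T> T retryOnConnectionLoss(Callable<T> readOp, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return readOp.call();
            } catch (ConnectionLossException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff to give the client time to reconnect.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // all attempts hit connection loss
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated exists(): loses the connection twice, then succeeds.
        boolean exists = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException();
            }
            return true;
        }, 5, 10L);
        System.out.println("exists=" + exists + " after " + calls[0] + " attempts");
        // prints "exists=true after 3 attempts"
    }
}
```

The attempt limit and backoff values are arbitrary. If the session has really expired, the stand-in SessionExpiredException is not caught here, so it surfaces to the caller instead of being retried.
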

On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Satish,
>
> Connectionloss is a little trickier than just retrying blindly. Please
> read the following sections on this -
>
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> And the programmers guide:
>
> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>
> To learn more about how to handle CONNECTIONLOSS. The idea is that
> blindly retrying would create problems with CONNECTIONLOSS, since a
> CONNECTIONLOSS does NOT necessarily mean that the zookeeper operation that
> you were executing failed to execute. It might be possible that this
> operation went through the servers.
>
> Since this has been a constant source of confusion for everyone who starts
> using zookeeper we are working on a fix ZOOKEEPER-22 which will take care
> of this problem and programmers would not have to worry about
> CONNECTIONLOSS handling.
>
> Thanks
> mahadev
>
> On 9/1/09 4:13 PM, "Satish Bhatti" <cthd2...@gmail.com> wrote:
>
> > I have recently started running on EC2 and am seeing quite a few
> > ConnectionLoss exceptions. Should I just catch these and retry? Since I
> > assume that eventually, if the shit truly hits the fan, I will get a
> > SessionExpired?
> > Satish
> >
> > On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> We have used EC2 quite a bit for ZK.
> >>
> >> The basic lessons that I have learned include:
> >>
> >> a) EC2's biggest advantage after scaling and elasticity was conformity of
> >> configuration. Since you are bringing machines up and down all the time,
> >> they begin to act more like programs and you wind up with boot scripts
> >> that give you a very predictable environment. Nice.
> >>
> >> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
> >> That can make the ZK servers appear a bit less connected. You have to
> >> plan for ConnectionLoss events.
> >>
> >> c) for highest reliability, I switched to large instances. On reflection,
> >> I think that was helpful, but less important than I thought at the time.
> >>
> >> d) increasing and decreasing cluster size is nearly painless and is easily
> >> scriptable. To decrease, do a rolling update on the survivors to update
> >> their configuration. Then take down the instance you want to lose. To
> >> increase, do a rolling update starting with the new instances to update
> >> the configuration to include all of the machines. The rolling update
> >> should bounce each ZK with several seconds between each bounce. Rescaling
> >> the cluster takes less than a minute which makes it comparable to EC2
> >> instance boot time (about 30 seconds for the Alestic ubuntu instance that
> >> we used plus about 20 seconds for additional configuration).
> >>
> >> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com> wrote:
> >>
> >>> Hello
> >>>
> >>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
> >>> system, zookeeper is used to run a locking service and to generate
> >>> unique id's. Currently, for testing purposes, I am only running one
> >>> instance. Now, I need to set up an ensemble to protect my system against
> >>> crashes.
> >>> The ec2 service has some differences to a normal server farm. E.g. the
> >>> data saved on the file system of an ec2 instance is lost if the instance
> >>> crashes. In the documentation of zookeeper, I have read that zookeeper
> >>> saves snapshots of the in-memory data in the file system. Is that needed
> >>> for recovery? Logically, it would be much easier for me if this is not
> >>> the case.
> >>> Additionally, ec2 brings the advantage that servers can be switched on
> >>> and off dynamically depending on the load, traffic, etc. Can this
> >>> advantage be utilized for a zookeeper ensemble? Is it possible to add a
> >>> zookeeper server dynamically to an ensemble? E.g. depending on the
> >>> in-memory load?
> >>>
> >>> David