I just checked the JMX console: AvgRequestLatency 38, MaxRequestLatency 55767.
I assume those units are milliseconds?
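In case it helps anyone reproduce the numbers above, here is one way to read those attributes programmatically over standard remote JMX. This is only a sketch: the port, and the assumption that the server was started with remote JMX enabled, are placeholders.

    import java.util.Set;
    import javax.management.MBeanAttributeInfo;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ZkLatencyDump {
        public static void main(String[] args) throws Exception {
            // Assumes the ZK server was started with remote JMX enabled,
            // e.g. -Dcom.sun.management.jmxremote.port=9999 (the port is
            // an assumption; match it to your own startup flags).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
                // ZooKeeper registers its MBeans under the
                // org.apache.ZooKeeperService domain.
                Set<ObjectName> names = mbsc.queryNames(
                        new ObjectName("org.apache.ZooKeeperService:*"), null);
                for (ObjectName name : names) {
                    // Dump every *RequestLatency attribute we can find
                    // (Avg/Max/Min), wherever it lives in the bean tree.
                    for (MBeanAttributeInfo attr
                            : mbsc.getMBeanInfo(name).getAttributes()) {
                        if (attr.getName().endsWith("RequestLatency")) {
                            System.out.println(name + " " + attr.getName()
                                    + " = " + mbsc.getAttribute(name, attr.getName()));
                        }
                    }
                }
            } finally {
                jmxc.close();
            }
        }
    }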
On Tue, Sep 1, 2009 at 5:05 PM, Patrick Hunt <ph...@apache.org> wrote:
> Yes. create/set/delete/... are really the issue (non-idempotent).
>
> Satish Bhatti wrote:
>> Well, a bunch of the ConnectionLosses were for zookeeper.exists() calls.
>> I'm pretty sure dumb retry for those should suffice!
>>
>> On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>>> Hi Satish,
>>>
>>> ConnectionLoss is a little trickier than just retrying blindly. Please
>>> read the following sections on this:
>>>
>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>
>>> and the programmers guide:
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>>>
>>> to learn more about how to handle CONNECTIONLOSS. The idea is that
>>> blindly retrying would create problems, since a CONNECTIONLOSS does NOT
>>> necessarily mean that the ZooKeeper operation you were executing failed.
>>> It may well be that the operation actually went through on the servers.
>>>
>>> Since this has been a constant source of confusion for everyone who
>>> starts using ZooKeeper, we are working on a fix, ZOOKEEPER-22, which
>>> will take care of this problem so that programmers will not have to
>>> worry about CONNECTIONLOSS handling.
>>>
>>> Thanks
>>> mahadev
>>>
>>> On 9/1/09 4:13 PM, "Satish Bhatti" <cthd2...@gmail.com> wrote:
>>>> I have recently started running on EC2 and am seeing quite a few
>>>> ConnectionLoss exceptions. Should I just catch these and retry? Since I
>>>> assume that eventually, if the shit truly hits the fan, I will get a
>>>> SessionExpired?
>>>>
>>>> Satish
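To make the idempotency distinction above concrete, here is a minimal sketch of the two retry cases against the ZooKeeper Java client. The class name, the sleep interval, and the single-writer assumption for create() are illustrative only, not an official recipe:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical helper showing why exists() and create() need
    // different ConnectionLoss handling.
    public class ZkRetry {
        private final ZooKeeper zk;

        public ZkRetry(ZooKeeper zk) {
            this.zk = zk;
        }

        // exists() is idempotent: rerunning it cannot change server state,
        // so a blind retry on ConnectionLoss is safe.
        public Stat existsWithRetry(String path)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    return zk.exists(path, false);
                } catch (KeeperException.ConnectionLossException e) {
                    Thread.sleep(500); // back off, then simply try again
                }
            }
        }

        // create() is NOT idempotent: on ConnectionLoss the first attempt
        // may already have been applied by the server, so a retry can come
        // back with NodeExists even though "we" created the node.
        public void createWithRetry(String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                            CreateMode.PERSISTENT);
                    return;
                } catch (KeeperException.ConnectionLossException e) {
                    Thread.sleep(500); // the create may or may not have landed
                } catch (KeeperException.NodeExistsException e) {
                    // Treating this as success is only safe if no other
                    // client could have created the same path.
                    return;
                }
            }
        }
    }

The create() case is exactly what Patrick flags above: after a ConnectionLoss the server may already have applied the operation, so the retry logic has to decide what a NodeExists on the second attempt means.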
>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>> We have used EC2 quite a bit for ZK.
>>>>>
>>>>> The basic lessons that I have learned include:
>>>>>
>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity
>>>>> of configuration. Since you are bringing machines up and down all the
>>>>> time, they begin to act more like programs, and you wind up with boot
>>>>> scripts that give you a very predictable environment. Nice.
>>>>>
>>>>> b) The EC2 interconnect has a lot more going on than a dedicated VLAN.
>>>>> That can make the ZK servers appear a bit less connected. You have to
>>>>> plan for ConnectionLoss events.
>>>>>
>>>>> c) For highest reliability, I switched to large instances. On
>>>>> reflection, I think that was helpful, but less important than I
>>>>> thought at the time.
>>>>>
>>>>> d) Increasing and decreasing cluster size is nearly painless and is
>>>>> easily scriptable. To decrease, do a rolling update on the survivors
>>>>> to update their configuration, then take down the instance you want
>>>>> to lose. To increase, do a rolling update starting with the new
>>>>> instances to update the configuration to include all of the machines.
>>>>> The rolling update should bounce each ZK with several seconds between
>>>>> each bounce. Rescaling the cluster takes less than a minute, which
>>>>> makes it comparable to EC2 instance boot time (about 30 seconds for
>>>>> the Alestic Ubuntu instance that we used, plus about 20 seconds for
>>>>> additional configuration).
>>>>>
>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com> wrote:
>>>>>> Hello
>>>>>>
>>>>>> I want to set up a ZooKeeper ensemble on Amazon's EC2 service. In my
>>>>>> system, ZooKeeper is used to run a locking service and to generate
>>>>>> unique IDs. Currently, for testing purposes, I am only running one
>>>>>> instance. Now I need to set up an ensemble to protect my system
>>>>>> against crashes.
>>>>>>
>>>>>> The EC2 service has some differences from a normal server farm, e.g.
>>>>>> the data saved on the file system of an EC2 instance is lost if the
>>>>>> instance crashes. In the ZooKeeper documentation, I have read that
>>>>>> ZooKeeper saves snapshots of the in-memory data to the file system.
>>>>>> Is that needed for recovery? Logically, it would be much easier for
>>>>>> me if this were not the case.
>>>>>>
>>>>>> Additionally, EC2 brings the advantage that servers can be switched
>>>>>> on and off dynamically depending on the load, traffic, etc. Can this
>>>>>> advantage be utilized for a ZooKeeper ensemble? Is it possible to add
>>>>>> a ZooKeeper server dynamically to an ensemble, e.g. depending on the
>>>>>> in-memory load?
>>>>>>
>>>>>> David
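For reference, the configuration that Ted's rolling update rewrites in point (d) is the server list in each node's zoo.cfg. Below is a minimal sketch for a three-node ensemble; the hostnames, ports, and paths are placeholders, and pointing dataDir at an EBS volume rather than ephemeral instance storage is one way to address David's concern about snapshots surviving the loss of an instance:

    # zoo.cfg -- hypothetical three-node ensemble (all values are placeholders)
    tickTime=2000
    initLimit=10
    syncLimit=5
    # Snapshots and transaction logs live under dataDir; they are what a
    # restarted server recovers from. On EC2, putting this on an EBS
    # volume (rather than ephemeral instance storage) lets the on-disk
    # state outlive the instance itself.
    dataDir=/mnt/ebs/zookeeper
    clientPort=2181
    # One line per ensemble member: host:quorumPort:electionPort.
    # A rolling update edits this list on every server and bounces one
    # server at a time, as Ted describes.
    server.1=ec2-host-1.example.com:2888:3888
    server.2=ec2-host-2.example.com:2888:3888
    server.3=ec2-host-3.example.com:2888:3888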