Hi Satish, what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
Also, how is your disk usage on this machine? Can you check your iostat
numbers?

Thanks
mahadev

On 9/1/09 5:15 PM, "Satish Bhatti" <cthd2...@gmail.com> wrote:

> GC Time: 11.628 seconds on PS MarkSweep (389 collections)
> 5 minutes on PS Scavenge (7,636 collections)
>
> It's been running for about 48 hours.
>
> On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> Do you have long GC delays?
>>
>> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2...@gmail.com> wrote:
>>
>>> Session timeout is 30 seconds.
>>>
>>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>
>>>> What is your client timeout? It may be too low.
>>>>
>>>> also see this section on handling recoverable errors:
>>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>>
>>>> connection loss in particular needs special care since:
>>>> "When a ZooKeeper client loses a connection to the ZooKeeper server
>>>> there may be some requests in flight; we don't know where they were
>>>> in their flight at the time of the connection loss."
>>>>
>>>> Patrick
>>>>
>>>> Satish Bhatti wrote:
>>>>
>>>>> I have recently started running on EC2 and am seeing quite a few
>>>>> ConnectionLoss exceptions. Should I just catch these and retry?
>>>>> Since I assume that eventually, if the shit truly hits the fan, I
>>>>> will get a SessionExpired?
>>>>>
>>>>> Satish
>>>>>
>>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We have used EC2 quite a bit for ZK.
>>>>>>
>>>>>> The basic lessons that I have learned include:
>>>>>>
>>>>>> a) EC2's biggest advantage after scaling and elasticity was
>>>>>> conformity of configuration. Since you are bringing machines up
>>>>>> and down all the time, they begin to act more like programs and
>>>>>> you wind up with boot scripts that give you a very predictable
>>>>>> environment. Nice.
>>>>>>
>>>>>> b) EC2 interconnect has a lot more going on than in a dedicated
>>>>>> VLAN. That can make the ZK servers appear a bit less connected.
>>>>>> You have to plan for ConnectionLoss events.
>>>>>>
>>>>>> c) for highest reliability, I switched to large instances. On
>>>>>> reflection, I think that was helpful, but less important than I
>>>>>> thought at the time.
>>>>>>
>>>>>> d) increasing and decreasing cluster size is nearly painless and
>>>>>> is easily scriptable. To decrease, do a rolling update on the
>>>>>> survivors to update their configuration. Then take down the
>>>>>> instance you want to lose. To increase, do a rolling update
>>>>>> starting with the new instances to update the configuration to
>>>>>> include all of the machines. The rolling update should bounce
>>>>>> each ZK with several seconds between each bounce. Rescaling the
>>>>>> cluster takes less than a minute which makes it comparable to EC2
>>>>>> instance boot time (about 30 seconds for the Alestic ubuntu
>>>>>> instance that we used plus about 20 seconds for additional
>>>>>> configuration).
>>>>>>
>>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In
>>>>>>> my system, zookeeper is used to run a locking service and to
>>>>>>> generate unique id's. Currently, for testing purposes, I am only
>>>>>>> running one instance. Now, I need to set up an ensemble to
>>>>>>> protect my system against crashes.
>>>>>>>
>>>>>>> The ec2 service has some differences to a normal server farm.
>>>>>>> E.g. the data saved on the file system of an ec2 instance is
>>>>>>> lost if the instance crashes. In the documentation of zookeeper,
>>>>>>> I have read that zookeeper saves snapshots of the in-memory data
>>>>>>> in the file system. Is that needed for recovery? Logically, it
>>>>>>> would be much easier for me if this is not the case.
>>>>>>>
>>>>>>> Additionally, ec2 brings the advantage that servers can be
>>>>>>> switched on and off dynamically dependent on the load, traffic,
>>>>>>> etc. Can this advantage be utilized for a zookeeper ensemble? Is
>>>>>>> it possible to add a zookeeper server dynamically to an
>>>>>>> ensemble? E.g. dependent on the in-memory load?
>>>>>>>
>>>>>>> David
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
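
For reference, the "catch ConnectionLoss and retry" idea raised in this thread could look roughly like the sketch below. It is not taken from any ZooKeeper client: `ConnectionLoss`, `SessionExpired` and `retry_on_connection_loss` are hypothetical stand-ins, and the attempt count and delays are arbitrary. Given Patrick's point that a request in flight may or may not have been applied, the sketch assumes the wrapped operation is idempotent, so that running it twice is harmless.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a client's recoverable connection-loss error."""


class SessionExpired(Exception):
    """Hypothetical stand-in for a client's fatal session-expired error."""


def retry_on_connection_loss(operation, max_attempts=5, base_delay=0.5,
                             sleep=time.sleep):
    """Call an idempotent operation, retrying only on ConnectionLoss.

    The delay doubles after each failed attempt. Any other exception,
    including SessionExpired, is not caught and propagates to the caller,
    which then has to rebuild its session and any ephemeral state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLoss:
            if attempt == max_attempts:
                # Out of attempts: surface the error instead of looping forever.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A read such as fetching a node's data can be passed in directly. A non-idempotent write (for example creating a sequential node) would first need a check for whether the earlier attempt actually went through before it is retried.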