Parallel/Serial.

inf...@domu-12-31-39-06-3d-d1:/opt/ir/agent/infact-installs/aaa/infact$ iostat
Linux 2.6.18-xenU-ec2-v1.0 (domU-12-31-39-06-3D-D1)   09/01/2009   _x86_64_
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          66.11    0.00    1.54    2.96   20.30    9.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            460.83       410.02     12458.18   40499322 1230554928
sdc               0.00         0.00         0.00         96          0
sda1              0.53         5.01         4.89     495338     482592

On Tue, Sep 1, 2009 at 5:46 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Satish,
>  What GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
>
> Also, how is your disk usage on this machine? Can you check your iostat
> numbers?
>
> Thanks
> mahadev
>
>
> On 9/1/09 5:15 PM, "Satish Bhatti" <cthd2...@gmail.com> wrote:
>
> > GC Time: 11.628 seconds on PS MarkSweep (389 collections),
> > 5 minutes on PS Scavenge (7,636 collections).
> >
> > It's been running for about 48 hours.
> >
> >
> > On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> Do you have long GC delays?
> >>
> >> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2...@gmail.com> wrote:
> >>
> >>> Session timeout is 30 seconds.
> >>>
> >>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
> >>>
> >>>> What is your client timeout? It may be too low.
> >>>>
> >>>> Also see this section on handling recoverable errors:
> >>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
> >>>>
> >>>> Connection loss in particular needs special care since:
> >>>> "When a ZooKeeper client loses a connection to the ZooKeeper server
> >>>> there may be some requests in flight; we don't know where they were
> >>>> in their flight at the time of the connection loss."
> >>>>
> >>>> Patrick
> >>>>
> >>>>
> >>>> Satish Bhatti wrote:
> >>>>
> >>>>> I have recently started running on EC2 and am seeing quite a few
> >>>>> ConnectionLoss exceptions. Should I just catch these and retry? Since
> >>>>> I assume that eventually, if the shit truly hits the fan, I will get
> >>>>> a SessionExpired?
> >>>>> Satish
> >>>>>
> >>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> We have used EC2 quite a bit for ZK.
> >>>>>>
> >>>>>> The basic lessons that I have learned include:
> >>>>>>
> >>>>>> a) EC2's biggest advantage after scaling and elasticity was
> >>>>>> conformity of configuration. Since you are bringing machines up and
> >>>>>> down all the time, they begin to act more like programs and you wind
> >>>>>> up with boot scripts that give you a very predictable environment.
> >>>>>> Nice.
> >>>>>>
> >>>>>> b) The EC2 interconnect has a lot more going on than a dedicated
> >>>>>> VLAN. That can make the ZK servers appear a bit less connected. You
> >>>>>> have to plan for ConnectionLoss events.
> >>>>>>
> >>>>>> c) For highest reliability, I switched to large instances. On
> >>>>>> reflection, I think that was helpful, but less important than I
> >>>>>> thought at the time.
> >>>>>>
> >>>>>> d) Increasing and decreasing cluster size is nearly painless and is
> >>>>>> easily scriptable. To decrease, do a rolling update on the survivors
> >>>>>> to update their configuration, then take down the instance you want
> >>>>>> to lose. To increase, do a rolling update starting with the new
> >>>>>> instances to update the configuration to include all of the
> >>>>>> machines. The rolling update should bounce each ZK with several
> >>>>>> seconds between each bounce.
> >>>>>> Rescaling the cluster takes less than a minute, which makes it
> >>>>>> comparable to EC2 instance boot time (about 30 seconds for the
> >>>>>> Alestic Ubuntu instance that we used plus about 20 seconds for
> >>>>>> additional configuration).
> >>>>>>
> >>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello
> >>>>>>>
> >>>>>>> I want to set up a ZooKeeper ensemble on Amazon's EC2 service. In
> >>>>>>> my system, ZooKeeper is used to run a locking service and to
> >>>>>>> generate unique IDs. Currently, for testing purposes, I am only
> >>>>>>> running one instance. Now, I need to set up an ensemble to protect
> >>>>>>> my system against crashes.
> >>>>>>> The EC2 service has some differences from a normal server farm.
> >>>>>>> E.g. the data saved on the file system of an EC2 instance is lost
> >>>>>>> if the instance crashes. In the documentation of ZooKeeper, I have
> >>>>>>> read that ZooKeeper saves snapshots of the in-memory data in the
> >>>>>>> file system. Is that needed for recovery? Logically, it would be
> >>>>>>> much easier for me if this is not the case.
> >>>>>>> Additionally, EC2 brings the advantage that servers can be switched
> >>>>>>> on and off dynamically depending on the load, traffic, etc. Can
> >>>>>>> this advantage be utilized for a ZooKeeper ensemble? Is it possible
> >>>>>>> to add a ZooKeeper server dynamically to an ensemble? E.g.
> >>>>>>> dependent on the in-memory load?
> >>>>>>>
> >>>>>>> David
> >>
> >>
> >> --
> >> Ted Dunning, CTO
> >> DeepDyve
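Patrick's pointer above boils down to: operations that fail with ConnectionLoss can be retried (with care), while SessionExpired means the session, and with it any ephemeral nodes and watches, is gone for good. Below is a minimal sketch of that retry pattern against the standard ZooKeeper Java client; the helper name, retry count, and back-off are illustrative assumptions, not something taken from this thread.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConnectionLossRetry {

    // Retries an idempotent read across ConnectionLoss events.
    // SessionExpiredException is deliberately not caught: once the session
    // has expired, the client must create a new ZooKeeper handle and
    // rebuild its ephemeral nodes and watches.
    static Stat existsWithRetry(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        final int maxRetries = 5;        // assumed value, tune per deployment
        final long backoffMs = 1000;     // assumed value, tune per deployment
        for (int attempt = 0; ; attempt++) {
            try {
                return zk.exists(path, false);
            } catch (KeeperException.ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e;             // give up and surface the error
                }
                Thread.sleep(backoffMs); // give the client time to reconnect
            }
        }
    }
}

As the wiki quote notes, a request may have been in flight when the connection dropped, so for non-idempotent calls such as create() a retried request can come back with NodeExists even though your own earlier attempt was the one that created the node; that case has to be handled explicitly rather than blindly retried.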