RE: Apache Kafka in AWS

Jason Weiss Wed, 22 May 2013 16:24:40 -0700

Jonathan,

Using 0.7.2, with just a single EBS volume per broker instance - negative on 
the RAID 10.


I would speculate that if we used RAID 10 and we went with AWS's maximum 
provisioned IOPS (5000??) we probably could have squeaked out some more eps.

I have no doubt, BTW, that if we had implemented this on bare metal, the 
numbers would have been substantially higher. For example, the variation 
between the 20 different "identical" producer clients was rather dramatic - as 
much as 5000 eps in some cases. Given that these were identical virtualized 
devices, running identical software, configured identically from a single AWS 
AMI, the only explanation I can see is that the performance difference came 
from the "tax" of using virtualized devices.


Jason


________________________________________
From: Jonathan Hodges [hodg...@gmail.com]
Sent: Wednesday, May 22, 2013 19:11
To: users@kafka.apache.org
Subject: Re: Apache Kafka in AWS

Awesome write-up, Jason!  Very helpful as we are also looking to build a
Kafka environment in AWS.  I am curious, are you using Kafka 0.7.2 or 0.8
in your tests?  Did you have just one EBS volume per broker instance or
RAID 10 across EBS volumes per broker?

Thanks again for the great info!

-Jonathan


On Wed, May 22, 2013 at 4:35 PM, Jason Weiss <jason_we...@rapid7.com> wrote:

> Ken,
>
> Great question! I should have indicated I was using EBS, 500GB with 2000
> provisioned IOPS.
>
> Jason
>
> ________________________________________
> From: Ken Krugler [kkrugler_li...@transpac.com]
> Sent: Wednesday, May 22, 2013 17:23
> To: users@kafka.apache.org
> Subject: Re: Apache Kafka in AWS
>
> Hi Jason,
>
> Thanks for the notes.
>
> I'm curious whether you went with using local drives (ephemeral storage)
> or EBS, and if with EBS then what IOPS.
>
> Thanks,
>
> -- Ken
>
> On May 22, 2013, at 1:42pm, Jason Weiss wrote:
>
> > All,
> >
> > I asked a number of questions of the group over the last week, and I'm
> happy to report that I've had great success getting Kafka up and running in
> AWS. I am using 3 EC2 instances, each of which is a M2 High-Memory
> Quadruple Extra Large with 8 cores and 58.4 GiB of memory according to the
> AWS specs. I have co-located Zookeeper instances next to Kafka on each
> machine.
> >
> > I am able to publish in a repeatable fashion 273,000 events per second,
> with each event payload consisting of a fixed size of 2048 bytes! This
> represents the maximum throughput possible on this configuration, as the
> servers became CPU constrained, averaging 97% utilization in a relatively
> flat line. This isn't a "burst" speed – it represents a sustained
> throughput from 20 M1 Large EC2 Kafka multi-threaded producers. Putting
> this into perspective, if my log retention period was a month, I'd be
> aggregating 1.3 petabytes of data on my disk drives. Suffice to say, I
> don't see us retaining data for more than a few hours!
> >
> > Here were the keys to tuning for future folks to consider:
> >
> > First and foremost, be sure to configure your Java heap size accordingly
> when you launch Kafka. The default is like 512MB, which in my case left
> virtually all of my RAM inaccessible to Kafka.
> > Second, stay away from OpenJDK. No, seriously – this was a huge thorn in
> my side, and I almost gave up on Kafka because of the problems I
> encountered. The OpenJDK NIO functions repeatedly resulted in Kafka
> crashing and burning in dramatic fashion. The moment I switched over to
> Oracle's JDK for Linux, Kafka didn't puke once - I mean, like not even a
> hiccup.
> > Third, know your message size. In my opinion, the more you understand
> about your event payload characteristics, the better you can tune the
> system. The two knobs to really turn are the log.flush.interval and
> log.default.flush.interval.ms. The values here are intrinsically
> connected to the types of payloads you are putting through the system.
> > Fourth and finally, to maximize throughput you have to code against the
> async paradigm, and be prepared to tweak the batch size, queue properties,
> and compression codec (wait for it…) in a way that matches the message
> payload you are putting through the system and the capabilities of the
> producer system itself.
> >
> >
> > Jason
> >
> >
> >
> >
> >
> > This electronic message contains information which may be confidential
> or privileged. The information is intended for the use of the individual or
> entity named above. If you are not the intended recipient, be aware that
> any disclosure, copying, distribution or use of the contents of this
> information is prohibited. If you have received this electronic
> transmission in error, please notify us by e-mail at (
> postmas...@rapid7.com) immediately.
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
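
As a rough cross-check of the figures quoted above (273,000 events per
second at a fixed 2048 bytes each), the arithmetic can be sketched in
Python. The 30-day month and the even spread across the 3 brokers are
assumptions made here, not statements from the thread; compression and
per-message overhead are ignored.

```python
# Figures quoted in the thread.
EVENTS_PER_SECOND = 273_000
PAYLOAD_BYTES = 2048
BROKERS = 3

SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 30  # assumption: "a month" taken as 30 days

# Aggregate ingest rate across the cluster, in bytes per second.
bytes_per_second = EVENTS_PER_SECOND * PAYLOAD_BYTES

# Data accumulated over the assumed retention window.
bytes_per_month = bytes_per_second * SECONDS_PER_DAY * RETENTION_DAYS

# Assumption: load is spread evenly over the brokers.
per_broker_bytes_per_second = bytes_per_second / BROKERS

print(f"aggregate ingest: {bytes_per_second / 1e6:.0f} MB/s")
print(f"per broker:       {per_broker_bytes_per_second / 1e6:.0f} MB/s")
print(f"30-day retention: {bytes_per_month / 1e15:.2f} PB "
      f"({bytes_per_month / 2**50:.2f} PiB)")
```

Under these assumptions the monthly total comes out close to the 1.3
petabytes mentioned in the original message when counted in binary units.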

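The tuning notes in the original message name two broker flush settings and
several async producer knobs (batch size, queue properties, compression
codec) without giving values. A minimal sketch of how such settings might be
laid out follows. The two flush property names are the ones quoted in the
thread; the producer property names are recalled from the Kafka 0.7-era
configuration and should be checked against the documentation for your
version. Every value is a hypothetical placeholder, not what was used in the
tests, and the helper functions are illustrative only.

```python
# Broker-side flush knobs named in the thread. Values are placeholders.
broker_props = {
    "log.flush.interval": 10000,            # assumed to count messages per log
    "log.default.flush.interval.ms": 1000,  # time-based flush, in milliseconds
}

# Producer-side knobs for the async path. Names are an assumption based on
# the 0.7-era producer configuration; values are placeholders to tune.
producer_props = {
    "producer.type": "async",
    "batch.size": 200,
    "queue.size": 10000,
    "queue.time": 5000,
    "compression.codec": 1,  # assumption: 0 = none, 1 = GZIP
}


def render_properties(props):
    """Render a dict as Java-style key=value properties text."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def bytes_between_flushes(flush_interval_messages, payload_bytes):
    """Approximate data written to one log between count-based flushes."""
    return flush_interval_messages * payload_bytes


print(render_properties(broker_props))
print(render_properties(producer_props))
print(bytes_between_flushes(broker_props["log.flush.interval"], 2048), "bytes")
```

The last helper illustrates why message size matters for the flush settings:
with a count-based interval, the amount of unflushed data scales directly
with the payload size.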