Hi Jason,

Thanks for the notes.

I'm curious whether you went with using local drives (ephemeral storage) or 
EBS, and if with EBS then what IOPS.

Thanks,

-- Ken

On May 22, 2013, at 1:42pm, Jason Weiss wrote:

> All,
> 
> I asked a number of questions of the group over the last week, and I'm happy 
> to report that I've had great success getting Kafka up and running in AWS. I 
> am using 3 EC2 instances, each of which is an M2 High-Memory Quadruple Extra 
> Large with 8 cores and 58.4 GiB of memory according to the AWS specs. I have 
> co-located ZooKeeper instances next to Kafka on each machine.
> 
> I am able to publish in a repeatable fashion 273,000 events per second, with 
> each event payload consisting of a fixed size of 2048 bytes! This represents 
> the maximum throughput possible on this configuration, as the servers became 
> CPU constrained, averaging 97% utilization in a relatively flat line. This 
> isn't a "burst" speed – it represents a sustained throughput from 20 M1 Large 
> EC2 Kafka multi-threaded producers. Putting this into perspective, if my log 
> retention period was a month, I'd be aggregating 1.3 petabytes of data on my 
> disk drives. Suffice it to say, I don't see us retaining data for more than a 
> few hours!
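> 
> A quick back-of-the-envelope check on that retention figure (a sketch;
> it assumes 2048-byte payloads with no broker or replication overhead and
> a 30-day month, which lands in the same ballpark as the quoted 1.3 PB):

```python
# Sanity check of the monthly-retention estimate quoted above.
events_per_sec = 273_000
payload_bytes = 2048

bytes_per_sec = events_per_sec * payload_bytes   # sustained ingest rate
bytes_per_month = bytes_per_sec * 86_400 * 30    # seconds/day * days

print(f"{bytes_per_sec / 1e6:.0f} MB/s, {bytes_per_month / 1e15:.2f} PB/month")
# prints "559 MB/s, 1.45 PB/month"
```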
> 
> Here were the keys to tuning for future folks to consider:
> 
> First and foremost, be sure to configure your Java heap size accordingly when 
> you launch Kafka. The default is like 512MB, which in my case left virtually 
> all of my RAM inaccessible to Kafka.
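> 
> As a sketch of that first point (the KAFKA_HEAP_OPTS variable is honored
> by the stock start scripts in later Kafka releases; on older versions you
> may have to edit kafka-server-start.sh directly, and the 4g figure below
> is just an illustrative assumption for a large-memory instance):

```shell
# Hypothetical override of the ~512MB default heap before starting a broker.
# Sizing the heap to a few GB and leaving the rest of RAM to the OS page
# cache is the usual guidance; adjust -Xms/-Xmx to your instance.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"

# then start the broker as usual:
# bin/kafka-server-start.sh config/server.properties
```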
> Second, stay away from OpenJDK. No, seriously – this was a huge thorn in my 
> side, and I almost gave up on Kafka because of the problems I encountered. 
> The OpenJDK NIO functions repeatedly resulted in Kafka crashing and burning 
> in dramatic fashion. The moment I switched over to Oracle's JDK for Linux, 
> Kafka didn't puke once – I mean, not even a hiccup.
> Third, know your message size. In my opinion, the more you understand about 
> your event payload characteristics, the better you can tune the system. The 
> two knobs to really turn are the log.flush.interval and 
> log.default.flush.interval.ms. The values here are intrinsically connected to 
> the types of payloads you are putting through the system.
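> 
> For those flush knobs, a hedged sketch of what that might look like in a
> 0.7-era server.properties (the property names match that point in the
> email; the values are assumptions and depend entirely on your payload
> profile):

```properties
# Flush to disk after this many messages accumulate on a log partition...
log.flush.interval=10000
# ...or after this many milliseconds have elapsed, whichever comes first.
log.default.flush.interval.ms=1000
```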
> Fourth and finally, to maximize throughput you have to code against the async 
> paradigm, and be prepared to tweak the batch size, queue properties, and 
> compression codec (wait for it…) in a way that matches the message payload 
> you are putting through the system and the capabilities of the producer 
> system itself.
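> 
> A sketch of the async-producer side of that fourth point, using 0.7-era
> producer property names (names and values here are assumptions; check
> them against the release you actually run):

```properties
# Batch events instead of sending one at a time.
producer.type=async
# How many messages to buffer per send, and how long/deep the queue may get.
batch.size=200
queue.time=5000
queue.size=10000
# 1 = GZIP compression (the "wait for it" codec knob).
compression.codec=1
```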
> 
> 
> Jason
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




