Sorry to go back this far in time; I just noticed that the list had flagged my earlier email as spam, so I'll try again with better formatting...
A few questions, hopefully you (and everyone) don't mind. Feel free to ignore any or all of them. I am trying to learn what I can from people operating at considerably larger scale than we are, so we can (hopefully) avoid the same pains:

* Are all 1100 brokers hardware?
* Is there any hardware or OS tuning you've found beneficial?
* How do you manage deploying config updates? In particular, how do you manage the broker restarts to pick up changes?
* Why 60 clusters? What segmentation of purpose (aside from the 2 layers detailed in this doc) do you have?
* Do you tune the clusters for different workloads/data types?
* What challenges have you faced running that many clusters and nodes vs. when you were smaller?
* How do you keep topic names consistent (non-conflicting) between clusters?
* How do you manage partitioning and balancing (and rebalancing when a topic/partition starts growing very quickly)?
* Have you enabled your users/customers to monitor their data flow, and if so, how? Or do they just trust you to let them know if there are issues?

Thanks very much, and sorry for the question dump!

On Mon, Mar 23, 2015 at 9:42 AM, Todd Palino <tpal...@gmail.com> wrote:

> Emmanuel, if it helps, here's a little more detail on the hardware spec we
> are using at the moment:
>
> 12 CPU (HT enabled)
> 64 GB RAM
> 16 x 1TB SAS drives (2 are used as a RAID-1 set for the OS, 14 are a
> RAID-10 set just for the Kafka log segments)
>
> We don't colocate any other applications with Kafka except for a couple of
> monitoring agents. Zookeeper runs on completely separate nodes.
>
> I suggest starting with the basics: watch the CPU, memory, and disk IO
> usage on the brokers as you are testing. You're likely going to find that
> one of these three is the constraint. Disk IO in particular can lead to a
> significant increase in produce latency as utilization climbs, even at
> just 10-15%.
>
> -Todd
>
> On Fri, Mar 20, 2015 at 3:41 PM, Emmanuel <ele...@msn.com> wrote:
>
>> This is why I'm confused: I'm trying to benchmark, and I see numbers
>> that seem pretty low to me. 8,000 events/sec on 2 brokers with 3 CPUs
>> each and 5 partitions should be way faster than this, and I don't know
>> where to start debugging...
>> The kafka-consumer-perf-test script gives me ridiculously low numbers
>> (1,000 events/sec/thread).
>>
>> So what could be causing this?
>>
>> From: jbringhu...@linkedin.com.INVALID
>> To: users@kafka.apache.org
>> Subject: Re: Post on running Kafka at LinkedIn
>> Date: Fri, 20 Mar 2015 22:16:29 +0000
>>
>> Keep in mind that these brokers aren't really stressed too much at any
>> given time -- we need to stay ahead of the capacity curve.
>> Your message throughput will really just depend on what hardware you're
>> using. However, in the past, we've benchmarked at 400,000 to more than
>> 800,000 messages / broker / sec, depending on configuration (
>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>> ).
>>
>> -Jon
>>
>> On Mar 20, 2015, at 3:03 PM, Emmanuel <ele...@msn.com> wrote:
>>
>> 800B messages / day = 9.26M messages / sec over 1100 brokers
>> = ~8400 messages / broker / sec
>> Do I get this right?
>> Trying to benchmark my own test cluster, and that's what I see with 2
>> brokers... Just wondering if my numbers are good or bad...
>>
>> Subject: Re: Post on running Kafka at LinkedIn
>> From: cl...@kafka.guru
>> Date: Fri, 20 Mar 2015 14:27:58 -0700
>> To: users@kafka.apache.org
>>
>> Yep! We are growing :)
>>
>> -Clark
>>
>> Sent from my iPhone
>>
>> On Mar 20, 2015, at 2:14 PM, James Cheng <jch...@tivo.com> wrote:
>>
>> Amazing growth numbers.
>>
>> At the meetup on 1/27, Clark Haskins presented their Kafka usage at the
>> time.
>> It was:
>>
>> Bytes in: 120 TB
>> Messages in: 585 billion
>> Bytes out: 540 TB
>> Total brokers: 704
>>
>> In Todd's post, the current numbers:
>>
>> Bytes in: 175 TB (45% growth)
>> Messages in: 800 billion (36% growth)
>> Bytes out: 650 TB (20% growth)
>> Total brokers: 1100 (56% growth)
>>
>> That much growth in just 2 months? Wowzers.
>>
>> -James
>>
>> On Mar 20, 2015, at 11:30 AM, James Cheng <jch...@tivo.com> wrote:
>>
>> For those who missed it:
>>
>> The Kafka Audit tool was also presented at the 1/27 Kafka meetup:
>> http://www.meetup.com/http-kafka-apache-org/events/219626780/
>>
>> Recorded video is here, starting around the 40-minute mark:
>> http://www.ustream.tv/recorded/58109076
>>
>> Slides are here:
>> http://www.ustream.tv/recorded/58109076
>>
>> -James
>>
>> On Mar 20, 2015, at 9:47 AM, Todd Palino <tpal...@gmail.com> wrote:
>>
>> For those interested in the details of how we've got Kafka set up at
>> LinkedIn, I have just published a new post on our Engineering blog,
>> titled "Running Kafka at Scale":
>>
>> https://engineering.linkedin.com/kafka/running-kafka-scale
>>
>> It's a general overview of our current Kafka install, tiered
>> architecture, audit, and the libraries we use for producers and
>> consumers. You'll also be seeing more posts from the SRE team here in
>> the coming weeks with deeper looks into both Kafka and Samza.
>>
>> Additionally, I'll be giving a talk at ApacheCon next month on running
>> tiered Kafka architectures. If you're in Austin for that, please come by
>> and check it out.
>>
>> -Todd
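As an aside on the hardware spec Todd quotes above (16 x 1TB drives, 2 in RAID-1 for the OS, 14 in RAID-10 for logs), here is a quick sketch of the usable log capacity it implies. The 1 TB-per-drive figure is from the thread; the assumption that RAID-10 yields half of raw capacity is standard but not stated there.

```python
# Back-of-the-envelope usable log capacity for the broker spec in the thread.
TOTAL_DRIVES = 16
OS_RAID1_DRIVES = 2                           # mirrored pair for the OS
LOG_DRIVES = TOTAL_DRIVES - OS_RAID1_DRIVES   # 14 drives left for Kafka logs
DRIVE_TB = 1.0                                # 1 TB SAS drives

# RAID-10 (striped mirrors) gives half the raw capacity as usable space.
raid10_usable_tb = LOG_DRIVES * DRIVE_TB / 2
print(f"Usable log capacity per broker: ~{raid10_usable_tb:.0f} TB")
```

So each broker has roughly 7 TB of usable log-segment space, under those assumptions.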
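Emmanuel's back-of-the-envelope in the thread (800B messages/day over 1100 brokers) checks out:

```python
# Sanity check on the per-broker rate quoted in the thread.
MESSAGES_PER_DAY = 800e9   # 800 billion messages/day
BROKERS = 1100
SECONDS_PER_DAY = 24 * 60 * 60

cluster_rate = MESSAGES_PER_DAY / SECONDS_PER_DAY   # ~9.26M msg/sec
per_broker_rate = cluster_rate / BROKERS            # ~8,400 msg/broker/sec
print(f"{cluster_rate / 1e6:.2f}M msg/sec cluster-wide, "
      f"~{per_broker_rate:.0f} msg/broker/sec on average")
```

Note that this is an average, not a per-broker ceiling; as Jon points out, individual brokers have been benchmarked at 400,000-800,000+ msg/sec.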
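The growth percentages James quotes can be recomputed from the two snapshots. This takes January's "Messages in" as 585 billion (presumably a typo for "million" in the original, since 36% growth to 800 billion implies a prior figure near 588 billion) and assumes the quoted percentages were truncated rather than rounded:

```python
# Recomputing the Jan -> Mar growth figures quoted in the thread.
jan = {"bytes_in_tb": 120, "messages_in_billions": 585,
       "bytes_out_tb": 540, "brokers": 704}
mar = {"bytes_in_tb": 175, "messages_in_billions": 800,
       "bytes_out_tb": 650, "brokers": 1100}

# int() truncates, matching the figures as quoted (45%, 36%, 20%, 56%).
growth = {k: int((mar[k] - jan[k]) / jan[k] * 100) for k in jan}
for k, pct in growth.items():
    print(f"{k}: {pct}% growth")
```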