1. How many ZK nodes are in your ensemble?
2. Do you have metrics on how many requests ZK is handling?
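One way to gather the numbers asked for in question 2 is to poll each ensemble member with ZooKeeper's `mntr` four-letter-word command and read counters such as `zk_outstanding_requests`, `zk_avg_latency` and `zk_packets_received`. This assumes `mntr` is enabled; on ZooKeeper 3.5 and later it has to be listed in `4lw.commands.whitelist`. The following is a minimal Python sketch. `fetch_mntr` and `parse_mntr` are hypothetical helper names, not part of any ZooKeeper client library.

```python
import socket


def parse_mntr(text):
    """Parse `mntr` output: one tab-separated key/value pair per line."""
    stats = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        key, raw = line.split("\t", 1)
        raw = raw.strip()
        # Numeric counters become int or float; everything else stays a string.
        try:
            stats[key] = int(raw)
        except ValueError:
            try:
                stats[key] = float(raw)
            except ValueError:
                stats[key] = raw
    return stats


def fetch_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the `mntr` command to one ZooKeeper server and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        # The server closes the connection after answering a four-letter word,
        # so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", errors="replace"))
```

Run `fetch_mntr(...)` against every node in the ensemble on a fixed interval (the hostnames are yours to supply). `zk_outstanding_requests` is a point-in-time queue depth. `zk_packets_received` is cumulative, so the request rate is the difference between two polls divided by the interval.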
On Wed, Jan 3, 2018 at 1:48 PM, Andrey Falko <afa...@salesforce.com> wrote:
> Hi everyone,
>
> We are seeing more and more push from our Kafka users to support well
> more than 10k replicated partitions. We'd ideally like to avoid running
> multiple clusters to keep our cluster management and monitoring simple.
> We started testing Kafka to see how many replicated partitions it could
> handle.
>
> We found that, to maintain SLAs of under 50ms for produce latency,
> Kafka starts going downhill at around 9k topics with 5 brokers. Each
> topic is replicated 3x in our test. The bottleneck appears to be
> ZooKeeper: after a certain period of time, the number of outstanding
> requests in ZK spikes up at a linear rate. Slowing down the rate at which
> we create and produce to topics improves things, but doing that makes the
> system tougher to manage and use. We are happy to publish our detailed
> results with reproduction steps if anyone is interested.
>
> Has anyone overcome this problem and scaled beyond 9k replicated
> partitions? Does anyone have ZooKeeper tuning suggestions? Is it even the
> bottleneck?
>
> According to this, we should have at most 300 3x-replicated partitions
> per broker:
> https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> Is anyone doing work to have Kafka support more than that?
>
> Best regards,
> Andrey Falko
> Salesforce.com

--
Ben Wood
Software Engineer - Data Agility
Mesosphere
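To put the test load described above in per-broker terms, here is a small worked calculation. It assumes one partition per topic, which the original message does not state, and an even spread of replicas and leaders across brokers. `per_broker_load` is a hypothetical helper used only for illustration.

```python
from typing import Tuple


def per_broker_load(partitions: int, replication_factor: int, brokers: int) -> Tuple[float, float]:
    """Return (replicas, leaders) per broker, assuming an even spread."""
    total_replicas = partitions * replication_factor
    return total_replicas / brokers, partitions / brokers


# 9k single-partition topics, replicated 3x, on 5 brokers.
replicas, leaders = per_broker_load(9000, 3, 5)
print(f"{replicas:.0f} replicas and {leaders:.0f} leaders per broker")
```

Under these assumptions, each broker hosts about 5,400 replicas and leads about 1,800 partitions. If the figure of 300 quoted from the Confluent post is read as partitions per broker, the test load is well past it.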