Good point, Jun Rao. We've been trying to get things to scale in normal
mode first and haven't tried failure scenarios yet.

Thanks for the pointer to KIP-227! It looks promising indeed. I'm
sprucing up my reproduction environment and tests, and I hope to have
more info to share soon.

On Wed, Jan 3, 2018 at 4:47 PM, Jun Rao <j...@confluent.io> wrote:
> Hi, Andrey,
>
> If the test is in normal mode, it would be useful to figure out why ZK
> is the bottleneck, since normal mode typically doesn't require ZK
> accesses.
>
> Thanks,
>
> Jun
>
> On Wed, Jan 3, 2018 at 3:00 PM, Andrey Falko <afa...@salesforce.com> wrote:
>
>> Ben Wood:
>> 1. We have 5 ZK nodes.
>> 2. So far I have only tracked outstanding requests on the ZK side. At
>> 9.5k topics, I recorded about 5k outstanding requests. I'll track this
>> more thoroughly for my next run. Anything else worth tracking?
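>>
>> For reference, here is roughly how I plan to poll those numbers: a
>> minimal sketch using ZK's "mntr" four-letter command (the host and
>> port below are placeholders for our actual ensemble):
>>
>> import java.io.BufferedReader;
>> import java.io.InputStreamReader;
>> import java.io.PrintWriter;
>> import java.net.Socket;
>>
>> public class ZkMntrProbe {
>>     public static void main(String[] args) throws Exception {
>>         // "mntr" dumps one metric per line and closes the connection
>>         try (Socket s = new Socket("zk1.example.com", 2181);
>>              PrintWriter out = new PrintWriter(s.getOutputStream(), true);
>>              BufferedReader in = new BufferedReader(
>>                      new InputStreamReader(s.getInputStream()))) {
>>             out.println("mntr");
>>             String line;
>>             while ((line = in.readLine()) != null) {
>>                 // e.g. zk_outstanding_requests, zk_avg_latency,
>>                 // zk_pending_syncs, zk_num_alive_connections
>>                 if (line.startsWith("zk_")) {
>>                     System.out.println(line);
>>                 }
>>             }
>>         }
>>     }
>> }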
>>
>> Jun Rao:
>> I'm testing the latest release, 1.0.0, in normal mode: I don't take
>> anything down. I had a run where I tried to scale the brokers up, but
>> it didn't improve things. Thanks for pointing me at the KIP!
>>
>> On Wed, Jan 3, 2018 at 2:50 PM, Jun Rao <j...@confluent.io> wrote:
>> > Hi, Andrey,
>> >
>> > Thanks for reporting the results. Which version of Kafka are you testing?
>> > Also, it would be useful to know if you are testing the normal mode when
>> > all replicas are up and in sync, or the failure mode when some of the
>> > replicas are being restarted. Typically, ZK is only accessed in the
>> > failure mode.
>> >
>> > We have made some significant improvements in the failure mode by
>> > reducing the logging overhead (KAFKA-6116) and making the ZK accesses
>> > async (KAFKA-5642). These won't necessarily reduce the number of
>> > requests to ZK, but they will allow better pipelining when accessing ZK.
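>> >
>> > A minimal sketch of why the async API helps (this is not the actual
>> > KAFKA-5642 patch, just an illustration; the paths passed in are
>> > placeholders):
>> >
>> > import java.util.List;
>> > import java.util.concurrent.CountDownLatch;
>> > import org.apache.zookeeper.ZooKeeper;
>> >
>> > public class AsyncZkReads {
>> >     // Issue every read up front; the ZK client pipelines them over a
>> >     // single connection instead of paying one round trip per path.
>> >     public static void readAll(ZooKeeper zk, List<String> paths)
>> >             throws InterruptedException {
>> >         CountDownLatch done = new CountDownLatch(paths.size());
>> >         for (String path : paths) {
>> >             zk.getData(path, false,
>> >                 (rc, p, ctx, data, stat) -> done.countDown(), null);
>> >         }
>> >         done.await();  // a synchronous loop would block on each getData
>> >     }
>> > }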
>> >
>> > In the normal mode, we are now discussing KIP-227, which could reduce the
>> > overhead for replication and consumption when there are many partitions.
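>> >
>> > To make the idea concrete, here is a toy sketch of an incremental
>> > fetch session. All class and method names below are invented for
>> > illustration; they are not Kafka's real internals.
>> >
>> > import java.util.HashMap;
>> > import java.util.Map;
>> > import org.apache.kafka.common.TopicPartition;
>> >
>> > public class FetchSessionSketch {
>> >     // What the broker remembers about the session between requests.
>> >     private final Map<TopicPartition, Long> sessionState = new HashMap<>();
>> >
>> >     // Only partitions whose position moved need to appear in the next
>> >     // fetch request; with thousands of mostly idle partitions the
>> >     // delta stays tiny even though the session covers all of them.
>> >     public Map<TopicPartition, Long> nextRequest(
>> >             Map<TopicPartition, Long> currentPositions) {
>> >         Map<TopicPartition, Long> delta = new HashMap<>();
>> >         for (Map.Entry<TopicPartition, Long> e : currentPositions.entrySet()) {
>> >             Long last = sessionState.put(e.getKey(), e.getValue());
>> >             if (last == null || !last.equals(e.getValue())) {
>> >                 delta.put(e.getKey(), e.getValue());
>> >             }
>> >         }
>> >         return delta;
>> >     }
>> > }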
>> >
>> > Jun
>> >
>> > On Wed, Jan 3, 2018 at 1:48 PM, Andrey Falko <afa...@salesforce.com>
>> > wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> We are seeing more and more push from our Kafka users to support well
>> >> over 10k replicated partitions. We'd ideally like to avoid running
>> >> multiple clusters, to keep our cluster management and monitoring
>> >> simple. We started testing Kafka to see how many replicated
>> >> partitions it could handle.
>> >>
>> >> We found that, with an SLA of under 50 ms for produce latency, Kafka
>> >> starts going downhill at around 9k topics on 5 brokers. Each topic is
>> >> replicated 3x in our test. The bottleneck appears to be ZooKeeper:
>> >> after a certain period of time, the number of outstanding requests in
>> >> ZK starts growing at a linear rate. Slowing down the rate at which we
>> >> create and produce to topics improves things, but doing that makes
>> >> the system tougher to manage and use. We are happy to publish our
>> >> detailed results with reproduction steps if anyone is interested; a
>> >> sketch of the test loop follows below.
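>> >>
>> >> For a flavor of the harness, here is a stripped-down sketch of the
>> >> probe loop (simplified to one partition per topic; the broker list,
>> >> topic count, and naming are placeholders for what the real scripts
>> >> parameterize):
>> >>
>> >> import java.util.Collections;
>> >> import java.util.Properties;
>> >> import org.apache.kafka.clients.admin.AdminClient;
>> >> import org.apache.kafka.clients.admin.NewTopic;
>> >> import org.apache.kafka.clients.producer.KafkaProducer;
>> >> import org.apache.kafka.clients.producer.ProducerRecord;
>> >>
>> >> public class ManyTopicsProbe {
>> >>     public static void main(String[] args) throws Exception {
>> >>         Properties props = new Properties();
>> >>         props.put("bootstrap.servers", "broker1:9092");
>> >>         props.put("key.serializer",
>> >>                 "org.apache.kafka.common.serialization.StringSerializer");
>> >>         props.put("value.serializer",
>> >>                 "org.apache.kafka.common.serialization.StringSerializer");
>> >>         try (AdminClient admin = AdminClient.create(props);
>> >>              KafkaProducer<String, String> producer =
>> >>                      new KafkaProducer<>(props)) {
>> >>             for (int i = 0; i < 9000; i++) {
>> >>                 String topic = "load-test-" + i;
>> >>                 // one partition per topic, replicated 3x
>> >>                 admin.createTopics(Collections.singletonList(
>> >>                         new NewTopic(topic, 1, (short) 3))).all().get();
>> >>                 long start = System.nanoTime();
>> >>                 producer.send(new ProducerRecord<>(topic, "k", "v")).get();
>> >>                 long ms = (System.nanoTime() - start) / 1_000_000;
>> >>                 if (ms > 50) {  // our produce-latency SLA
>> >>                     System.out.println(topic + " over SLA: " + ms + " ms");
>> >>                 }
>> >>             }
>> >>         }
>> >>     }
>> >> }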
>> >>
>> >> Has anyone overcome this problem and scaled beyond 9k replicated
>> >> partitions? Does anyone have ZooKeeper tuning suggestions? Is it even
>> >> the bottleneck?
>> >>
>> >> According to this post, we should have at most 300 3x-replicated
>> >> partitions per broker:
>> >> https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>> >> Is anyone doing work to have Kafka support more than that?
>> >>
>> >> Best regards,
>> >> Andrey Falko
>> >> Salesforce.com
>> >>
>>
