Hi Neha,

I am almost sure the root of the problem is not on the client side. I ran tests 
with different Kafka client library implementations and got similar results. In 
my tests I "saturated" servers with load coming from 40 processes running on 4 
different hosts, so blocking producers even with global locks would not 
interfere with each other. Changing a single configuration parameter, the number 
of required acks, consistently reduced system throughput in all tests, and the 
drop is too big to ignore.

Is there a global lock on the server side that controls the flow of acks from 
the "following" brokers?

I am reading the Kafka sources at the moment, but with my complete lack of 
Scala knowledge it's not trivial.

Thank you,
Michael Popov


-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com] 
Sent: Wednesday, January 29, 2014 5:15 PM
To: users@kafka.apache.org
Subject: Re: Kafka performance test: "--request-num-acks -1" kills throughput

Michael,

The producer client in 0.8 has a single thread that performs blocking data 
sends, one broker at a time. However, we are working on a rewrite of the producer with 
an improved design that can support higher throughput compared to the current 
one. We don't have any performance numbers to share yet, but I think we can 
send something around as soon as the new producer client is somewhat stable. 
The new producer client will be released as part of 0.9.

Thanks,
Neha


On Wed, Jan 29, 2014 at 4:06 PM, Michael Popov <mpo...@microsoft.com> wrote:

> Hi,
>
> We need a reliable low-latency message queue that can scale. Kafka 
> looks like the right system for this role.
>
> I am running performance tests on multiple platforms: Linux and Windows.
> For test purposes I create topics with 2 replicas and multiple partitions.
> In all deployments, running test producers that wait for both replicas' 
> acks practically kills Kafka throughput. For example, on the following 
> deployment on Linux machines (2 Kafka brokers, 1 Zookeeper node, 4 
> client hosts to create load, 4 topics with 10 partitions each and 2 
> replicas):
>
> - running tests with "--request-num-acks 1" produces ~3,600 msgs/sec
>
> - running tests with "--request-num-acks -1" produces ~348 msgs/sec
>
>
> Here is output of one of the four concurrent processes:
>
> [User@Client2 kafka_2.8.0-0.8.0]$ bin/kafka-producer-perf-test.sh 
> --broker-list 10.0.0.8:9092,10.0.0.10:9092 --compression-codec 0 
> --message-size 1024 --request-num-acks -1 --sync --messages 100000 
> -threads 10 --show-detailed-stats --reporting-interval 1000 --topics c12 
> | grep -v "at "
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> [2014-01-29 23:21:16,720] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,825] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,830] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,831] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,839] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,841] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,847] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,858] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,862] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:21:16,867] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
> [2014-01-29 23:32:03,830] WARN Produce request with correlation id 11467 failed due to [c12,2]: kafka.common.RequestTimedOutException (kafka.producer.async.DefaultEventHandler)
> [2014-01-29 23:32:03,831] WARN Produce request with correlation id 11859 failed due to [c12,8]: kafka.common.RequestTimedOutException (kafka.producer.async.DefaultEventHandler)
> [2014-01-29 23:32:03,831] WARN Failed to send producer request with correlation id 11819 to broker 0 with data for partitions [c12,8] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11315 to broker 0 with data for partitions [c12,6] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11191 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11791 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11395 to broker 0 with data for partitions [c12,6] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11631 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 10563 to broker 0 with data for partitions [c12,0] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 10907 to broker 0 with data for partitions [c12,2] (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> 2014-01-29 23:21:16:562, 2014-01-29 23:40:15:886, 0, 1024, 200, 97.66, 0.0857, 100000, 87.7713
>
> The test result is consistent and reproducible in all deployments. The 
> numbers vary, but changing the acks setting consistently reduces Kafka 
> throughput by a factor of 4 to 10.
>
> Is this expected system behavior? Are there any tuning options to resolve the problem?
>
> Thank you,
> Michael Popov
>
>
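
The summary line in the quoted perf-test output can be cross-checked with a 
short calculation. The sketch below is only an illustration, not part of the 
Kafka tooling: it assumes the comma-separated fields follow the header printed 
by the tool (start.time, end.time, compression, message.size, batch.size, 
total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec), and the 
names SUMMARY, TIME_FORMAT and recompute are made up for this example.

```python
from datetime import datetime

# Summary line copied from the quoted perf-test output.
SUMMARY = ("2014-01-29 23:21:16:562, 2014-01-29 23:40:15:886, 0, 1024, 200, "
           "97.66, 0.0857, 100000, 87.7713")

# Assumption: the tool prints milliseconds after a final colon.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def recompute(summary):
    """Re-derive elapsed time, MB sent, and rates from one summary line."""
    fields = [f.strip() for f in summary.split(",")]
    start = datetime.strptime(fields[0], TIME_FORMAT)
    end = datetime.strptime(fields[1], TIME_FORMAT)
    message_size = int(fields[3])   # bytes per message
    n_messages = int(fields[7])     # total messages sent
    elapsed = (end - start).total_seconds()
    total_mb = n_messages * message_size / (1024 * 1024)
    return {
        "elapsed_sec": elapsed,
        "total_mb": total_mb,
        "mb_per_sec": total_mb / elapsed,
        "msgs_per_sec": n_messages / elapsed,
    }


stats = recompute(SUMMARY)
# Aggregate rates reported in the message: ~3,600 msgs/sec with acks 1
# versus ~348 msgs/sec with acks -1.
slowdown = 3600 / 348
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
print(f"aggregate slowdown: {slowdown:.1f}x")
```

The recomputed per-process rate agrees with the nMsg.sec column. If the other 
three concurrent processes ran at a similar rate (an assumption, since only 
one process's output is quoted), four of them would add up to roughly 350 
msgs/sec, in line with the ~348 msgs/sec aggregate reported for acks -1.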
