Michael,

Those SocketTimeoutExceptions meas that the producer didn't receive the
response from the broker in time. Could you check the request log in the
broker and see what the request completion time is and how it compares with
the request socket timeout?

I did some testing a while back. Latency wise, ack=-1 is about twice that
of ack=1 since it requires an extra network round trip.

Thanks,

Jun


On Thu, Jan 30, 2014 at 11:23 AM, Michael Popov <mpo...@microsoft.com>wrote:

> Hi Jun,
>
> I ran a new new tests today.
>
> Settings: 2 Kafka brokers, 1 Zookeeper node, 4 client machines.
> Creating a new topic with 1 partition for each client process.
> Commands to create a topic looked like this one:
>
>    bin/kafka-create-topic.sh --partition 1 --replica 2 --zookeeper
> 10.0.0.4:2181 --topic d1
>
> Commands to run a test producer looked like this:
>
>     bin/kafka-producer-perf-test.sh --broker-list 10.0.0.8:9092,
> 10.0.0.10:9092 --compression-codec 0 --message-size 1024
> --request-num-acks 1 --sync --messages 100000 -threads 10
> --show-detailed-stats --reporting-interval 1000 --topics d1 | grep -v "at "
>
> Results of 4 test runs:
> ======================================
> --request-num-acks  -1
> =====================================
> 2014-01-30 17:55:01:576, 2014-01-30 17:59:23:294, 0, 1024, 200, 97.66,
> 0.3731, 100000, 382.0906
> 2014-01-30 17:55:01:921, 2014-01-30 18:20:56:952, 0, 1024, 200, 97.66,
> 0.0628, 99999, 64.3068
> 2014-01-30 17:55:00:037, 2014-01-30 18:20:57:164, 0, 1024, 200, 97.66,
> 0.0627, 100000, 64.2208
> 2014-01-30 17:54:59:723, 2014-01-30 18:20:34:633, 0, 1024, 200, 97.66,
> 0.0636, 100000, 65.1504
>
> ======================================
> --request-num-acks  1
> =====================================
> 2014-01-30 18:24:54:588, 2014-01-30 18:27:42:408, 0, 1024, 200, 97.66,
> 0.5819, 100000, 595.8765
> 2014-01-30 18:24:54:830, 2014-01-30 18:27:24:080, 0, 1024, 200, 97.66,
> 0.6543, 100000, 670.0168
> 2014-01-30 18:24:52:855, 2014-01-30 18:25:48:119, 0, 1024, 200, 97.66,
> 1.7671, 100000, 1809.4962
> 2014-01-30 18:24:52:763, 2014-01-30 18:27:45:675, 0, 1024, 200, 97.66,
> 0.5648, 100000, 578.3289
>
> ======================================
> --request-num-acks  -1
> =====================================
> 2014-01-30 18:31:08:756, 2014-01-30 18:57:12:123, 0, 1024, 200, 97.66,
> 0.0625, 99998, 63.9632
> 2014-01-30 18:31:08:860, 2014-01-30 18:37:18:634, 0, 1024, 200, 97.66,
> 0.2641, 100000, 270.4355
> 2014-01-30 18:31:06:968, 2014-01-30 18:57:28:389, 0, 1024, 200, 97.66,
> 0.0618, 99999, 63.2336
> 2014-01-30 18:31:06:819, 2014-01-30 18:37:19:703, 0, 1024, 200, 97.66,
> 0.2619, 100000, 268.1799
>
> ======================================
> --request-num-acks  1
> =====================================
> 2014-01-30 19:02:35:171, 2014-01-30 19:05:21:650, 0, 1024, 200, 97.66,
> 0.5866, 100000, 600.6764
> 2014-01-30 19:02:40:264, 2014-01-30 19:05:01:863, 0, 1024, 200, 97.66,
> 0.6897, 100000, 706.2197
> 2014-01-30 19:02:38:537, 2014-01-30 19:03:27:687, 0, 1024, 200, 97.66,
> 1.9869, 100000, 2034.5880
> 2014-01-30 19:02:33:384, 2014-01-30 19:04:51:134, 0, 1024, 200, 97.66,
> 0.7089, 100000, 725.9528
>
>
> There is something seriously wrong with Kafka throughput in "2acks" tests.
> Anything I can do to resolve this issue?
>
> One more thing. Producers kept printing error messages like this during
> the test runs:
> [2014-01-30 18:32:39,777] WARN Failed to send producer request with
> correlation id 51676 to broker 1 with data for partitions [d4111,0]
> (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-30 18:32:53,308] WARN Produce request with correlation id 57999
> failed due to [d4111,0]: kafka.common.RequestTimedOutException
> (kafka.producer.async.DefaultEventHandler)
> [2014-01-30 18:33:22,350] WARN Failed to send producer request with
> correlation id 72096 to broker 1 with data for partitions [d4111,0]
> (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
> [2014-01-30 18:33:39,740] WARN Failed to send producer request with
> correlation id 81147 to broker 1 with data for partitions [d4111,0]
> (kafka.producer.async.DefaultEventHandler)
> java.net.SocketTimeoutException
>
> Thank you,
> Michael Popov
>
>
> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Wednesday, January 29, 2014 9:01 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka performance test: "--request-num-acks -1" kills
> throughput
>
> Does the result change with just 1 partition?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 29, 2014 at 4:06 PM, Michael Popov <mpo...@microsoft.com>
> wrote:
>
> > Hi,
> >
> > We need a reliable low-latency message queue that can scale. Kafka
> > looks like a right system for this role.
> >
> > I am running performance tests on multiple platforms: Linux and Windows.
> > For test purposes I create topics with 2 replicas and multiple
> partitions.
> > In all deployments running test producers that wait for both replicas'
> > acks practically kills Kafka throughput. For example, on the following
> > deployment on Linux machines: 2 Kafka brokers, 1 Zookeeper node, 4
> > client hosts to create load, 4 topics with 10 partitions each and 2
> > replicas
> >
> > -          running tests with "--request-num-acks 1" produces ~3,600
> > msgs/sec
> >
> > -          running tests with "--request-num-acks -1" produces ~348
> > msgs/sec
> >
> >
> > Here is output of one of the four concurrent processes:
> >
> > [User@Client2 kafka_2.8.0-0.8.0]$  bin/kafka-producer-perf-test.sh
> > --broker-list 10.0.0.8:9092,10.0.0.10:9092 --compression-codec 0
> > --message-size 1024 --request-num-acks -1 --sync --messages 100000
> > -threads
> > 10 --show-detailed-stats --reporting-interval 1000 --topics c12 | grep
> > -v "at "
> > start.time, end.time, compression, message.size, batch.size,
> > total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> > [2014-01-29 23:21:16,720] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,825] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,830] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,831] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,839] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,841] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,847] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,858] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,862] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:21:16,867] WARN Property reconnect.interval is not
> > valid
> > (kafka.utils.VerifiableProperties)
> > [2014-01-29 23:32:03,830] WARN Produce request with correlation id
> > 11467 failed due to [c12,2]: kafka.common.RequestTimedOutException
> > (kafka.producer.async.DefaultEventHandler)
> > [2014-01-29 23:32:03,831] WARN Produce request with correlation id
> > 11859 failed due to [c12,8]: kafka.common.RequestTimedOutException
> > (kafka.producer.async.DefaultEventHandler)
> > [2014-01-29 23:32:03,831] WARN Failed to send producer request with
> > correlation id 11819 to broker 0 with data for partitions [c12,8]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 11315 to broker 0 with data for partitions [c12,6]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 11191 to broker 0 with data for partitions [c12,4]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 11791 to broker 0 with data for partitions [c12,4]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 11395 to broker 0 with data for partitions [c12,6]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 11631 to broker 0 with data for partitions [c12,4]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 10563 to broker 0 with data for partitions [c12,0]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > [2014-01-29 23:32:03,834] WARN Failed to send producer request with
> > correlation id 10907 to broker 0 with data for partitions [c12,2]
> > (kafka.producer.async.DefaultEventHandler)
> > java.net.SocketTimeoutException
> > 2014-01-29 23:21:16:562, 2014-01-29 23:40:15:886, 0, 1024, 200, 97.66,
> > 0.0857, 100000, 87.7713
> >
> > The test result is consistent and reproducible in all deployments:
> > numbers can vary but changing acks setting consistently reduces Kafka
> > throughput
> > 4-10 times.
> >
> > Is it expected system behavior? Any tuning options to resolve the
> problem?
> >
> > Thank you,
> > Michael Popov
> >
> >
>

Reply via email to