The main motivation for adopting the new producer before its release is that the
old producer shows very poor throughput for cross-regional Kafka mirroring
in EC2.

Let me share some numbers.

Measured with iperf, the network bandwidth between AWS EC2 us-west-2 and AWS
EC2 us-east-1 is more than 40 MB/sec, but the old producer's throughput is
less than 3 MB/sec.

start.time:              2014-09-16 20:22:25:537
end.time:                2014-09-16 20:24:13:138
compression:             2
message.size:            3000
batch.size:              200
total.data.sent.in.MB:   286.10
MB.sec:                  2.6589
total.data.sent.in.nMsg: 100000
nMsg.sec:                929.3594

Even after we increased the socket send buffer on the producer side and the
receive buffer on the broker side, throughput improved only slightly.
send.buffer.bytes: 8388608
start.time:              2014-09-16 20:48:49:588
end.time:                2014-09-16 20:50:03:006
compression:             2
message.size:            3000
batch.size:              200
total.data.sent.in.MB:   286.10
MB.sec:                  3.8969
total.data.sent.in.nMsg: 100000
nMsg.sec:                1362.0638

But the new producer, which is not yet released, shows a significant
performance improvement: its throughput is more than 30 MB/sec.
start.time:              2014-09-16 20:50:31:720
end.time:                2014-09-16 20:50:41:241
compression:             2
message.size:            3000
batch.size:              200
total.data.sent.in.MB:   286.10
MB.sec:                  30.0496
total.data.sent.in.nMsg: 100000
nMsg.sec:                10503.098

I was excited about the new producer's performance, but its partitioning
logic is different.

When no partition number is given in ProducerRecord, the new producer
partitions by a murmur2 hash of the serialized key. But in the old
partitioner, partitioning is based on key.hashCode.
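To illustrate the difference, here is a rough sketch (not the actual Kafka
source) of the two schemes. The murmur2 method below follows the standard
32-bit MurmurHash2 algorithm; the seed constant and the class/method names
(PartitionerComparison, oldPartition, newPartition) are my own choices for
illustration, not names from the Kafka code base.

```java
import java.nio.charset.StandardCharsets;

public class PartitionerComparison {

    // Standard 32-bit MurmurHash2; the new producer hashes the serialized
    // key bytes with murmur2 when no partition is given.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c; // assumed seed, not verified against Kafka source
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int i = 0;
        // mix 4 bytes at a time into the hash
        while (length - i >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
        }
        // handle the trailing 1-3 bytes
        switch (length - i) {
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= data[i] & 0xff;
                    h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Old producer: partition chosen from the Java hashCode of the key object.
    static int oldPartition(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // New producer: partition chosen from murmur2 of the serialized key bytes.
    static int newPartition(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "user-42";
        int partitions = 8;
        System.out.println("old producer -> partition "
                + oldPartition(key, partitions));
        System.out.println("new producer -> partition "
                + newPartition(key.getBytes(StandardCharsets.UTF_8), partitions));
    }
}
```

Because the two hash functions generally map the same key to different
partitions, switching producers reshuffles keys across partitions unless the
caller pins the partition explicitly in ProducerRecord.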

Could you make the two partitioners use the same logic? Otherwise, I will
have to change the implementation of our Kafka producer container.
