[ https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao resolved KAFKA-14156.
-----------------------------
      Assignee: Artem Livshits
    Resolution: Fixed

Merged the PR to 3.3 and trunk.

> Built-in partitioner may create suboptimal batches with large linger.ms
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-14156
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14156
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 3.3.0
>            Reporter: Artem Livshits
>            Assignee: Artem Livshits
>            Priority: Blocker
>             Fix For: 3.3.0
>
>
> The new built-in "sticky" partitioner switches partitions based on the number
> of bytes produced to a partition. It doesn't use batch creation as a switch
> trigger. The previous "sticky" DefaultPartitioner switched partitions when a
> new batch was created and, with a small linger.ms (default is 0), could end up
> sending larger batches to slower brokers, potentially overloading them. See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
> for more detail.
> However, with a large linger.ms, the new built-in partitioner may create
> suboptimal batches. Consider an example: suppose linger.ms=500,
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we
> produce 12KB worth of data.
> The new built-in partitioner would switch partitions after every 16KB, so we
> could get into the following batching pattern:
> * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
> * produce 4KB more to the same partition, now we've produced 16KB of data,
> switch partition
> * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
> * in the meantime the 4KB produced to the first partition would hit linger
> as well, sending 4KB batch
> * produce 4KB more to the second partition, now we've produced 16KB of data
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB
> batches -- it switches on new batch creation, so there are no "mid-linger"
> leftover batches.
> To avoid batch fragmentation on partition switch, we can wait
> until the batch is ready before switching the partition, i.e. the condition
> to switch to a new partition would be "produced batch.size bytes" AND "batch
> is not lingering". This may potentially introduce some non-uniformity into
> data distribution, but unlike the previous DefaultPartitioner, the
> non-uniformity would not be based on broker performance and won't
> re-introduce the bad pattern of sending more data to slower brokers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
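
The batching pattern from the issue description can be reproduced with a small toy model. This is only a sketch and not the Kafka producer code: the class StickySwitchSketch, its simulate method and the tick-based timing are hypothetical, chosen so that one tick is 1/6 sec and 4KB is produced per tick (24KB / sec, linger.ms=500 becomes 3 ticks, batch.size=16KB). The waitForBatch flag toggles between "switch after batch.size bytes" and the proposed "produced batch.size bytes AND batch is not lingering" condition.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only; names and timing are assumptions made for illustration.
class StickySwitchSketch {
    static final int CHUNK_KB = 4;        // data produced on every tick (24KB / sec)
    static final int BATCH_SIZE_KB = 16;  // batch.size
    static final int LINGER_TICKS = 3;    // linger.ms=500 expressed in 1/6 sec ticks

    /** Returns the sizes (in KB) of the batches sent, in send order. */
    static List<Integer> simulate(int ticks, boolean waitForBatch) {
        List<Integer> sent = new ArrayList<>();
        // partition -> {creation tick, size in KB} of its open batch
        Map<Integer, int[]> open = new TreeMap<>();
        int partition = 0;
        int producedKb = 0; // KB produced to the current partition since the last switch

        for (int t = 0; t < ticks; t++) {
            // 1. Send every open batch that is full or whose linger has expired.
            Iterator<int[]> it = open.values().iterator();
            while (it.hasNext()) {
                int[] b = it.next();
                if (b[1] >= BATCH_SIZE_KB || t - b[0] >= LINGER_TICKS) {
                    sent.add(b[1]);
                    it.remove();
                }
            }

            // 2. Decide whether to switch partitions before the next append.
            //    With waitForBatch, an open (still lingering) batch blocks the switch.
            boolean lingering = open.containsKey(partition);
            if (producedKb >= BATCH_SIZE_KB && !(waitForBatch && lingering)) {
                partition++;
                producedKb = 0;
            }

            // 3. Append this tick's data to the current partition's open batch.
            int[] batch = open.get(partition);
            if (batch == null) {
                batch = new int[] {t, 0};
                open.put(partition, batch);
            }
            batch[1] += CHUNK_KB;
            producedKb += CHUNK_KB;
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("switch on bytes only:   " + simulate(24, false));
        System.out.println("switch when batch done: " + simulate(24, true));
    }
}
```

In this model the bytes-only condition yields alternating 12KB and 4KB batches, while the combined condition yields only 12KB batches, at the cost of producing 24KB to a partition before switching.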