[ 
https://issues.apache.org/jira/browse/KAFKA-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RivenSun updated KAFKA-15185:
-----------------------------
    Description: 
h2. condition:

1. Partitions are added to a business topic.
2. metadata.max.age.ms is set to five minutes on both producers and consumers, 
but the producer discovers the new partition before the consumer does and 
produces 100 messages to it.
3. The consumer's auto.offset.reset is set to *latest*.
h2. result:

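The consumer loses these 100 messages: when it finally discovers the new 
partition, the group has no committed offset for it, so auto.offset.reset=latest 
positions the consumer at the log end, after the messages already written.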
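
To make the sequence concrete, here is a toy model of a single partition log. It is not the Kafka client API; reset_position is a hypothetical helper, and the model assumes the new partition's log starts at offset 0.

```python
# Toy model of one partition log; this is NOT the Kafka client API.
# reset_position is a hypothetical helper used only for illustration.

def reset_position(log_end_offset, committed_offset, auto_offset_reset):
    """Return the offset at which a consumer starts reading a partition."""
    if committed_offset is not None:
        return committed_offset      # resume where the group left off
    if auto_offset_reset == "earliest":
        return 0                     # assumes the log starts at offset 0
    if auto_offset_reset == "latest":
        return log_end_offset        # skip everything already written
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")

# The producer sees the new partition first and writes 100 messages (offsets 0..99).
log_end_offset = 100

# The consumer discovers the partition later; the group has no committed offset for it.
start = reset_position(log_end_offset, committed_offset=None,
                       auto_offset_reset="latest")
skipped = start - 0
print(f"consumer starts at offset {start}, skipping {skipped} messages")
```

With auto_offset_reset="earliest" the same call returns 0, but that setting would also replay old data for a newly subscribed group.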




We cannot simply set auto.offset.reset to {*}earliest{*}, because the user's 
requirement is that a newly subscribed group skips all old messages of the 
topic.
However, once the group has subscribed, messages produced to newly added 
partitions {*}must be guaranteed not to be lost{*}, which amounts to consuming 
those partitions from the earliest offset.
h2. suggestion:

As a workaround, we have set the consumer's metadata.max.age.ms to 1/2 or 1/3 
of the producer's metadata.max.age.ms.
This still does not solve the problem, for two reasons. First, in many cases 
the producer may force a metadata refresh. Second, a smaller 
metadata.max.age.ms value causes more metadata refresh requests, which 
increases the load on the brokers.
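
A rough back-of-the-envelope calculation of that trade-off follows. It assumes each client issues exactly one periodic refresh per metadata.max.age.ms interval and ignores forced refreshes; refreshes_per_hour is a hypothetical helper, not a Kafka API.

```python
# Rough estimate only: assumes one periodic metadata refresh per client per
# metadata.max.age.ms interval and ignores forced refreshes.

def refreshes_per_hour(metadata_max_age_ms, clients=1):
    """Periodic metadata refreshes per hour for the given number of clients."""
    return clients * 3_600_000 / metadata_max_age_ms

producer_max_age_ms = 300_000                    # five minutes, as in the report
consumer_max_age_ms = producer_max_age_ms // 3   # the 1/3 workaround

baseline = refreshes_per_hour(producer_max_age_ms)
reduced = refreshes_per_hour(consumer_max_age_ms)

# If the producer force-refreshes right after the partition is added, the
# consumer can still lag by up to its own max age before it notices.
worst_case_gap_s = consumer_max_age_ms / 1000

print(f"refreshes per hour per consumer: {baseline:.0f} -> {reduced:.0f}")
print(f"window in which messages can still be missed: up to {worst_case_gap_s:.0f} s")
```

Under these assumptions the workaround triples the refresh traffic per consumer while still leaving a window of up to 100 seconds.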

So can we add a parameter that controls whether the consumer starts consuming 
a newly added partition from the earliest or the latest offset?
Perhaps during the rebalance, the leader consumer would need to mark which 
partitions are newly added when it calculates the assignment.
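
One possible shape for this, as a sketch only: the parameter name new_partition_reset and both helper functions are hypothetical and do not exist in Kafka today. The sketch assumes the leader remembers the previous partition count and relies on partition ids being sequential, so a grown topic's new partitions are the ids from the old count up to the new count minus one.

```python
# Sketch of the suggestion. new_partition_reset, newly_added_partitions and
# starting_offset are hypothetical names, not existing Kafka configuration or API.

def newly_added_partitions(previous_count, current_count):
    """Ids of partitions added since the last assignment (ids are sequential)."""
    return set(range(previous_count, current_count))

def starting_offset(partition, committed, log_end_offset, new_partitions,
                    auto_offset_reset="latest", new_partition_reset="earliest"):
    """Pick the start offset, using a separate policy for newly added partitions."""
    if committed is not None:
        return committed                       # a committed offset always wins
    if partition in new_partitions:
        policy = new_partition_reset           # partition added after subscription
    else:
        policy = auto_offset_reset             # normal first-time behaviour
    return 0 if policy == "earliest" else log_end_offset

# The topic grows from 3 to 4 partitions; the leader marks partition 3 as new.
new_partitions = newly_added_partitions(previous_count=3, current_count=4)

old = starting_offset(0, None, log_end_offset=5000, new_partitions=new_partitions)
new = starting_offset(3, None, log_end_offset=100, new_partitions=new_partitions)
print(f"partition 0 starts at {old}, partition 3 starts at {new}")
```

In this sketch a partition that existed before the group subscribed still starts at the log end, while the added partition is read from offset 0, so the 100 messages are not skipped.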

  was:
h2. condition:

1. Business topic adds partition
2. The configuration metadata.max.age.ms of producers and consumers is set to 
five minutes.
But the producer discovered the new partition before the consumer, and 
generated 100 messages to the new partition.
3. The consumer parameter auto.offset.reset is set to latest
h2. result:

Consumers will lose these 100 messages


First of all we cannot directly set auto.offset.reset to {*}earliest{*}.
Because the user's demand is that a newly subscribed group can discard all old 
messages of the topic.
However, after the group is subscribed, the message generated by the expanded 
partition must be guaranteed not to be lost, similar to starting consumption 
from the earliest.
h2. 
suggestion:

So we have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the 
producer's configuration.
But this still can't solve the problem, because in many cases, the producer may 
force refresh the metadata.
Secondly, a smaller metadata.max.age.ms value will bring more metadata refresh 
requests, which will increase the burden on the broker.

So can we add a parameter to control how the consumer determines whether to 
start consumption from the earliest or latest for the newly added partition.
Perhaps during the rebalance process, the leaderConsumer needs to mark which 
partitions are newly added when calculating the assignment.


> Consumers using the latest strategy may lose data after the topic adds 
> partitions
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-15185
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15185
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 3.4.1
>            Reporter: RivenSun
>            Assignee: Luke Chen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
