Hongshun Wang created FLINK-32019:
-------------------------------------

             Summary: EARLIEST offset strategy for partitions discovered later 
based on FLIP-288
                 Key: FLINK-32019
                 URL: https://issues.apache.org/jira/browse/FLINK-32019
             Project: Flink
          Issue Type: Sub-task
          Components: Connectors / Kafka
    Affects Versions: kafka-3.0.0
            Reporter: Hongshun Wang
             Fix For: kafka-4.0.0


As described in 
[FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source],
 the strategy used for new partitions is the same as the initial offset 
strategy, which is not reasonable.

According to the semantics, if the startup strategy is latest, the consumed 
data should include all data from the moment of startup, which also includes 
all messages from newly created partitions. However, the latest strategy 
may currently be used for new partitions, leading to the loss of some data. 
For example, a new partition might be discovered by the Kafka source several 
minutes after it is created, and with "latest" as the initial offset strategy 
the messages produced into the partition within that gap might be dropped. If 
the data from all new partitions is not read, the user's expectations are not 
met.

For other problems, see the final section of 
[FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source]:
 {{User specifies OffsetsInitializer for new partition}}.

Therefore, it’s better to provide an *EARLIEST* strategy for later discovered 
partitions.
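
To illustrate the proposal, here is a minimal, self-contained sketch of the intended behavior. It is a hypothetical model, not the Kafka connector's API: the class {{NewPartitionOffsetSketch}}, the enum {{OffsetStrategy}}, and the methods {{strategyFor}} and {{startingOffset}} are made up for illustration, and it assumes the new partition's earliest offset is 0 (nothing has been removed by retention yet).

```java
/** Hypothetical model of the proposal; none of these names exist in the Flink Kafka connector. */
public class NewPartitionOffsetSketch {

    /** The two strategies discussed in this issue (illustrative enum). */
    enum OffsetStrategy { EARLIEST, LATEST }

    /**
     * Partitions found at startup keep the user's startup strategy;
     * partitions discovered later always use EARLIEST.
     */
    static OffsetStrategy strategyFor(boolean discoveredAtStartup, OffsetStrategy startupStrategy) {
        return discoveredAtStartup ? startupStrategy : OffsetStrategy.EARLIEST;
    }

    /**
     * First offset to read, given the partition's end offset at discovery time.
     * Assumes the earliest available offset of a newly created partition is 0.
     */
    static long startingOffset(OffsetStrategy strategy, long endOffsetAtDiscovery) {
        return strategy == OffsetStrategy.EARLIEST ? 0L : endOffsetAtDiscovery;
    }

    public static void main(String[] args) {
        // A new partition received 5 records (offsets 0..4) before the source discovered it.
        long endOffsetAtDiscovery = 5L;

        // Current behavior: the startup strategy (LATEST) is reused for the new partition.
        long skippedToday = startingOffset(OffsetStrategy.LATEST, endOffsetAtDiscovery);

        // Proposed behavior: a later-discovered partition is read from EARLIEST.
        long skippedProposed =
                startingOffset(strategyFor(false, OffsetStrategy.LATEST), endOffsetAtDiscovery);

        System.out.println("records skipped when LATEST is reused: " + skippedToday);
        System.out.println("records skipped with EARLIEST: " + skippedProposed);
    }
}
```

In this model, the records written during the discovery gap are skipped only when the startup strategy is reused for the new partition; partitions known at startup still follow the strategy the user configured.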



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
