I think what you need is http://kafka.apache.org/documentation/#consumerconfigs_isolation.level.
The isolation.level setting's default value is read_uncommitted. So perhaps you are not using the default setting?

On Tue, Jan 5, 2021 at 9:10 PM, 赵一旦 <hinobl...@gmail.com> wrote:

> I do not have this problem, so I guess it is related to the config of
> your Kafka producer and consumer, and maybe also to the Kafka topic
> properties or Kafka server properties.
>
> On Tue, Jan 5, 2021 at 6:47 PM, Arvid Heise <ar...@ververica.com> wrote:
>
>> Hi Daniel,
>>
>> Flink commits transactions on checkpoints, while Kafka Streams/Connect
>> usually commits per record. This is the typical trade-off between
>> throughput and latency. By decreasing the checkpoint interval in Flink,
>> you can reach a latency comparable to Kafka Streams.
>>
>> If you have two exactly-once jobs, the second job may only read data
>> that has been committed (not dirty, as Chesnay said). If the second job
>> were to consume uncommitted data, it would produce duplicates in case
>> the first job fails and rolls back.
>>
>> You can configure the read behavior with isolation.level. If you want
>> to implement exactly-once behavior, you also need to set that level in
>> your other Kafka consumers [1]. Also compare what Kafka Streams sets if
>> you want to go exactly-once [2].
>>
>> If you really want low latency, please also double-check whether you
>> really need exactly-once.
>>
>> [1] https://kafka.apache.org/documentation/#consumerconfigs_isolation.level
>> [2] https://kafka.apache.org/documentation/streams/developer-guide/config-streams.html#processing-guarantee
>>
>> On Mon, Dec 28, 2020 at 12:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> I don't particularly know our Kafka connector, but it sounds like a
>>> matter of whether a given consumer does dirty reads. Flink does not,
>>> whereas the other tools you're using do.
>>>
>>> On 12/28/2020 7:57 AM, Daniel Peled wrote:
>>>
>>> Hello,
>>>
>>> We have 2 Flink jobs that communicate with each other through a Kafka
>>> topic.
>>> Both jobs use checkpoints with EXACTLY ONCE semantics.
>>>
>>> We have seen the following behaviour, and we want to ask whether this
>>> is the expected behaviour or a bug.
>>>
>>> When the first job produces a message to Kafka, the message is
>>> consumed by the second job with a latency that depends on the *first*
>>> job's *checkpoint interval*.
>>>
>>> We are able to read the message using the Kafka tool or another Kafka
>>> consumer, but NOT with a Flink Kafka consumer, which again depends on
>>> the checkpoint interval of the first job.
>>>
>>> How come the consumer of the second job depends on the producer
>>> transaction commit time of the first job?
>>>
>>> BR,
>>> Danny
>>
>> --
>> Arvid Heise | Senior Java Developer
>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
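
To make the advice in this thread concrete, here is a minimal sketch (not code from the thread) of the consumer properties involved. The broker address and group id are placeholders; in a Flink job these properties would be handed to the Kafka consumer/source, and the upstream job's checkpoint interval (set via `env.enableCheckpointing(...)`) then determines how often transactions commit and thus become visible to this consumer.

```java
import java.util.Properties;

// Sketch: properties for a downstream consumer that should only see
// Kafka records from committed transactions (exactly-once pipelines).
public class ReadCommittedConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "downstream-job");          // hypothetical group id
        // Kafka's default is read_uncommitted. With read_committed, the
        // consumer waits for the producer transaction to commit -- i.e. for
        // the upstream Flink job's checkpoint -- which is exactly the
        // latency behaviour Daniel observed.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("isolation.level")); // prints "read_committed"
    }
}
```

With read_uncommitted (the default), plain Kafka tools see the records immediately, matching the behaviour described in the original question.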