Hello,
I am using StreamingFileSink with FlinkKafkaConsumer010 as a Kafka -> S3
connector (Flink 1.8.1, Kafka 0.10.1).

The setup is simple:
Data is bucketed first by datetime (1-day granularity), then by Kafka
partition. I am using *event time* (the Kafka timestamp, recorded at
creation time by the producer) to do the bucketing.

E.g:
s3://some_topic/dt=2019-10-21/partition_0/part-0-0
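To make the setup concrete, here is a minimal sketch (plain Java, stdlib only; the class and method names are illustrative, not Flink API) of how a bucket path like the one above would be derived from a record's Kafka event timestamp and partition:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BucketId {
    // UTC day formatter for the dt=YYYY-MM-DD bucket component
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Maps a record's event timestamp (epoch millis) and Kafka partition
    // to the bucket path used in the S3 layout described above.
    static String bucketFor(long eventTimeMillis, int kafkaPartition) {
        return "dt=" + DAY.format(Instant.ofEpochMilli(eventTimeMillis))
                + "/partition_" + kafkaPartition;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-10-23T23:57:00Z").toEpochMilli();
        System.out.println(bucketFor(ts, 0)); // dt=2019-10-23/partition_0
    }
}
```

The mapping is purely a function of the record's own timestamp, which is what raises the question below about out-of-order records.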

Suppose a Kafka record timestamped 10/23 23:57 was written out of order
(relative to its timestamp) due to network delay, so that in the Kafka
partition it sits next to messages timestamped 10/24 00:01.

Will Flink correctly write this record to bucket:
dt=2019-10-23

Or will it be written to bucket dt=2019-10-24? To phrase the question a bit
differently: does Flink know when to "close" bucket dt=2019-10-23 and move
on to the next datetime bucket? Or can Flink write to arbitrary datetime
buckets as messages are read out of order with respect to their Kafka
timestamps?

What if the delay was even longer, say 4 hours?
