Hi Xilang,

I thought about this again.
The bucketing sink would need to roll on event-time intervals (similar to the
current processing-time rolling), triggered by watermarks, in order to
support consistency.
However, it would also need to maintain a write-ahead log of all received
rows and could only write out rows whose timestamps are smaller than the
received watermark. This would make the whole sink considerably more complex.

I'm not sure that rolling on checkpoint barriers is a good solution either.
IMO, the checkpointing interval and the file rolling interval should not
depend on each other, because coupling them mixes different requirements and
introduces challenging trade-offs.

Best, Fabian


2018-07-04 11:59 GMT+02:00 XilangYan <xilang....@gmail.com>:

> Hi Fabian,
>
> We do need a consistent view of the data; the Counter and the HDFS files
> must agree. For example, when the Counter indicates that 1000 messages were
> written to HDFS, there must be exactly 1000 messages in HDFS ready to be
> read.
>
> The data we write to HDFS is collected by an Agent (which also sends a
> Counter message to track the number of messages received). Each record has
> a timestamp, and we use BucketingSink to write the records into different
> buckets.
>
> Could you give me a clue on how to achieve this with watermarks? As I
> understand it, a watermark is designed to process out-of-order data with a
> known delay, so how can it be used to make my CounterSink and BucketingSink
> consistent?
>
> Thanks, Xilang
>
>
