Thanks, Lukasz. With the provided window functions, can I control how the 
watermark moves forward? Or is a custom WindowFn required?

On Dec 27, 2016, at 10:40 AM, Lukasz Cwik <[email protected]> wrote:

withAllowedLateness controls how late data can enter the system and still be 
considered valid. The timestamp of the data is always compared against the 
watermark:
  timestamp is before (watermark - withAllowedLateness) -> data can be dropped
  timestamp is after (watermark - withAllowedLateness) -> data cannot be dropped
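
To make the rule above concrete, here is a minimal plain-Java sketch of the comparison. It is not Beam code: the class name, method name, and sample values are made up for illustration, and it applies the rule to a raw timestamp exactly as stated above. As far as I know, Beam itself evaluates lateness per window rather than per element, so treat this as a simplification.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the rule above; not part of the Beam SDK.
public class LatenessCheck {
    // True when the timestamp is before (watermark - allowedLateness),
    // i.e. the data is late enough that it can be dropped.
    public static boolean canBeDropped(Instant timestamp, Instant watermark,
                                       Duration allowedLateness) {
        Instant cutoff = watermark.minus(allowedLateness);
        return timestamp.isBefore(cutoff);
    }

    public static void main(String[] args) {
        // Assumed sample values: watermark at noon, two hours of allowed lateness,
        // so the cutoff is 10:00.
        Instant watermark = Instant.parse("2016-12-27T12:00:00Z");
        Duration allowedLateness = Duration.ofHours(2);

        Instant tooLate = Instant.parse("2016-12-27T09:30:00Z");
        Instant stillValid = Instant.parse("2016-12-27T11:00:00Z");

        System.out.println(canBeDropped(tooLate, watermark, allowedLateness));
        System.out.println(canBeDropped(stillValid, watermark, allowedLateness));
    }
}
```

With these sample values the cutoff is 10:00, so the 09:30 record is droppable and the 11:00 record is not.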

Since you're using event time (and not processing time) in your case, the 
watermark should not move forward while the source is unavailable.

However, when nothing is being read from the source because there are no 
records, it is up to the source to choose how the watermark advances, and it 
may move the watermark forward based on an estimate of where it thinks the 
watermark should be. Since some sources cannot tell whether the source is 
truly unavailable or there is simply no incoming data, such a source may move 
the watermark forward regardless, and thus withAllowedLateness becomes 
important again.
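
As a rough illustration of that behaviour, here is a plain-Java sketch of a source-side estimator that advances the watermark on its own once it has seen no records for a while. Everything in it is hypothetical (the class, the idle timeout, the assumed delay, and the "max event time seen" heuristic); it is not a Beam or KafkaIO API, only a model of the kind of choice a source might make.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical estimator; models one way a source could advance its watermark.
public class IdleAwareWatermark {
    private final Duration idleTimeout;   // silence longer than this counts as idle
    private final Duration assumedDelay;  // assumed gap between event time and arrival
    private Instant watermark = Instant.EPOCH;
    private Instant lastRecordSeenAt;     // processing time of the last record

    public IdleAwareWatermark(Duration idleTimeout, Duration assumedDelay, Instant startedAt) {
        this.idleTimeout = idleTimeout;
        this.assumedDelay = assumedDelay;
        this.lastRecordSeenAt = startedAt;
    }

    // Naive heuristic: the watermark follows the largest event time seen so far.
    public void onRecord(Instant eventTime, Instant now) {
        lastRecordSeenAt = now;
        if (eventTime.isAfter(watermark)) {
            watermark = eventTime;
        }
    }

    // When idle, the estimator cannot tell an outage from a quiet topic,
    // so it guesses and moves the watermark to (now - assumedDelay).
    public Instant currentWatermark(Instant now) {
        boolean idle = Duration.between(lastRecordSeenAt, now).compareTo(idleTimeout) > 0;
        if (idle) {
            Instant estimate = now.minus(assumedDelay);
            if (estimate.isAfter(watermark)) {
                watermark = estimate;
            }
        }
        return watermark;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2016-12-27T10:00:00Z");
        IdleAwareWatermark wm =
            new IdleAwareWatermark(Duration.ofMinutes(5), Duration.ofMinutes(1), start);

        wm.onRecord(Instant.parse("2016-12-27T09:59:00Z"), start);
        // One minute of silence: not idle yet, watermark stays at the last event time.
        System.out.println(wm.currentWatermark(Instant.parse("2016-12-27T10:01:00Z")));
        // Three hours of silence: the estimator advances the watermark by itself.
        System.out.println(wm.currentWatermark(Instant.parse("2016-12-27T13:00:00Z")));
    }
}
```

If the silence was really an outage, records with event times around 10:30 that arrive after recovery are now hours behind the watermark, which is where withAllowedLateness decides whether they survive.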

On Fri, Dec 23, 2016 at 10:58 AM, Xu, Mingmin <[email protected]> wrote:
Hello,

I'm working on a POC project with Apache Beam. The rough pipeline reads from a 
checkout Kafka topic and generates hourly summary data on different dimensions. 
I suppose a fixed time window with a time-based trigger could handle the case. 
- EventTime is the checkout timestamp.

However, when the job or the source is down for some time, say several hours, 
recovery becomes a problem. Data will be dropped unless I set a large value 
for withAllowedLateness, but a large allowedLateness combined with 
accumulatingFiredPanes also keeps a lot of pane data in memory. Is this the 
right way to handle a recovery scenario? I'd appreciate any suggestions.
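
For reference, the windowing setup described above might look roughly like this with the Beam Java SDK. This is only a sketch and not the actual pipeline code: the element type (KV<String, Long>), the class and method names, the six-hour lateness, and the choice of late-firing trigger are all assumptions, and it needs the Beam SDK and Joda-Time on the classpath.

```java
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Hypothetical helper; names and durations are illustrative only.
public class HourlyWindowing {
    static PCollection<KV<String, Long>> hourly(PCollection<KV<String, Long>> checkouts) {
        return checkouts.apply(
            // One-hour fixed windows in event time.
            Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardHours(1)))
                // Fire when the watermark passes the end of the window,
                // then again for each late element that still arrives.
                .triggering(AfterWatermark.pastEndOfWindow()
                    .withLateFirings(AfterPane.elementCountAtLeast(1)))
                // Assumed value: keep window state for six hours past the watermark.
                .withAllowedLateness(Duration.standardHours(6))
                // Each firing re-emits everything accumulated so far in the window.
                .accumulatingFiredPanes());
    }
}
```

The larger the allowed lateness, the longer each window's accumulated state has to be retained, which is the memory concern raised above.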

Thank you!
Mingmin

