Hi,

1) Yes, by "partition" I meant "parallel instance".

If the watermarking is correct in the DataStream API, the Table API and SQL will take care that it remains correct. E.g. you can only perform a TUMBLE window if the timestamp column has not lost its time attribute property. A regular JOIN (not time-versioned) does not work with watermarks; thus, the result will not have time attributes anymore, and a subsequent TUMBLE window will fail with an exception.

2) You don't need output from all operators; most operators deal with the watermarking logic themselves. But for sources, you need output from all sources in order to have progress in event-time.
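
As a rough illustration of why a single silent source instance can hold back the whole job, here is a plain Python toy model. It is not Flink code; `NO_WATERMARK`, `operator_watermark` and `fired_timers` are hypothetical names made up for this sketch. It only assumes that an operator's watermark is the minimum over its inputs and that an event-time timer fires once the watermark reaches its timestamp.

```python
# Toy model of event-time progress. This is NOT Flink's API; all names
# here are hypothetical and exist only for illustration.

# Stand-in for "this input has not emitted any watermark yet".
NO_WATERMARK = float("-inf")


def operator_watermark(input_watermarks):
    """Watermark of an operator = minimum of the watermarks of its inputs."""
    return min(input_watermarks)


def fired_timers(timer_timestamps, input_watermarks):
    """Return the event-time timers that the current watermark has reached."""
    wm = operator_watermark(input_watermarks)
    return [ts for ts in timer_timestamps if ts <= wm]


# Parallelism 1: the single source instance has advanced to t=25,
# so the timers at 10 and 20 fire.
single = fired_timers([10, 20, 30], [25])

# Parallelism 2: one source instance is at t=25, the other has emitted
# no watermark at all, so the minimum never advances and nothing fires.
stalled = fired_timers([10, 20, 30], [25, NO_WATERMARK])

print(single)   # [10, 20]
print(stalled)  # []
```

In this model the second case produces no output even though one instance has seen plenty of data, which matches the "no output for parallelism > 1" symptom.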

Regards,
Timo


On 24.03.20 12:21, Dominik Wosiński wrote:
Hey Timo,
Thanks a lot for this answer! I was mostly using the DataStream API, so it's good to know the difference.
I have some follow-up questions and would be glad for clarification:

1) So, for the SQL Join operator, is the partition the parallel instance of the operator, or is it the table partitioning as defined by partitionBy?

2) Assuming that this is an instance of a parallel operator, does this mean that we need output from ALL operators so that the watermark progresses and the output is generated?

Best Regards,
Dom.

On Tue, 24 Mar 2020 at 10:01, Timo Walther <twal...@apache.org> wrote:

    Hi Dominik,

    the big conceptual difference between DataStream and Table API is that
    record timestamps are part of the schema in Table API whereas they are
    attached internally to each record in DataStream API. When you call
    `y.rowtime` during a stream to table conversion, the runtime will
    extract the internal timestamp and will copy it into the field `y`.

    Even if the timestamp is not internal anymore, Flink makes sure that
    the watermarking (which still happens internally) remains valid.
    However, this means that timestamps and watermarks must already be
    correct when entering the Table API. If they were not correct before,
    they will also not trigger time-based operations correctly.

    If there is no output for a parallelism > 1, this usually means that
    one source partition has not emitted a watermark, so event time cannot
    progress globally for the job:

    watermark of operator = min(previous operator partition 1, previous
    operator partition 2, ...)

    I hope this helps.

    Regards,
    Timo


    On 19.03.20 16:38, Dominik Wosiński wrote:
     > I have created a simple minimal reproducible example that shows what I
     > am talking about:
     > https://github.com/DomWos/FlinkTTF/tree/sql-ttf
     >
     > It contains a test that shows that even if the output is in order which
     > is enforced by multiple sleeps, then for parallelism > 1 there is no
     > output and for parallelism == 1, the output is produced normally.
     >
     > Best Regards,
     > Dom.

