Hi,

I am building a data pipeline with a lot of streaming joins and over-window
aggregations. Flink SQL supports both features. However, I could not find
equivalent DataStream APIs (maybe they exist and I missed them; please point
them out if so). This confuses me, because I assume that the SQL logical plan
is translated into a graph of operators or transformations.


Could someone explain how the two SQL queries below are implemented, or
translated into low-level code (operators or transformations)? I am asking
because I have implemented these features without SQL and the performance
looks good. I would certainly love to migrate to SQL, but I want to understand
it well first. Any information, hints, or links are appreciated.


1. Time-Windowed Join

The DataStream API only provides streaming joins within the same window. The
SQL API (time-windowed join), however, can join two streams whose timestamps
fall within a relative time range. Below is a sample query from the official
docs, in which an order and its shipment can be up to 4 hours apart. Is it
implemented by a CoProcessFunction or a TwoInputStreamOperator that buffers
the events for a certain period?


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
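
To make the question concrete, here is a minimal sketch of the buffering
approach I have in mind for the query above. It is plain Java, not Flink code,
and every name in it (IntervalJoinSketch, onOrder, onShipment, onWatermark) is
made up for illustration. The maps stand in for keyed state, and I am assuming
a watermark W means that no further rows with a timestamp <= W will arrive.
I do not know whether the planner generates anything like this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink code: a two-input operator that buffers both
// sides per join key and probes the opposite buffer on every arrival.
class IntervalJoinSketch {
    static final long FOUR_HOURS_MS = 4L * 60 * 60 * 1000;

    // Per-key buffers of event timestamps (stand-ins for keyed state).
    private final Map<String, List<Long>> orderTimes = new HashMap<>();
    private final Map<String, List<Long>> shipTimes = new HashMap<>();

    // Join predicate: shiptime - 4h <= ordertime <= shiptime.
    static boolean matches(long orderTime, long shipTime) {
        return orderTime >= shipTime - FOUR_HOURS_MS && orderTime <= shipTime;
    }

    // Input 1: buffer the order, then join it with every buffered shipment in range.
    List<String> onOrder(String id, long orderTime) {
        orderTimes.computeIfAbsent(id, k -> new ArrayList<>()).add(orderTime);
        List<String> out = new ArrayList<>();
        for (long shipTime : shipTimes.getOrDefault(id, Collections.emptyList())) {
            if (matches(orderTime, shipTime)) {
                out.add(id + "," + orderTime + "," + shipTime);
            }
        }
        return out;
    }

    // Input 2: buffer the shipment, then join it with every buffered order in range.
    List<String> onShipment(String orderId, long shipTime) {
        shipTimes.computeIfAbsent(orderId, k -> new ArrayList<>()).add(shipTime);
        List<String> out = new ArrayList<>();
        for (long orderTime : orderTimes.getOrDefault(orderId, Collections.emptyList())) {
            if (matches(orderTime, shipTime)) {
                out.add(orderId + "," + orderTime + "," + shipTime);
            }
        }
        return out;
    }

    // On a watermark, drop buffered rows that can no longer match a future row.
    void onWatermark(long watermark) {
        // An order at time t can only match shipments in [t, t + 4h].
        for (List<Long> times : orderTimes.values()) {
            times.removeIf(t -> t + FOUR_HOURS_MS < watermark);
        }
        // A shipment at time s can only match orders in [s - 4h, s].
        for (List<Long> times : shipTimes.values()) {
            times.removeIf(s -> s < watermark);
        }
    }

    // Number of rows currently held in both buffers.
    int bufferedRows() {
        int n = 0;
        for (List<Long> times : orderTimes.values()) n += times.size();
        for (List<Long> times : shipTimes.values()) n += times.size();
        return n;
    }

    public static void main(String[] args) {
        IntervalJoinSketch join = new IntervalJoinSketch();
        long hour = 60L * 60 * 1000;
        join.onOrder("o1", 1 * hour);
        System.out.println(join.onShipment("o1", 3 * hour)); // shipped 2 hours after the order
        System.out.println(join.onShipment("o1", 6 * hour)); // shipped 5 hours after the order
    }
}
```

The part I am least sure about is the cleanup: in this sketch the two sides
are retained for different lengths of time, because the time range in the
predicate is not symmetric.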

2. Over-Window Aggregation

I could not find a similar feature in the DataStream API. How is this
implemented? Does it use keyed state to buffer the previous events and pull
the records when they are needed? How is sorting handled?
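
For illustration, this is roughly what I imagine for an event-time over
window, assuming a hypothetical aggregate such as SUM(amount) over a RANGE
window of a fixed length ending at the current row. It is plain Java for a
single key, not Flink code; the names (OverWindowSketch, onRow, onWatermark,
rangeMs) are my own, and the sorted maps stand in for keyed state. Rows are
kept sorted by timestamp until a watermark passes them, and rows arriving
behind the watermark are simply dropped. This is a guess, not a description
of what Flink does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Flink code: SUM(amount) over the preceding rangeMs
// milliseconds up to the current row, for one key, in event time.
class OverWindowSketch {
    private final long rangeMs;
    private long lastWatermark = Long.MIN_VALUE;

    // Rows waiting for a watermark, sorted by timestamp (amounts per timestamp).
    private final TreeMap<Long, List<Long>> pending = new TreeMap<>();
    // Already-emitted rows that may still fall inside a later row's window
    // (total amount per timestamp).
    private final TreeMap<Long, Long> retained = new TreeMap<>();

    OverWindowSketch(long rangeMs) {
        this.rangeMs = rangeMs;
    }

    // Buffer an incoming row; late rows are dropped in this sketch.
    void onRow(long ts, long amount) {
        if (ts <= lastWatermark) {
            return;
        }
        pending.computeIfAbsent(ts, k -> new ArrayList<>()).add(amount);
    }

    // Emit {timestamp, sum} for every buffered row with ts <= watermark, in time order.
    List<long[]> onWatermark(long watermark) {
        List<long[]> out = new ArrayList<>();
        // headMap gives a sorted view, so the TreeMap does the sorting for us.
        Map<Long, List<Long>> ready = pending.headMap(watermark, true);
        for (Map.Entry<Long, List<Long>> e : ready.entrySet()) {
            long ts = e.getKey();
            long total = 0;
            for (long amount : e.getValue()) total += amount;
            retained.merge(ts, total, Long::sum);
            // Evict rows that fell out of the window [ts - rangeMs, ts].
            retained.headMap(ts - rangeMs, false).clear();
            long sum = 0;
            for (long v : retained.values()) sum += v;
            // Rows sharing a timestamp are peers and see the same sum.
            for (int i = 0; i < e.getValue().size(); i++) {
                out.add(new long[] {ts, sum});
            }
        }
        ready.clear(); // also removes the emitted rows from pending
        lastWatermark = Math.max(lastWatermark, watermark);
        return out;
    }

    public static void main(String[] args) {
        OverWindowSketch over = new OverWindowSketch(10);
        over.onRow(5, 2);
        over.onRow(1, 1); // out of order, but still ahead of the watermark
        for (long[] row : over.onWatermark(6)) {
            System.out.println("ts=" + row[0] + " sum=" + row[1]);
        }
    }
}
```

In this sketch the sum is recomputed from the retained rows for every emitted
timestamp, which is the simplest thing to write; I would be curious whether
the real implementation keeps an accumulator and retracts evicted rows
instead.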


Best
Yan


