Hi,
I am building a data pipeline with a lot of streaming join and over window aggregation. And flink SQL have these feature supported. However, there is no similar DataStream APIs provided(maybe there is and I didn't find them. please point out if there is). I got confused because I assume that the SQL logical plan will be translated into a graph of operators or transformations. Could someone explain how these two sql query are implemented or translated into low level code ( operators or transformations)? I am asking this because I have implemented these features without using SQL and the performance looks good. And I certainly love to migrate to SQL, but I want to understand them well first. Any information or hints or links are appreciated. 1. Time-Windowed Join The DataStream API only provides streaming join within same window. But the SQL API (time-windowed join) can join two streams within quite different time range. Below is an sample query that listed in official doc, and we can see that Orders and Shipments have 4 hours difference. Is it implemented by CoProcessFunction or TwoInputOperator which buffers the event for a certain period? SELECT * FROM Orders o, Shipments s WHERE o.id = s.orderId AND o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime 2. Over-Window Aggregation There is no similar feature in DataStream API. How does this get implemented? Does it use keyed state to buffer the previous events, and pull the records when there is a need? How does sorting get handled? Best Yan