So in Flink we essentially have two main APIs to define stream topologies: DataStream and the Table API. My guess is that right now you're trying to use DataStream with the Kafka connector.
DataStream allows you to statically define a stream topology, with an API in a similar fashion to Java Streams or RxJava. The Table API on the other hand gives you the ability to define stream jobs using SQL, where you can easily perform operations such as joins over windows.

Flink is definitely able to solve your use case with both APIs, and you can also mix the two in the same application if that suits you better.

I suggest you start by looking at the Table API documentation
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/overview/
and then, for your specific use case, check the Window TVF page
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/window-tvf/
I have also put a rough sketch of what such a query could look like at the bottom of this mail, below the quoted thread.

Hope it helps,
FG

On Fri, Jan 7, 2022 at 10:58 AM HG <[email protected]> wrote:

> Hi Francesco,
>
> I am not using anything right now apart from Kafka.
> I just need to know whether Flink is capable of doing this and am trying to
> understand the documentation and terminology etc.
> I grapple a bit to understand the whole picture.
>
> Thanks
>
> Regards Hans
>
> On Fri, Jan 7, 2022 at 09:24, Francesco Guardiani <[email protected]> wrote:
>
>> Hi,
>> Are you using SQL or DataStream? For SQL you can use the Window TVF
>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/window-tvf/>
>> feature, where the window size is the "max" elapsed time, and then inside
>> the window you pick the beginning and end event and join them.
>>
>> Hope it helps,
>> FG
>>
>> On Thu, Jan 6, 2022 at 3:25 PM HG <[email protected]> wrote:
>>
>>> Hello all,
>>>
>>> My question is basically whether it is possible to group events by a key
>>> (these will belong to a specific transaction) and then calculate the
>>> elapsed times between them based on a timestamp that is present in the
>>> event.
>>> So a transaction may have x events, all timestamped and with the
>>> transaction_id as key.
>>> Is it possible to
>>> 1. group them by the key,
>>> 2. order them by the timestamp,
>>> 3. calculate the elapsed times between the steps/events,
>>> 4. add that elapsed time to the step/event, and
>>> 5. output the modified events to the sink?
>>>
>>> Regards Hans
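
P.S. Here is the sketch I mentioned above. It is untested and only meant to show the general shape of a Window TVF query for this use case: it assumes a source table called events registered over your Kafka topic, with columns transaction_id and event_time (where event_time is declared as the event-time attribute with a watermark in the table definition), and it uses a 10 minute tumbling window as a stand-in for the maximum duration of a transaction. Adjust the names and the window size to your actual schema.

    -- Sketch only: "events", "transaction_id" and "event_time" are assumed names,
    -- and the 10 minute window is a placeholder for the max transaction duration.
    SELECT
      transaction_id,
      window_start,
      window_end,
      MIN(event_time) AS first_event_time,
      MAX(event_time) AS last_event_time,
      -- elapsed time between the first and the last event of the transaction
      TIMESTAMPDIFF(SECOND, MIN(event_time), MAX(event_time)) AS elapsed_seconds
    FROM TABLE(
      TUMBLE(TABLE events, DESCRIPTOR(event_time), INTERVAL '10' MINUTES))
    GROUP BY transaction_id, window_start, window_end;

This gives you the elapsed time between the first and the last event of each transaction inside the window. If you instead need the elapsed time between every pair of consecutive events (your steps 3 and 4), that is probably easier to express in the DataStream API, with keyBy on transaction_id and a process function that keeps the previous timestamp in state, and you can convert between Table and DataStream within the same job.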
