Hello dear Flinkers,

If this kind of question has already been asked on the list, I'm sorry for the duplicate; feel free to just point me to the thread.

I have to solve what is probably a pretty common case: joining a data stream to a dataset. Let's say I have the following setup:

* A high-pace stream of events coming in from Kafka.
* Some dimension tables stored in Hive. These tables change daily, and I can keep a snapshot for each day.
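To make the setup concrete, here is roughly the kind of source definition I have in mind (all table names, column names, and connector options below are invented by me and untested; just a sketch):

```sql
-- Hypothetical Kafka source for the event stream:
CREATE TABLE events (
  dim_id      BIGINT,       -- key pointing into the dimension table
  payload     STRING,
  event_time  TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic'     = 'events',
  'properties.bootstrap.servers' = '...',
  'format'    = 'json'
);
```

The dimension tables would be registered via the Hive catalog, so they should already be visible to the Table API side by name.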
Now, conceptually, I would like to join the stream of incoming events to the dimension tables (a simple hash join). We can consider two cases:

1) Simpler: I join the stream with the most recent version of the dictionaries. (The result is accepted to be nondeterministic if the job is retried.)
2) More advanced: I would like to do a temporal join of the stream with the dictionary snapshots that were valid at the time of each event. (This result should be deterministic.)

The end goal is to aggregate the joined stream and store the results in Hive or in a more real-time analytical store (Druid).

Could you please help me understand whether either of these cases is implementable with the declarative Table/SQL API, using temporal joins, catalogs, the Hive integration, JDBC connectors, or whatever beta features exist now? I've read quite a lot of the Flink docs about each of those, but I have trouble compiling this information into a final design. Could you please help me understand how these components should cooperate?

If that is impossible with the Table API, can we come up with the easiest implementation using the DataStream API?

Thanks a lot for any help!
Krzysztof
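P.S. To illustrate what I mean by the second case, here is roughly the temporal join I would hope to be able to write (the `FOR SYSTEM_TIME AS OF` syntax is what I found in the Flink temporal join docs; the table and column names are invented and I have not tested this):

```sql
-- Hypothetical event-time temporal join against a versioned dimension table;
-- dim_snapshot would need a primary key and an event-time attribute.
SELECT e.dim_id, e.payload, d.some_attribute
FROM events AS e
JOIN dim_snapshot FOR SYSTEM_TIME AS OF e.event_time AS d
  ON e.dim_id = d.id;
```

My open question is whether a Hive table (or a daily snapshot of one) can actually play the role of `dim_snapshot` here, or whether that only works with changelog-backed versioned tables.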