Re: Join a datastream with tables stored in Hive

2020-12-01 Thread Leonard Xu
Hi Krzysztof,
> * I have a high pace stream of events coming in Kafka.
> * I have some dimension tables stored in Hive. These tables are changed daily. I can keep a snapshot for each day.
For this use case, Flink now supports a temporal join against the latest Hive partition as the temporal table,

Re: Join a datastream with tables stored in Hive

2020-12-01 Thread Leonard Xu
Hi Maciej,
>> I didn't find a SQL solution to this problem.
Flink now provides a SQL solution, see the doc [1]. The Flink 1.12 documentation link that Chesnay posted should have been updated but has not been..., I’ll check the 1.12 documentation. Best, Leonard [1]

Re: Join a datastream with tables stored in Hive

2020-12-01 Thread Maciej Bryński
Hi, There is an implementation only for temporal tables, which needs some Java/Scala coding (no SQL-only implementation). On the same page there is a note: "Attention: Flink does not support event time temporal table joins currently." That is why I'm asking this question. My use case:

Re: Join a datastream with tables stored in Hive

2020-12-01 Thread Chesnay Schepler
According to the documentation this is already implemented.
On 12/1/2020 3:53 PM, maverick wrote:
> Hi Kurt, Is there any Jira task for tracking progress of adding event time

Re: Join a datastream with tables stored in Hive

2020-12-01 Thread maverick
Hi Kurt, Is there any Jira task for tracking progress of adding event time support to temporal joins? Regards, Maciek

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Kurt Young
Great, looking forward to hearing from you again. Best, Kurt
On Mon, Dec 16, 2019 at 10:22 PM Krzysztof Zarzycki wrote:
> Thanks Kurt for your answers.
>
> Summing up, I feel like the option 1 (i.e. join with temporal table function) requires some coding around a source, that needs to pull

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Krzysztof Zarzycki
Thanks Kurt for your answers. Summing up, I feel that option 1 (i.e. join with a temporal table function) requires some coding around a source that needs to pull data once a day. Otherwise, it brings the following benefits: * I don't have to put dicts in another store like HBase. All stays in

Re: Join a datastream with tables stored in Hive

2019-12-15 Thread Kurt Young
Hi Krzysztof, thanks for the discussion. You raised lots of good questions, and I will try to reply to them one by one.
Re option 1:
> Question 1: do I need to write that Hive source or can I use something ready, like Hive catalog integration? Or maybe reuse e.g. HiveTableSource class?
I'm not sure if

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Krzysztof Zarzycki
Very interesting, Kurt! Yes, I also imagined it's a very common case. In my company we currently have 3 clients wanting this functionality. I also just realized the slight difference between Temporal Join and Temporal Table Function Join, that there are actually two methods :) Regarding

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Kurt Young
Sorry, I forgot to paste the reference URLs. Best, Kurt
[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table-function
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table
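
For readers skimming the thread, the idea of enriching a stream of events with the newest daily snapshot of a dimension table can be sketched in plain Python. This is only an illustration of the join semantics, not Flink code: the `Snapshots` mapping, `latest_partition` and `join_with_latest` are hypothetical names, the partitions are assumed to be named by ISO date, and an in-memory dict stands in for the Hive table.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical in-memory stand-in for a partitioned dimension table:
# partition name (assumed ISO date, e.g. "2020-12-01") -> {key: value}.
Snapshots = Dict[str, Dict[str, str]]


def latest_partition(snapshots: Snapshots) -> Dict[str, str]:
    """Return the rows of the newest partition (ISO dates sort lexicographically)."""
    if not snapshots:
        return {}
    return snapshots[max(snapshots)]


def join_with_latest(
    events: Iterable[Tuple[str, str]], snapshots: Snapshots
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Enrich each (key, payload) event with the value from the newest partition.

    The newest partition is resolved per event, so a partition added while the
    stream is being consumed is seen by the events that arrive after it.
    Events without a matching key get None, like a left outer join.
    """
    for key, payload in events:
        dim = latest_partition(snapshots)
        yield key, payload, dim.get(key)
```

A usage example would pass an iterable of `(key, payload)` events and a dict of daily snapshots, then iterate over the enriched tuples. Resolving the partition per event keeps the sketch short; a real job would more likely cache the snapshot and refresh it on a schedule.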

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Kurt Young
Hi Krzysztof, What you describe is something we are also very interested in achieving in Flink. Unfortunately, there is no ready-made solution in the Table/SQL API yet, but you have 2 options that both come close to this and need some modifications. 1. The first one is to use a temporal table function [1]. It needs you to