Re: How to join stream and batch data in Flink?

2018-09-25 Thread Hequn Cheng
Hi vino, Thanks for sharing the link. It's a great book and I will take a look. There are many kinds of joins, and different joins have different semantics. From the link, I think it means the time-versioned join. FLINK-9712 enriches joins with Time Ver…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Hequn, The specific content of the book does not draw a right-or-wrong conclusion, but it illustrates this phenomenon: when the same two input streams are replayed and joined at the same time, the join results can be uncertain because of the order in which events arrive. This is because the two streams are inter…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread Hequn Cheng
Hi vino, There are no ordering problems with stream-stream joins in Flink. No matter what order the elements arrive in, a stream-stream join in Flink will output results that are consistent with standard SQL semantics. I haven't read the book you mentioned. A join does not guarantee output order. You have t…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Fabian, I may not have stated it clearly here; there is no semantic problem at the Flink implementation level. Rather, there may be a “time-dependence” here. [1] Yes, my initial answer was not to use this form of join in this scenario, but Henry said he converted the table into a stream table and as…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread Fabian Hueske
Hi, I don't think that using the current join implementation in the Table API / SQL will work. The non-windowed join fully materializes *both* input tables in state. This is necessary, because the join needs to be able to process updates on either side. While this is not a problem for the fixed si…
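Fabian's point about the non-windowed join can be sketched with a plain-Java symmetric hash join (no Flink dependencies; all class and variable names here are hypothetical): each side must keep every row it has seen, because a later row on the other side may still need to join against it. It also illustrates Hequn's point elsewhere in the thread that the final joined result is consistent regardless of arrival order.

```java
import java.util.*;

// Sketch of why a non-windowed stream-stream join must keep *both* inputs in
// state: each arriving row joins against everything seen so far on the other
// side, so neither side's state can be dropped. Illustrative, not Flink code.
public class SymmetricHashJoin {
    private final Map<String, List<String>> leftState = new HashMap<>();
    private final Map<String, List<String>> rightState = new HashMap<>();
    private final List<String> results = new ArrayList<>();

    public void onLeft(String key, String value) {
        leftState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        for (String r : rightState.getOrDefault(key, Collections.emptyList())) {
            results.add(value + "|" + r);
        }
    }

    public void onRight(String key, String value) {
        rightState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        for (String l : leftState.getOrDefault(key, Collections.emptyList())) {
            results.add(l + "|" + value);
        }
    }

    /** Final result, sorted: the multiset is the same for any arrival order. */
    public List<String> results() {
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Note that both `leftState` and `rightState` grow without bound here, which is exactly the problem Fabian describes for an unbounded fact-table side.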

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Henry, 1) I don't recommend this method very much, but you said that you expect to convert the MySQL table to a stream and then to a Flink table. Under this premise, I said that you can do this by joining two stream tables. But as you know, this join depends on the time period for which the state is sav…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread 徐涛
Hi Vino, I do not quite understand some of the sentences below; would you please explain them in a bit more detail? 1. “such as setting the state retention time of one of the tables to be permanent” — as far as I know, the state retention time is a global config; I cannot set this prop…

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Henry, If you have converted the MySQL table to a Flink stream table, then in Flink Table/SQL, stream-to-stream joins can also do this, for example by setting the state retention time of one of the tables to be permanent. But when the job has just started, you may not be able to match any results, because…
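The retention mechanism vino refers to can be sketched in plain Java (illustrative only; in Flink itself, idle state retention is configured on the query, e.g. via `StreamQueryConfig#withIdleStateRetentionTime` in the releases current at the time of this thread, which is why Henry objects above that it is not per-table):

```java
import java.util.*;

// Sketch of idle-state retention: entries untouched for longer than the
// retention period are dropped; Long.MAX_VALUE approximates "permanent".
// Plain Java, illustrative only -- Flink manages this internally.
public class IdleStateStore {
    private final long retentionMillis;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();

    public IdleStateStore(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    public void put(String key, String value, long nowMillis) {
        values.put(key, value);
        lastAccess.put(key, nowMillis);
    }

    public String get(String key, long nowMillis) {
        String v = values.get(key);
        if (v != null) lastAccess.put(key, nowMillis); // access refreshes the timer
        return v;
    }

    /** Drop entries that have been idle longer than the retention period. */
    public void cleanup(long nowMillis) {
        lastAccess.entrySet().removeIf(e -> {
            boolean expired = retentionMillis != Long.MAX_VALUE
                    && nowMillis - e.getValue() > retentionMillis;
            if (expired) values.remove(e.getKey());
            return expired;
        });
    }

    public int size() { return values.size(); }
}
```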

Re: How to join stream and batch data in Flink?

2018-09-25 Thread 徐涛
Hi Vino & Hequn, I am now using the Table/SQL API. If I import the MySQL table as a stream and then convert it into a table, it seems that it can also be a workaround for batch/stream joining. May I ask what the difference is from the UDTF method? Does this implementation have some defe…

Re: How to join stream and batch data in Flink?

2018-09-21 Thread Hequn Cheng
Hi, +1 for vino's answer. Also, this kind of join will be supported in FLINK-9712. You can check more details in the JIRA. Best, Hequn On Fri, Sep 21, 2018 at 4:51 PM vino yang wrote: > Hi Henry, > > There are three ways I can think of: > > 1)…
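The time-versioned ("temporal") join tracked by FLINK-9712 can be sketched in plain Java (illustrative only; class and key names are hypothetical): the dimension table keeps versions over time, and each fact row joins against the version that was valid at the fact's own timestamp, rather than against whatever happens to be current.

```java
import java.util.*;

// Sketch of a time-versioned dimension table: lookupAt(key, t) returns the
// version that was valid at time t. Plain Java, not Flink's implementation.
public class VersionedDimTable {
    // key -> (validFrom timestamp -> value)
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    public void addVersion(String key, long validFrom, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(validFrom, value);
    }

    /** Returns the version valid at factTime, or null if none existed yet. */
    public String lookupAt(String key, long factTime) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) return null;
        Map.Entry<Long, String> e = history.floorEntry(factTime);
        return e == null ? null : e.getValue();
    }
}
```

This is what makes join results deterministic even though the dimension side keeps changing: the fact's timestamp, not arrival order, picks the version.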

Re: How to join stream and batch data in Flink?

2018-09-21 Thread vino yang
Hi Henry, There are three ways I can think of: 1) use the DataStream API and implement a flatmap UDF to access the dimension table; 2) use the Table/SQL API and implement a UDTF to access the dimension table; 3) customize the Table/SQL join API/statement's implementation (and change the physical plan). Thanks, vino.
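Option 1 can be sketched in plain Java (kept Flink-free so it is self-contained; all names are hypothetical): a flatmap-style function looks up each fact record's key in a dimension table held in memory. In an actual Flink job this logic would live in a `RichFlatMapFunction`, with `open()` loading the table from (or connecting to) MySQL.

```java
import java.util.*;

// Sketch of vino's option 1: enrich fact records against a dimension table.
// The Map stands in for a MySQL dimension table; names are illustrative.
public class DimensionEnricher {
    private final Map<String, String> dimTable;

    public DimensionEnricher(Map<String, String> dimTable) {
        this.dimTable = dimTable;
    }

    /** flatMap-style enrichment: emits "factValue,dimValue" when the key
     *  matches, and drops facts whose key has no dimension row (inner join). */
    public List<String> flatMap(String factKey, String factValue) {
        String dim = dimTable.get(factKey);
        if (dim == null) {
            return Collections.emptyList(); // no dimension row: emit nothing
        }
        return Collections.singletonList(factValue + "," + dim);
    }
}
```

Emitting the bare fact instead of an empty list would turn this into a left outer join; which behavior is wanted depends on the use case.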

How to join stream and batch data in Flink?

2018-09-21 Thread 徐涛
Hi All, Sometimes a “dimension table” needs to be joined with the “fact table”, if the data are not joined before being sent to Kafka. So if the data are joined in Flink, does the “dimension table” have to be imported as a stream, or are there other ways to achieve it? Thanks