Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Hey all, I've got a Flink 1.10 Streaming SQL job using the Blink Planner that reads from a few CSV files and joins some records across them into a couple of data streams (yes, this could be a batch job; I won't get into why we chose streams unless it's relevant). These joins are producing some d

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
oops, the example query should actually be:

SELECT table_1.a, table_1.b, table_2.c
FROM table_1
LEFT OUTER JOIN table_2 ON table_1.b = table_2.b;

and duplicate results should actually be:

Record(a = "data a 1", b = "data b 1", c = "data c 1")
Record(a = "data a 1", b = "data b 1", c = null)
Reco

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Ah, I think the "Result Updating" is what got me -- INNER joins do the job!

On Thu, Aug 27, 2020 at 3:38 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

> oops, the example query should actually be:
>
> SELECT table_1.a, table_1.b, table_2.c
> FROM table_1
> LEFT OUTER JOIN table_2 ON

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-31 Thread Arvid Heise
Hi Austin,

Do I assume correctly that you self-answered your question? If not, could you please update us on your current progress?

Best,
Arvid

On Thu, Aug 27, 2020 at 11:41 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

> Ah, I think the "Result Updating" is what got me -- INNER joins

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-31 Thread Austin Cawley-Edwards
Hey Arvid,

Yes, I was able to self-answer this one. I was just confused by the non-deterministic behavior of the FULL OUTER join statement. I thought it through and took a harder read through the Dynamic Tables doc section[1], where "Result Updating" is hinted at, and the behavior makes total sense in
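
For readers following along, here is a rough Python sketch of why a non-windowed streaming outer join can appear to produce duplicates. It is a simplified model, not Flink's actual join operator: the function names `left_outer_join_changelog` and `materialize` are hypothetical, and the idea that the duplicates come from a consumer treating the updating result as append-only is an assumption about the setup in this thread. The join first emits a null-padded row for an unmatched left record, then retracts it and emits the matched row once the right side arrives.

```python
from collections import defaultdict


def left_outer_join_changelog(events):
    """Model a non-windowed streaming LEFT OUTER JOIN on column b.

    events: iterable of ("left", {"a", "b"}) or ("right", {"b", "c"}).
    Returns a changelog of (op, row), where op is "+" (insert) or
    "-" (retract). This is an illustration, not Flink's implementation.
    """
    left_rows = defaultdict(list)
    right_rows = defaultdict(list)
    changelog = []
    for side, row in events:
        key = row["b"]
        if side == "left":
            left_rows[key].append(row)
            matches = right_rows[key]
            if matches:
                for r in matches:
                    changelog.append(("+", (row["a"], row["b"], r["c"])))
            else:
                # No match yet: emit a null-padded row for now.
                changelog.append(("+", (row["a"], row["b"], None)))
        else:
            first_match = not right_rows[key]
            right_rows[key].append(row)
            for l in left_rows[key]:
                if first_match:
                    # Retract the earlier null-padded row.
                    changelog.append(("-", (l["a"], l["b"], None)))
                changelog.append(("+", (l["a"], l["b"], row["c"])))
    return changelog


def materialize(changelog, honor_retractions=True):
    """Apply a changelog; optionally ignore retractions (append-only view)."""
    result = []
    for op, row in changelog:
        if op == "+":
            result.append(row)
        elif honor_retractions:
            result.remove(row)
    return result


events = [
    ("left", {"a": "data a 1", "b": "data b 1"}),
    ("right", {"b": "data b 1", "c": "data c 1"}),
]
changelog = left_outer_join_changelog(events)
final_view = materialize(changelog)
append_only_view = materialize(changelog, honor_retractions=False)
```

In this model `final_view` holds only the matched row, while `append_only_view` also keeps the stale null-padded row, which resembles the duplicate records shown earlier in the thread. An INNER join never emits the null-padded row in the first place, so there is nothing to retract.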