Hi Kant,

I am not an expert on Flink's SQL implementation. Hence, I'm pulling in
Timo and Jark who might help you with your question.

Cheers,
Till

On Tue, Jan 28, 2020 at 10:46 PM kant kodali <kanth...@gmail.com> wrote:

> Sorry. fixed some typos.
>
> I am doing a streaming outer join from four topics in Kafka lets call them
> sample1, sample2, sample3, sample4. Each of these test topics has just one
> column which is of tuple string. my query is this
>
> SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL 
> OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on 
> sample3.f0=sample4.f0
>
>
> And here is how I send messages to those Kafka topics at various times.
>
> At time t1 Send a message "flink" to test-topic1
>
> (true,flink,null,null,null) // Looks good
>
> At time t2 Send a message "flink" to test-topic4
>
> (true,null,null,null,flink) // Looks good
>
> At time t3 Send a message "flink" to test-topic3
>
> (false,null,null,null,flink) // Looks good
> (true,null,null,flink,flink) //Looks good
>
> At time t4 Send a message "flink" to test-topic2
>
> (false,flink,null,null,null) // Looks good
> (false,null,null,flink,flink) // Looks good
> *(true,null,null,null,flink) // Redundant?*
> *(false,null,null,null,flink) // Redundant?*
> (true,flink,flink,flink,flink) //Looks good
>
> Assume t1<t2<t3<t4
>
> Those two rows above seem to be redundant to me although the end result is
> correct. Doesn't see the same behavior if I join two topics. These
> redundant messages can lead to a lot of database operations underneath so
> any way to optimize this? I am using Flink 1.9 so not sure if this is
> already fixed in 1.10.
>
> Attached the code as well.
>
> Thanks!
> kant
>
>
> On Tue, Jan 28, 2020 at 1:43 PM kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am doing a streaming outer join from four topics in Kafka lets call
>> them sample1, sample2, sample3, sample4. Each of these test topics has just
>> one column which is of tuple string. my query is this
>>
>> SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL 
>> OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on 
>> sample3.f0=sample4.f0
>>
>>
>> And here is how I send messages to those Kafka topics at various times.
>>
>> At time t1 Send a message "flink" to test-topic1
>>
>> (true,flink,null,null,null) // Looks good
>>
>> At time t2 Send a message "flink" to test-topic4
>>
>> (true,null,null,null,flink) // Looks good
>>
>> At time t3 Send a message "flink" to test-topic3
>>
>> (false,null,null,null,flink) // Looks good
>> (true,null,null,flink,flink) //Looks good
>>
>> At time t3 Send a message "flink" to test-topic2
>>
>> (false,flink,null,null,null) // Looks good
>> (false,null,null,flink,flink) // Looks good
>> *(true,null,null,null,flink) // Redundant?*
>> *(false,null,null,null,flink) // Redundant?*
>> (true,flink,flink,flink,flink) //Looks good
>>
>> Those two rows above seem to be redundant to be although the end result
>> is correct. Doesn't see the same behavior if I join two topics. This
>> unwanted message will lead to a lot of database operations underneath so
>> any way to optimize this? I am using Flink 1.9 so not sure if this is
>> already fixed in 1.10.
>>
>> Attached the code as well.
>>
>> Thanks!
>> kant
>>
>>
>>
>>

Reply via email to