Hi everyone,

I noticed that a simple INNER JOIN in Flink SQL behaves
non-deterministically. I'd like to understand whether this is expected and
whether an issue has already been created to address it.


In my example, I have the following query:

SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a JOIN
table_b AS r ON a.funder = r.funder

Let's say I have three records with funder 12345 in table_a and a single
record with funder 12345 in table_b. When I run this Flink job, the results
contain an INSERT followed by two UPDATEs (corresponding to the three
records from table_a), but the order in which those records appear is not
deterministic. If I re-run the application several times, I get the rows
in a different order.

It looks like Flink uses a GlobalPartitioner in this case, which tells me
that it doesn't perform a shuffle on the column used in the join condition.
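
For anyone who wants to check the plan themselves, this is roughly how it
can be printed from the SQL client. It is only a sketch: it assumes table_a
and table_b are already registered in the current catalog, and it reuses
the query from above unchanged.

```sql
-- Print the plan Flink generates for the join.
-- Assumes table_a and table_b already exist in the current catalog.
EXPLAIN PLAN FOR
SELECT a.funder, a.amounts_added, r.amounts_removed
FROM table_a AS a
JOIN table_b AS r ON a.funder = r.funder;
```

The Exchange nodes in the optimized plan should show how each input is
distributed before it reaches the join operator.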


I use Flink 1.17.1. Appreciate any insights here!
