Hi Martin!
You can use a broadcast join for that as well. You use it exactly like the
usual join, but you write "joinWithTiny" or "joinWithHuge", depending on
whether the data set that is the argument to the function is the small or
the large one.
The broadcast join internally also broadcasts th
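
To make the idea concrete, here is a minimal plain-Java sketch of what a broadcast-style filter amounts to: the small ID list is held in a hash set, and the large data set is streamed past it line by line. This is not Flink code and not how Flink implements it internally. The class name, the method names, and the record layout (ID before the first tab) are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BroadcastFilterSketch {

    // Hypothetical record layout: the ID is the text before the first tab.
    // A line without a tab is treated as consisting of the ID only.
    static String idOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Keeps only the lines whose ID is contained in the (small) ID set.
    // In a parallel job, every worker would hold its own full copy of `ids`
    // and run this loop over its partition of the large data set.
    static List<String> filterByIds(List<String> lines, Set<String> ids) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (ids.contains(idOf(line))) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Small side: the ID list, cheap to copy to every worker.
        Set<String> ids = new HashSet<>(Arrays.asList("id1", "id3"));
        // Large side: stand-in for the big data set.
        List<String> lines = Arrays.asList("id1\tfoo", "id2\tbar", "id3\tbaz");
        System.out.println(filterByIds(lines, ids));
    }
}
```

The point of the sketch is that only the small side has to be copied around; the large side stays where it is and is read once.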
Hi Martin,
The answer to your question really depends on the DOP (degree of
parallelism) with which you will be running the job and on the expected
selectivity (the fraction of lines with that certain ID), in case this does
not depend on "the other side" and can be pre-filtered prior to broadcasting.
However, since Flink's op
Hi,
Up to what sizes are broadcast sets a good idea?
I have a large dataset (~5 GB) and I'm only interested in lines with a
certain ID that I have in a file. The file has ~10k entries.
I could either join the dataset with the ID list or I could broadcast the
ID list and do the filtering in a Ma