Hi Dongwon,
Yes, you are right that I assume that broadcasting occurs once. This
is what I meant by "If you know the data in advance". Sorry for not
being clear. If you need to periodically broadcast new versions of the
data, then I cannot find a better solution than the one you propose
with the
Hi Kostas,
Thanks for the input!
BTW, I guess you assume that the broadcasting occurs just once for
bootstrapping, huh?
My job needs not only the initial bootstrapping but also a periodic fetch
of new versions of the data from some external storage.
Thanks,
Dongwon
> On Sep 23, 2020, at 4:59 AM, Kostas Kloudas
Hi Dongwon,
If you know the data in advance, you can always use the Yarn options
in [1] (e.g. the "yarn.ship-directories") to ship the directories with
the data you want only once to each Yarn container (i.e. TM) and then
write a udf which reads them in the open() method. This will allow the
data
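
A minimal sketch of the load-once-in-open() idea described above, written as plain Java so it runs without Flink on the classpath. The class name `ShippedDataLoader`, the `key,value` line format, and the file layout are hypothetical and not taken from the thread; how a shipped directory's path resolves inside the Yarn container is also left as an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: loads a shipped "key,value" text file into memory.
// In a Flink job this would be called once per task from a rich function's
// open() method, so the file is read at startup rather than per record.
final class ShippedDataLoader {

    // Reads each line and splits it at the first comma; lines without a
    // comma (blank or malformed) are skipped.
    static Map<String, String> load(Path file) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue;
                }
                table.put(line.substring(0, comma), line.substring(comma + 1));
            }
        }
        return table;
    }
}
```

In an actual job, the returned map would presumably be kept in a transient field of, for example, a `RichMapFunction` and populated inside its `open()` method. Note that this covers only the one-time case; it does not refresh the data afterwards.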
Hi,
I'm using Flink broadcast state similar to what Fabian explained in [1].
One difference might be the size of the broadcast data, which is
around 150 MB.
I've launched 32 TMs by setting
- taskmanager.numberOfTaskSlots : 6
- parallelism of the non-broadcast side : 192
Here's some