Hi Senthil,
I think you are right that you cannot update closure variables
directly and expect them to show up at the workers.
If the variable values are read from S3 files, I think currently you
will need to define a source explicitly to read the latest value of the file.
Whether to use BroadcastedStream should depends on how you want to access the
set of string: if you want to broadcast the same strings to all the tasks, then
broadcast stream is the solution and if you want to distribute the set of
strings in other methods, you could also use more generic connect streams like:
streamA.connect(streamB.keyBy()).process(xx). [1]
Best,
Yun
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#datastream-transformations
------------------------------------------------------------------
From:Senthil Kumar <[email protected]>
Send Time:2020 Apr. 27 (Mon.) 21:51
To:[email protected] <[email protected]>
Subject:Updating Closure Variables
Hello Flink Community!
We have a flink streaming application with a particular use case where a
closure variable Set<String> is used in a filter function.
Currently, the variable is set at startup time.
It’s populated from an S3 location, where several files exist (we consume the
one with the last updated timestamp).
Is it possible to periodically update (say once every 24 hours) this closure
variable?
My initial research indicates that we cannot update closure variables and
expect them to show up at the workers.
There seems to be something called BrodcastStream in Flink.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
Is that the right approach? I would like some kind of a confirmation before I
go deeper into it.
cheers
Kumar