Hi, I have yet another question, this time maintaining a global list of
centroids. 

I am trying to implement the clustream algorithm and for that purpose I have
the initial set of centres in a flink dataset. Now I need to update the set
of centres for every data tuple that comes from the stream. From what I have
read so far on 2 different posts having similar questions, is that, in case
of streaming datasets the co-map operator was asked to use and retrieve them
in 2 separate map functions.

My idea is to broadcast the dataset in each flink partition and whenever a
data tuple is mapped to a partition using a map function, update the
broadcasted dataset.
But as this is currently not possible, thus I was thinking to broadcast the
datastream using 

"ds.broadcast()"

so that every partition receives the streamed tuple. Then, use a normal
flatmap function for the centres and use the broadcasted tuple to update the
centres and return the updated set of centres.

My question is, would this work? If yes, may someone give an example of the
datastream broadcast function and how to retrieve the broadcasted stream in
a map function?



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Regarding-Broadcast-of-datasets-in-streaming-context-tp6456.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to