Hi,

What is the best practice recommendation for the following use case? We need to 
match a stream against a set of “rules”, which are essentially a Flink DataSet 
concept. Updates to this “rules set" are possible but not frequent. Each stream 
event must be checked against all the records in “rules set”, and each match 
produces one or more events into a sink. Number of records in a rule set are in 
the 6 digit range.

Currently we're simply loading rules into a local List of rules and using 
flatMap over an incoming DataStream. Inside flatMap, we're just iterating over 
a list comparing each event to each rule.

To speed up the iteration, we can also split the list into several batches, 
essentially creating a list of lists, and creating a separate thread to iterate 
over each sub-list (using Futures in either Java or Scala).

Questions:
1.            Is there a better way to do this kind of a join?
2.            If not, is it safe to add additional parallelism by creating new 
threads inside each flatMap operation, on top of what Flink is already doing?

Thanks in advance!
Turar

Reply via email to