In-Memory Lookup in Flink Operators

Chirag Dewan Thu, 27 Sep 2018 21:28:52 -0700

Hi,
I saw static/dynamic lookups in Flink streaming being discussed in the Apache 
Flink User Mailing List archive, and then I came across this FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API.
 
I know we haven't made much progress on this topic, but I still wanted to put 
forward my problem statement. 
I am also looking for a dynamic lookup in Flink operators. I want to pre-fetch 
data from various sources, such as a DB, filesystem, Cassandra etc., into 
memory. I also have to refresh the in-memory lookup table periodically, with 
the period being a configurable parameter. 
This is what a map operator would look like with the lookup: 
-> Load in-memory lookup
-> Start refresh timer
-> Start stream processing
-> Call lookup
-> Use lookup result in stream processing
-> Timer elapsed
-> Reload lookup data source into in-memory table
-> Continue processing
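
The flow above could be sketched roughly as follows. This is a minimal 
plain-Java sketch and not Flink API: the class name RefreshingLookup, the 
loader supplier (standing in for the DB, filesystem or Cassandra read) and the 
period/unit parameters are all hypothetical names chosen for illustration. 

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: an in-memory lookup table that reloads itself periodically. */
final class RefreshingLookup<K, V> implements AutoCloseable {
    // Stands in for the real data source read (DB, filesystem, Cassandra, ...).
    private final Supplier<Map<K, V>> loader;
    // Current snapshot; replaced as a whole on every reload, so readers never see a half-built table.
    private volatile Map<K, V> table;
    private final ScheduledExecutorService timer;

    RefreshingLookup(Supplier<Map<K, V>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        reload(); // initial load, before any record is processed
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lookup-refresh");
            t.setDaemon(true); // do not keep the JVM alive because of the refresh thread
            return t;
        });
        // First refresh fires after one full period, then repeats at the configured rate.
        this.timer.scheduleAtFixedRate(this::reloadQuietly, period, period, unit);
    }

    /** Reads the data source again and swaps in a fresh read-only snapshot. */
    void reload() {
        table = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    private void reloadQuietly() {
        try {
            reload();
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot. An uncaught exception here would
            // cancel all further runs of the scheduled task.
        }
    }

    /** Called from the record-processing path; returns null for unknown keys. */
    V lookup(K key) {
        return table.get(key);
    }

    @Override
    public void close() {
        timer.shutdownNow(); // stop the refresh thread
    }
}
```

In a Flink job, one way to use something like this would be to create it in 
the open() method of a RichMapFunction, call lookup() from map(), and call 
close() from the function's close(). Created that way, every parallel subtask 
gets its own copy of the table and its own refresh thread. 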


My concerns around this are: 
1) Possibly storing the same copy of the data in every task slot's memory or 
state backend (RocksDB in my case).
2) Having a dedicated refresh thread for each subtask instance (possibly every 
Task Manager having multiple refresh threads).
Am I thinking in the right direction, or missing something very obvious? It is 
confusing.
Any leads are much appreciated. Thanks in advance.
Cheers, Chirag
