Hi,
I have a use case where 4 streams are merged (union), grouped on a common
key (keyBy), and passed to a custom KeyedProcessFunction. I need to keep
state (RocksDB backend) for all 4 streams in that KeyedProcessFunction,
with the events of each stream stored as a map. I see 2 options:

1. Create a separate MapStateDescriptor for each of these streams and store
their events separately.
2. Create a single MapStateDescriptor with only 4 keys (one per stream
type), where each value is of type Map and holds the events from the
respective stream.
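
To make the two options concrete, here is a rough plain-Java sketch of the
shapes I have in mind. It uses ordinary HashMaps as stand-ins for Flink's
MapState, so it says nothing about how RocksDB actually stores or fetches
the data. The names StreamType, SeparateMaps and SingleNestedMap, and the
String event ids and payloads, are placeholders rather than my real types.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for the two state layouts; this is not Flink code. */
final class StateLayoutSketch {

    /** Hypothetical tags for the 4 unioned streams. */
    enum StreamType { A, B, C, D }

    /** Option 1: one map per stream (would be 4 MapStateDescriptors). */
    static final class SeparateMaps {
        final Map<String, String> streamA = new HashMap<>();
        final Map<String, String> streamB = new HashMap<>();
        final Map<String, String> streamC = new HashMap<>();
        final Map<String, String> streamD = new HashMap<>();

        /** Stores the event directly in the map that belongs to its stream. */
        void add(StreamType type, String eventId, String payload) {
            mapFor(type).put(eventId, payload);
        }

        Map<String, String> mapFor(StreamType type) {
            switch (type) {
                case A: return streamA;
                case B: return streamB;
                case C: return streamC;
                default: return streamD;
            }
        }
    }

    /** Option 2: one map with 4 keys, each value being a nested map. */
    static final class SingleNestedMap {
        final Map<StreamType, Map<String, String>> byStream = new HashMap<>();

        /** Reads the nested map, modifies it, and writes the whole value back. */
        void add(StreamType type, String eventId, String payload) {
            Map<String, String> inner = byStream.getOrDefault(type, new HashMap<>());
            inner.put(eventId, payload);
            byStream.put(type, inner);
        }
    }

    public static void main(String[] args) {
        SeparateMaps option1 = new SeparateMaps();
        SingleNestedMap option2 = new SingleNestedMap();

        // Same two events written through both layouts.
        option1.add(StreamType.A, "event-1", "payload-1");
        option1.add(StreamType.B, "event-2", "payload-2");
        option2.add(StreamType.A, "event-1", "payload-1");
        option2.add(StreamType.B, "event-2", "payload-2");

        System.out.println("option 1, stream A: " + option1.streamA);
        System.out.println("option 2, stream A: " + option2.byStream.get(StreamType.A));
    }
}
```

In option 1 an event goes straight into the map for its stream, while in
option 2 every write goes through the outer entry for that stream type.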

I want to understand whether there is any performance difference between
these two approaches. Will keeping 4 different MapStates cause 4 lookups
in the RocksDB backend when they are accessed? Or are all of these
MapStates stored internally in RocksDB in a single row for the respective
key (as per the keyedStream), so that they are all fetched in a single
call before the operator's processElement is called? If each
MapStateDescriptor results in a separate RocksDB lookup, then I think
keeping everything in a single MapStateDescriptor would be more efficient,
since it would minimize RocksDB calls. Please advise.

Gagan
