Hi,

We have large images (about 20 MB each) arriving at high frequency and volume
from the video streams of many cameras.

The current design thought is to store this data in EFS (NAS) in the 1st step
of the Flink dataflow and access it from the 3rd step (which may run on a
totally different TaskManager node), without using RocksDBStateBackend (aka
the slow Hadoop version 1 pattern, which Spark solved with in-memory
computation).
https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html#the-rocksdbstatebackend
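
To make the idea concrete, here is a rough sketch of the pattern I have in
mind. It uses no Flink API at all; store_frame and load_frame are
hypothetical stand-ins for what Step 1 and Step 3 would do, and a temp
directory stands in for the EFS mount, which I am assuming would be mounted
at the same path on every TaskManager node. Only the small path string would
travel through the dataflow between the steps, not the 20 MB image.

```python
import os
import tempfile
import uuid
from pathlib import Path

# Hypothetical shared root. In the real job this would be the EFS mount
# point (assumed identical on every TaskManager node); a temp directory
# stands in for it here so the sketch runs anywhere.
SHARED_ROOT = Path(tempfile.mkdtemp(prefix="efs-standin-"))


def store_frame(camera_id: str, frame: bytes) -> str:
    """Step 1 (hypothetical): write the image to shared storage, return its path."""
    target_dir = SHARED_ROOT / camera_id
    target_dir.mkdir(parents=True, exist_ok=True)
    final_path = target_dir / f"{uuid.uuid4().hex}.img"
    tmp_path = final_path.with_suffix(".tmp")
    tmp_path.write_bytes(frame)
    # Rename into place so a reader never picks up a half-written file.
    os.replace(tmp_path, final_path)
    return str(final_path)


def load_frame(path: str) -> bytes:
    """Step 3 (hypothetical): read the image back using only the path in the record."""
    return Path(path).read_bytes()


# The record passed from Step 1 to Step 3 carries just the reference.
frame = b"\x00" * 1024
ref = store_frame("camera-42", frame)
restored = load_frame(ref)
```

Whether the write-then-rename step behaves this way on EFS under our load is
something I have not verified.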

1. Can we use RocksDBStateBackend configured with
file:///efsendpoint/checkpoints to store this image data in EFS and access
it from the 3rd step?
2. Does the checkpointing interval need to be shorter than the time it takes
to reach Step 3 after storing the data in EFS in Step 1? Will this allow
Step 3, running on a different TaskManager node, to get to the data stored in
EFS via RocksDBStateBackend, assuming Local Task storage is set?
3. Can I use the Metrics tab of the Flink dashboard to see how long each
step in the dataflow pipeline/graph takes?

TIA,
Vijay
