On Tue, Jul 26, 2016 at 2:15 PM, Sameer W <sam...@axiomine.com> wrote: > 1. Calling clear() on the KV state is only possible for snapshots right? Do > you control that for checkpoints too.
What do you mean with snapshots vs. checkpoints exactly? > 2. Assuming that the user has no control over the checkpoint process outside > of controlling the checkpoint interval , when is the RocksDB cleared of the > operator state for checkpoints after they are long past. It seems like there > are only two checkpoints that are really necessary to maintain, the current > one and the previous one for restore. Does Flink clean up checkpoints on a > timer? When it does clean up checkpoints does it also clean up the state > backend (I am assuming they are different). Yes, here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/fault_tolerance.html By default, only one completed checkpoint is kept. > 3. The pre-aggregating windows was very helpful as the WindowFunction is now > passed the pre-aggregated state. For windows, are the Reduce and Fold > functions called on each element event before the window is triggered. I can > see how that would work where the pre-compute is done per element but the > actual output is emitted only when the window is fired. But that is only > possible if there are no Evictors defined on the window? Also how are the > elements fed to the Reduce/Fold function. Is it like MapReduce where even if > you are using a Iterator, in reality all the values for a key are not > buffered into memory? Which ties back to how is RocksDB is used to store a > large window state before it is triggered. If my elements are accumulating > in a window (serving a ReduceFunction) does it spill to disk (RocksDB?) when > a threshold size is reached? - The function is called before adding the element to the window KV state - Yes, only possible if no evictors are defined - The window reduce function is applied directly on the elements of the stream and then update the KvState instance (e.g. update RocksDB) - Operations with RocksDB always touch RocksDB, which takes care of spilling etc.