On Tue, Jul 26, 2016 at 2:15 PM, Sameer W <sam...@axiomine.com> wrote:
> 1. Calling clear() on the KV state is only possible for snapshots right? Do
> you control that for checkpoints too.

What do you mean with snapshots vs. checkpoints exactly?

> 2. Assuming that the user has no control over the checkpoint process outside
> of controlling the checkpoint interval , when is the RocksDB cleared of the
> operator state for checkpoints after they are long past. It seems like there
> are only two checkpoints that are really necessary to maintain, the current
> one and the previous one for restore. Does Flink clean up checkpoints on a
> timer? When it does clean up checkpoints does it also clean up the state
> backend (I am assuming they are different).

Yes, here: 
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/fault_tolerance.html

By default, only one completed checkpoint is kept.

> 3. The pre-aggregating windows was very helpful as the WindowFunction is now
> passed the pre-aggregated state. For windows, are the Reduce and Fold
> functions called on each element event before the window is triggered. I can
> see how that would work where the pre-compute is done per element but the
> actual output is emitted only when the window is fired. But that is only
> possible if there are no Evictors defined on the window? Also how are the
> elements fed to the Reduce/Fold function. Is it like MapReduce where even if
> you are using a Iterator, in reality all the values for a key are not
> buffered into memory? Which ties back to how is RocksDB is used to store a
> large window state before it is triggered. If my elements are accumulating
> in a window (serving a ReduceFunction) does it spill to disk (RocksDB?) when
> a threshold size is reached?

- The function is called before adding the element to the window KV state
- Yes, only possible if no evictors are defined
- The window reduce function is applied directly on the elements of
the stream and then update the KvState instance (e.g. update RocksDB)
- Operations with RocksDB always touch RocksDB, which takes care of
spilling etc.

Reply via email to