Hi Florin,

Unfortunately, there is no design document.

The UnilateralSortMerger.java is used in the batch processing mode (not is 
streaming) and, 
in fact, the code dates some years back. I cc also Fabian as he may have more 
things to say on this.

Now for the streaming side, Flink uses 3 state-backends, in-memory (no spilling 
and mainly useful for testing),
filesystem and RocksDB (both eventually spill to disk but in different ways), 
and it also supports incremental 
checkpoints, i.e. at each checkpoint it only stores the diff between 
checkpoint[i] and checkpoint[i-1].

For more information on Flink state and state backends, checkout the latest 
talk from Stefan Richter at 
Flink Forward Berlin 2017 (https://www.youtube.com/watch?v=dWQ24wERItM 
<https://www.youtube.com/watch?v=dWQ24wERItM>) and the .

Cheers,
Kostas

> On Sep 19, 2017, at 6:00 PM, Florin Dinu <florin.d...@epfl.ch> wrote:
> 
> Hello everyone,
> 
> In our group at EPFL we're doing research on understanding and potentially 
> improving the performance of data-parallel frameworks that use secondary 
> storage.
> I was looking at the Flink code to understand how spilling to disk actually 
> works.
> So far I got to the UnilateralSortMerger.java and its spill and reading 
> threads. I also saw there are some spilling markers used.
> I am curious if there is any design document available on this topic.
> I was not able to find much online.
> If there is no such design document I would appreciate if someone could help 
> me understand how these spilling markers are used.
> At a higher level, I am trying to understand how much data does Flink spill 
> to disk after it has concluded that it needs to spill to disk.
> 
> Thank you very much
> Florin Dinu

Reply via email to