Looping in Dawid, who can hopefully answer your questions.

On 11/01/2022 13:00, Krzysztof Chmielewski wrote:
Hi,
I'm reading the docs and FLIP-140 about BATCH execution mode [1][2]. The docs say: "In BATCH mode, the configured state backend is ignored. Instead, the input of a keyed operation is grouped by key (using sorting) and then we process all records of a key in turn." [1]
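To make the quoted idea concrete, here is a minimal plain-Java sketch of "sort by key, then process all records of a key in turn". It is not Flink code: the class `SortGroupSketch`, the `Rec` record and `sumPerKey` are hypothetical names, and a per-key sum stands in for an arbitrary keyed operation. Because the input is sorted, a change of key in the stream (or the end of the bounded input) is what signals that the previous key has no more records, so only one key's state needs to be held at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortGroupSketch {

    /** Hypothetical keyed record; a plain Java stand-in, not a Flink type. */
    public record Rec(String key, int value) {}

    /**
     * Sorts the bounded input by key, then sums values per key.
     * Only the running state of the current key is kept; a key change in the
     * sorted stream means the previous key is finished and can be emitted.
     */
    public static List<String> sumPerKey(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Rec::key));

        List<String> results = new ArrayList<>();
        String currentKey = null;
        int currentSum = 0; // "state" for the current key only
        for (Rec r : sorted) {
            if (currentKey != null && !currentKey.equals(r.key())) {
                // Key boundary: the previous key has no more records.
                results.add(currentKey + "=" + currentSum);
                currentSum = 0;
            }
            currentKey = r.key();
            currentSum += r.value();
        }
        if (currentKey != null) {
            // End of the bounded input: flush the last key.
            results.add(currentKey + "=" + currentSum);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Rec> in = List.of(new Rec("b", 1), new Rec("a", 2), new Rec("b", 3), new Rec("a", 4));
        System.out.println(sumPerKey(in)); // [a=6, b=4]
    }
}
```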

I would like to ask:
1. Where (heap or off-heap) does Flink keep records for BATCH streams if the configured state backend is ignored? In FLIP-140 I see that a new State implementation was created which holds the value for only one key at a time, but there is no information about where in memory it is kept.

2. Where does the sorting algorithm keep its intermediate results?
How, and by which component, is it determined that there will be no more records for a given key?

If I understand correctly, sorting is done by the ExternalSorter class. Is there any documentation or usage example for ExternalSorter, and a description of the SortStage values such as READ, SORT and SPILL?
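For readers unfamiliar with the general technique, below is a minimal sketch of a generic external merge sort, which is the textbook idea behind a sorter that reads, sorts and spills. It is explicitly not Flink's ExternalSorter: the class `ExternalSortSketch` and its methods are hypothetical, it sorts plain ints, and the READ/SORT/SPILL labels in the comments merely borrow the stage names from the question. It fills a bounded in-memory buffer, sorts it, spills each sorted run to a temp file, and finally k-way merges the runs with a priority queue.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    /** Sorts ints while holding at most bufferSize of them in memory per run. */
    public static List<Integer> sort(Iterator<Integer> input, int bufferSize) throws IOException {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        List<Path> runs = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>(bufferSize);
        while (input.hasNext()) {
            buffer.add(input.next());              // "READ": fill the buffer
            if (buffer.size() == bufferSize) {
                runs.add(sortAndSpill(buffer));    // "SORT" + "SPILL" a full buffer
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) runs.add(sortAndSpill(buffer));
        try {
            return merge(runs);
        } finally {
            for (Path p : runs) Files.deleteIfExists(p); // clean up spilled runs
        }
    }

    /** Sorts the buffer in memory and writes it to a temp file as one sorted run. */
    private static Path sortAndSpill(List<Integer> buffer) throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("sort-run-", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(run)) {
            for (int v : buffer) {
                w.write(Integer.toString(v));
                w.newLine();
            }
        }
        return run;
    }

    /** Current smallest unread value of one run during the merge. */
    private record Head(int value, BufferedReader reader) {}

    /** K-way merge of the sorted runs using a min-heap keyed on each run's head. */
    private static List<Integer> merge(List<Path> runs) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> Integer.compare(a.value(), b.value()));
        List<BufferedReader> readers = new ArrayList<>();
        List<Integer> out = new ArrayList<>();
        try {
            for (Path p : runs) {
                BufferedReader r = Files.newBufferedReader(p);
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new Head(Integer.parseInt(line), r));
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.add(h.value());
                String line = h.reader().readLine();
                if (line != null) heap.add(new Head(Integer.parseInt(line), h.reader()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> data = List.of(5, 3, 9, 1, 7, 2, 8);
        System.out.println(sort(data.iterator(), 3)); // [1, 2, 3, 5, 7, 8, 9]
    }
}
```

The merged output is fully sorted, which is what lets a downstream consumer treat a key change as "no more records for the previous key".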

Regards,
Krzysztof Chmielewski


[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-140%3A+Introduce+batch-style+execution+for+bounded+keyed+streams#FLIP140:Introducebatchstyleexecutionforboundedkeyedstreams-Howtosort/groupkeys
