Hi Fabian,

We have reworked our execution to replace the group reduce step with a map
partition, and we're now seeing data passed on to the sink much sooner.
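
For anyone finding this thread later, the idea is roughly the following. This
is a minimal sketch rather than our actual job: it is plain Java with no Flink
dependency, it ignores the natural key and simply cuts one partition into
fixed-size batches, and the names BatchingSketch, emitBatches and batchSize
are made up for illustration. In a Flink job I would expect the same loop to
sit inside a MapPartitionFunction, with the Consumer replaced by the Collector
that Flink passes in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchingSketch {

    // Hypothetical helper: walks the items of one partition and hands each
    // batch to `out` as soon as it is full, without waiting for the rest.
    static <T> void emitBatches(Iterator<T> items, int batchSize, Consumer<List<T>> out) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (items.hasNext()) {
            batch.add(items.next());
            if (batch.size() == batchSize) {
                out.accept(batch);
                // Start a fresh list so the emitted batch is never mutated.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            out.accept(batch); // trailing partial batch
        }
    }

    public static void main(String[] args) {
        // Stand-in for the sink: print each batch as it is emitted.
        emitBatches(Arrays.asList(1, 2, 3, 4, 5).iterator(), 2,
                batch -> System.out.println("emit " + batch));
    }
}
```

Because no sort or grouping step runs first, each batch can leave as soon as
it is full. The trade-off is that batches follow partition order, so anything
that must be grouped by key has to be arranged before this step.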

Thanks for your quick reply, it was very useful.

Regards,
Paul

On 26 October 2016 at 19:57, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Paul,
>
> Flink pushes the results of operators (including GroupReduce) to the next
> operator or sink as soon as they are computed. So what you are asking for
> is actually happening.
> However, before the GroupReduceFunction can be applied, the whole data set
> is sorted in order to group it. This step is usually more expensive than
> applying the GroupReduceFunction itself. Therefore, it looks like the output
> is batched.
> Flink only supports sort-based grouping. However, hash-based grouping would
> not help either, because Flink would not know when to close a group until
> all data is consumed.
>
> Please let me know if you have further questions.
>
> Best, Fabian
>
>
> 2016-10-26 19:07 GMT+02:00 Paul Wilson <paulalexwil...@gmail.com>:
>
>> Hi,
>>
>> DataSet API
>> Flink 1.1.3
>>
>> I have an application where I'd like to perform some mapping before
>> batching the results and passing them to the sink. I'm performing a
>> 'composite' key selection to group the items by their natural key as well
>> as a batch (itemCount / batchSize). When I reduce the batches and pass them
>> to the sink, the whole flow waits for all reduces to complete before
>> passing any of them to the sink.
>>
>> Is there some way that the results of a single group reduce can be passed
>> to the sink before all reduces are complete?
>>
>> Hope that makes sense,
>> Regards,
>> Paul
>>
>
>
