Hi Osh,

You can certainly apply multiple reduce functions to a DataSet; however, you
should make sure that the data is partitioned and sorted only once.
Moreover, you would end up with multiple data sets that you need to join
afterwards.

I think the easier approach is to wrap your functions in a single
ReduceFunction.
However, you should be aware that the return type of that function needs to
be correctly defined. For example, you could use the Row type.
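
A minimal sketch of the wrapping idea, in plain Java so it compiles without
Flink on the classpath. Everything here is an assumption for illustration:
the local ReduceFunction interface only mirrors the shape of Flink's
ReduceFunction, a double[] stands in for Row, and FieldReducer and
CompositeReducer are hypothetical names. In a real job the list of field
reducers would be built from your configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

public class CompositeReduceSketch {

    // Local stand-in with the same shape as Flink's ReduceFunction (assumption:
    // declared here only so the sketch is self-contained).
    interface ReduceFunction<T> {
        T reduce(T first, T second) throws Exception;
    }

    // Hypothetical configuration entry: which field to combine, and how.
    static final class FieldReducer {
        final int fieldIndex;
        final DoubleBinaryOperator op;

        FieldReducer(int fieldIndex, DoubleBinaryOperator op) {
            this.fieldIndex = fieldIndex;
            this.op = op;
        }
    }

    // A single reduce function that applies every configured per-field reducer.
    static final class CompositeReducer implements ReduceFunction<double[]> {
        private final List<FieldReducer> reducers;

        CompositeReducer(List<FieldReducer> reducers) {
            this.reducers = reducers;
        }

        @Override
        public double[] reduce(double[] first, double[] second) {
            // Fields without a configured reducer are carried over from the first record.
            double[] out = first.clone();
            for (FieldReducer r : reducers) {
                out[r.fieldIndex] = r.op.applyAsDouble(first[r.fieldIndex], second[r.fieldIndex]);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Sum field 0 and keep the maximum of field 1; field 2 is left untouched.
        CompositeReducer reducer = new CompositeReducer(Arrays.asList(
                new FieldReducer(0, Double::sum),
                new FieldReducer(1, Math::max)));
        double[] result = reducer.reduce(
                new double[] {1.0, 5.0, 7.0},
                new double[] {2.0, 3.0, 9.0});
        System.out.println(Arrays.toString(result)); // prints [3.0, 5.0, 7.0]
    }
}
```

Because every per-field reducer runs inside one function, the data is grouped
only once and there is nothing to join afterwards. Note that Flink functions
must be serializable, so in a real job the per-field operators would need to
be serializable as well.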

An alternative could be Flink SQL, which supports user-defined scalar
and aggregation functions.
If you can express your logic in these UDFs, it might be much easier
because the optimizer will generate the code for the dynamic parts for you.

Best, Fabian


2018-06-28 5:23 GMT+02:00 Zhijiang(wangzhijiang999) <
wangzhijiang...@aliyun.com>:

> Hi Osh,
>
> As far as I know, a single DataSet source currently cannot be consumed by
> several different vertices, and the API does not let you construct the
> topology you are asking for.
> I think your approach of merging the different reduce functions into one
> UDF is feasible. Maybe someone has a better solution. :)
>
> zhijiang
>
> ------------------------------------------------------------------
> From: Osian Hedd Hughes <os...@osian.me.uk>
> Sent: Thursday, June 28, 2018 00:35
> To: user <user@flink.apache.org>
> Subject: DataSet with Multiple reduce Actions
>
> Hi,
>
> I am new to Flink, and I'd like to start by using it to perform some
> in-memory aggregation in batch mode (in a few months this will be migrated
> to permanent streaming, hence the choice of Flink).
>
> For this, I can successfully create the complex key that I require using a
> KeySelector that returns a hash of the set of fields to "groupBy".
> I can also get the data from a file/db, but now I want to be able to apply
> many different reduce functions to different fields (not hardcoded, but
> read from configuration).
> What I'd like to know is whether this is possible out of the box.
> From my research, it seems that only a single reduce function can be
> applied to a DataSet.
> The only way I have found so far is to create a single reducer that is a
> container for all of the reduce functions I want to apply to my data record
> and simply loops through them to apply them to each record.
>
> Is this recommended, or am I missing some basics here?
>
> Many thanks for any advice,
> Osh
>
>
>
