Hi,
How can I perform a reduce operation across a group of datasets in Flink?
Let's say my map function produces N datasets: d1, d2, ... dN
I want to run a single reduce over all N datasets together, not over each
one individually. The only way I have found so far is to union them
first, as follows:
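List<DataSet<X>> dataList = Arrays.asList(d1, d2, ... dN);
DataSet<X> dFinal = null;
for (DataSet<X> ds : dataList)
{
    // Seed with the first dataset; calling union() on null would throw a NullPointerException.
    dFinal = (dFinal == null) ? ds : dFinal.union(ds);
}
// Group the combined dataset by tuple field 0 and reduce each group.
dFinal.groupBy(0).reduce(...);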
Is there a more efficient way of doing this with the Java API?
GroupReduce only works on a single dataset at a time, and I can't find any
other methods that take multiple datasets as input.
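To make the pattern concrete: the union loop above is essentially a fold over the list. Here is a minimal sketch of that fold, using plain Java lists as stand-ins for the datasets so it runs without Flink. The class name UnionAll, the helper unionAll, and the sample data are hypothetical and not part of Flink; I am assuming that with Flink the element type would be DataSet<X> and the combine step would be DataSet::union.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class UnionAll {
    // Folds a non-empty list into a single value with a binary combine step.
    // Assumption: with Flink, T would be DataSet<X> and combine would be DataSet::union.
    static <T> T unionAll(List<T> parts, BinaryOperator<T> combine) {
        return parts.stream()
                .reduce(combine)
                .orElseThrow(() -> new IllegalArgumentException("need at least one dataset"));
    }

    public static void main(String[] args) {
        // Plain lists stand in for d1, d2, d3 (hypothetical sample data).
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));

        // Concatenation plays the role of union here.
        List<Integer> all = unionAll(parts, (a, b) -> {
            List<Integer> merged = new ArrayList<>(a);
            merged.addAll(b);
            return merged;
        });

        System.out.println(all); // prints [1, 2, 3, 4, 5]
    }
}
```

This avoids the null seed entirely and fails with a clear message on an empty list, but it is still the same chain of pairwise unions, so my question about a more direct way stands.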
Thanks,
--
Ritesh Kumar Singh
https://riteshtoday.wordpress.com/