I have a complex alg implemented using the DataSet api and by default it runs with parallel 90 for good performance. At the end I want to perform a clustering of the resulting data and to do that correctly I need to pass all the data through a single thread/process.
I read in the docs that as long as I did a global reduce using DataSet.reduceGroup(new GroupReduceFunction....) that it would force it to a single thread. Yet when I run the flow and bring it up in the ui, I see parallel 90 all the way through the dag including this one. Is there a config or feature to force the flow back to a single thread? Or should I just split this into two completely separate jobs? I'd rather not split as I would like to use flinks ability to iterate on this alg and cluster combo. Thank you