I have a complex alg implemented using the DataSet api and by default it
runs with parallel 90 for good performance. At the end I want to perform a
clustering of the resulting data and to do that correctly I need to pass
all the data through a single thread/process.

I read in the docs that as long as I did a global reduce using
DataSet.reduceGroup(new GroupReduceFunction....) that it would force it to
a single thread.  Yet when I run the flow and bring it up in the ui, I see
parallel 90 all the way through the dag including this one.

Is there a config or feature to force the flow back to a single thread?  Or
should I just split this into two completely separate jobs?  I'd rather not
split as I would like to use flinks ability to iterate on this alg and
cluster combo.

Thank you

Reply via email to