Hi Vasia,
You are right about topDistance: it is a dataset that holds only a single
double value. I already looked at the Aggregator and I can only get the
value of an aggregator in the next iteration. However, my problem is a bit
tricky because the topDistance controls how the newSeeds is
Hi Vasia,
Thank you very much for your explanation :). When running with a small
maxIteration, the job graph that Flink executed was optimal. However, when
maxIterations was large, Flink took a very long time to generate the job
graph. The actual time to execute the jobs was very fast but the time
Hi Truong,
I'm afraid what you're experiencing is to be expected. Currently, for loops
do not perform well in Flink since there is no support for caching
intermediate results yet. This has been a frequently requested feature
lately, so maybe it will be added soon :)
Until then, I suggest you try
Hi,
I have a Flink program which is similar to the K-means algorithm. I use a
normal iteration (a for loop) because Flink iteration does not allow computing
intermediate results (in this case the topDistance) within one iteration.
The problem is that my program only runs when maxIteration is small.
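
To illustrate why a plain for loop becomes a problem as maxIteration grows, here is a toy model in plain Java. It does not use the Flink API. The class UnrolledPlanSketch, the method buildPlan and the three operators per pass (distances, topDistance, newSeeds) are hypothetical names made up for this sketch. It only assumes that each pass of the driver loop appends new operators to one lazily built plan, so the plan that has to be processed before execution grows with every pass.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Flink code and does not reproduce Flink's optimizer.
// It counts how many operators a driver-side for loop would append to a single
// lazily built plan, assuming three hypothetical operators per pass.
final class UnrolledPlanSketch {

    /** A minimal stand-in for a lazily built dataflow plan. */
    static final class Plan {
        private final List<String> operators = new ArrayList<>();

        void add(String name) {
            operators.add(name);
        }

        int size() {
            return operators.size();
        }
    }

    /** Builds the plan the way an unrolled for loop would: nothing runs, operators only pile up. */
    static Plan buildPlan(int maxIteration) {
        Plan plan = new Plan();
        plan.add("source: points");
        plan.add("source: initial seeds");
        for (int i = 0; i < maxIteration; i++) {
            // Each pass depends on the previous one, so nothing can be dropped from the plan.
            plan.add("pass " + i + ": distances");
            plan.add("pass " + i + ": topDistance");
            plan.add("pass " + i + ": newSeeds");
        }
        plan.add("sink");
        return plan;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.println("maxIteration=" + n + " -> " + buildPlan(n).size() + " operators");
        }
    }
}
```

In this model the plan holds 3 + 3 * maxIteration operators, whereas a native iteration would describe the loop body once. How expensive each extra operator is at plan-generation time in a real Flink job is not something this sketch measures.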