Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-08 Thread m@xi
Hello everyone! I have implemented a custom parallel hashjoin algorithm (without windows feature) in order to calculate the join of two input streams on a common attribute using the CoFlatMap function and the state. After the join operator (which has parallelism p = #processors) operator I have a

Re: Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-08 Thread m@xi
Hello! I found out that the cause of the problem was the map that I have after the parallel join with parallelism 1. When I changed it to .map(new MyMapMeter).setParallelism(p) then when I increase the number of parallelism p the completion time decreases, which is reasonable. Somehow it was a bot

Re: Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-09 Thread Piotr Nowojski
Hi, Yes as you correctly analysed parallelism 1 was causing problems, because it meant that all of the records must been gathered over the network from all of the task managers. Keep in mind that even if you increase parallelism to ā€œpā€, every change in parallelism can slow down your application