Re: Weird performance on custom Hashjoin w.r.t. parallelism

Piotr Nowojski Thu, 09 Nov 2017 06:40:01 -0800

Hi,

Yes, as you correctly analysed, parallelism 1 was causing the problem, because it 
meant that all of the records had to be gathered over the network from all of 
the task managers. Keep in mind that even if you increase the parallelism to “p”, 
every change in parallelism between operators can slow down your application, 
because events have to be redistributed, which in most cases means network transfers.


For measuring throughput you could use the metrics already defined in Flink:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html

You can get the list of vertices of your job:
http://<web-ui-url>:8081/jobs/<job-id>/vertices
Then the statistics for a single vertex:
http://<web-ui-url>:8081/jobs/<job-id>/vertices/<vertex-id>/metrics

For example:
http://localhost:8081/jobs/34c6f7d00cf9b3ebfff4d94ad465eb23/vertices
http://localhost:8081/jobs/34c6f7d00cf9b3ebfff4d94ad465eb23/vertices/3d144c2a0fc19115f5f075ba85deac26/metrics
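
If you want to poll these endpoints from a script, something like the following 
Python sketch (standard library only) could be a starting point. The helper names 
are hypothetical, and the "?get=..." query parameter for selecting metrics is an 
assumption based on the metrics documentation linked above, so please check it 
against your Flink version.

```python
import json
import urllib.parse
import urllib.request


def vertices_url(base_url, job_id):
    """URL that lists the vertices of a job."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}/vertices"


def metrics_url(base_url, job_id, vertex_id, names=None):
    """URL for the metrics of one vertex.

    Assumption: specific metrics are selected with ?get=name1,name2.
    Without `names` the endpoint is queried as shown in the mail above.
    """
    url = f"{vertices_url(base_url, job_id)}/{vertex_id}/metrics"
    if names:
        # Keep commas unescaped so the list stays readable in the URL.
        url += "?" + urllib.parse.urlencode({"get": ",".join(names)}, safe=",")
    return url


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (needs a reachable web UI)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the example IDs above, metrics_url("http://localhost:8081", 
"34c6f7d00cf9b3ebfff4d94ad465eb23", "3d144c2a0fc19115f5f075ba85deac26") builds the 
second example URL, and fetch_json() on it returns whatever the web UI reports.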

You can also try to aggregate them:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#rest-api-integration
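
One way to turn such metrics into an average throughput on the client side is to 
sample a record counter twice and divide the difference by the elapsed time. The 
Python sketch below assumes the response is a list of {"id", "value"} objects 
and that per-subtask ids look like "<subtask-index>.numRecordsOut"; both are 
assumptions to verify against your own job's output, and the function names are 
hypothetical.

```python
import re


def total_count(metrics, name="numRecordsOut"):
    """Sum one counter over all parallel subtasks of a vertex.

    Assumed input shape: [{"id": "0.numRecordsOut", "value": "120"}, ...].
    Only ids of the form "<digits>.<name>" are counted, so operator-scoped
    ids such as "0.Map.numRecordsOut" are not added a second time.
    """
    pattern = re.compile(rf"\d+\.{re.escape(name)}")
    return sum(
        float(m["value"]) for m in metrics if pattern.fullmatch(m["id"])
    )


def average_throughput(samples):
    """Records per second between the first and the last sample.

    `samples` is a list of (timestamp_in_seconds, total_count) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    return (c1 - c0) / (t1 - t0)
```

Because the counter is summed over all subtasks before dividing, the measurement 
does not need an extra operator with parallelism 1 inside the job.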

Piotrek

> On 9 Nov 2017, at 07:53, m@xi <makisnt...@gmail.com> wrote:
> 
> Hello!
> 
> I found out that the cause of the problem was the map with parallelism 1 that
> I have after the parallel join.
> When I changed it to .map(new MyMapMeter).setParallelism(p), the completion
> time decreases as I increase the parallelism p, which is reasonable. Somehow
> it was a bottleneck of my parallel execution plan, but I had it this way in
> order to measure a valid average throughput.
> 
> So, my question is the following: 
> 
> How can I measure the average throughput of my parallel join operation
> properly?
> 
> Best,
> Max
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
