Hi Garrett,

In the Web UI, when viewing a job under Overview / Subtasks, selecting the 
checkbox "Aggregate task statistics by TaskManager" will reduce the number of 
displayed rows (though in your case only by half).

The following page documents profiling a Flink job with Java Flight Recorder: 

https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/application_profiling.html#profiling-with-java-flight-recorder
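
In short it comes down to setting env.java.opts in flink-conf.yaml, roughly 
along these lines (all on one line; check the page for the exact options for 
your JVM):

  env.java.opts: "-XX:+UnlockCommercialFeatures -XX:+UnlockDiagnosticVMOptions -XX:+FlightRecorder -XX:+DebugNonSafepoints -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=${FLINK_LOG_PREFIX}.jfr"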

Are your functions allocating Java collections per record? That is a common 
cause of poor performance. Also, Flink's own types (tuples and recognized 
POJOs) serialize much faster than types that fall back to Kryo / GenericType.
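
As an illustrative sketch (ParseLine and the comma-separated input are made up, 
not taken from your job): rather than building, say, a new ArrayList per record 
and letting it be serialized by Kryo as a GenericType, emit a Flink tuple or a 
simple POJO:

  import org.apache.flink.api.common.functions.MapFunction;
  import org.apache.flink.api.java.tuple.Tuple2;

  // Hypothetical parser: emits a Flink Tuple2 so the built-in
  // TupleSerializer is used and no per-record collection is allocated.
  public class ParseLine implements MapFunction<String, Tuple2<String, Long>> {
      @Override
      public Tuple2<String, Long> map(String line) throws Exception {
          int comma = line.indexOf(',');
          return new Tuple2<>(line.substring(0, comma),
                  Long.parseLong(line.substring(comma + 1)));
      }
  }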

A JobManager coordinating hundreds of TaskManagers / TaskManager slots may 
require more than 5120 MB of heap. I've experienced very poor performance when 
this memory is too low. On the other hand, a TaskManager allocation of 9200 MB 
seems much too high for 2 slots when user functions are memory bound. If your 
data exceeds memory then it will be spilled to disk no matter how high the TM 
allocation, so you are better off letting the OS manage the spilled data and 
prefetch.
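
As a rough sketch in flink-conf.yaml terms (the numbers below are placeholders 
to illustrate the shape, not recommendations for your cluster):

  jobmanager.heap.mb: 8192           # more headroom for a large deployment
  taskmanager.heap.mb: 4096          # leave the rest to the OS page cache
  taskmanager.numberOfTaskSlots: 2
  taskmanager.memory.fraction: 0.7   # share of free heap for Flink's managed memory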

The Gelly algorithms process trillions of records per hour on a system of your 
scale, so Flink is certainly capable of significantly better throughput.

Greg


> On Dec 5, 2017, at 4:03 PM, Garrett Barton <garrett.bar...@gmail.com> wrote:
> 
> I have been moving some old MR and Hive workflows into Flink because I'm 
> enjoying the APIs and the ease of development is wonderful.  Things have 
> largely worked great until I tried to really scale some of the jobs recently.
> 
> I have, for example, one ETL job that reads in about 12B records at a time and 
> does a sort, some simple transformations, validation, a re-partition and then 
> outputs to a Hive table.
> When I built it with the sample set, ~200M, it worked great, took maybe a 
> minute and blew through it.
> 
> What I have observed is that there is some kind of saturation reached depending 
> on the number of slots, number of nodes and the overall size of data to move.  
> When I run the 12B set, the first 1B go through in under 1 minute, really really 
> fast.  But it's an extremely sharp drop off after that; the next 1B might take 
> 15 minutes, and then if I wait for the next 1B, it's well over an hour.
> 
> What I can't find is any obvious indicators or things to look at; everything 
> just grinds to a halt, and I don't think the job would ever actually complete.
> 
> Is there something in the design of Flink in batch mode that is perhaps 
> memory bound?  Adding more nodes/tasks does not fix it, just gets me a little 
> further along.  I'm already running around ~1,400 slots at this point; I'd 
> postulate needing 10,000+ to potentially make the job run, but that's too much 
> of my cluster gone, and I have yet to get Flink to be stable past 1,500.
> 
> Any ideas on where to look, or what to debug?  The GUI is also very cumbersome 
> to use at this slot count, so other measurement ideas are welcome too!
> 
> Thank you all.
