The code or the execution plan (ExecutionEnvironment.getExecutionPlan()) of
the job would be interesting.
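
For reference, the plan can be printed with a few lines in the DataSet API
(a minimal sketch; the job's sources, transformations and sinks have to be
defined before calling it):

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class DumpPlan {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // ... define the batch job here (sources, transformations, sinks) ...

            // Prints the optimizer's execution plan as a JSON string.
            // Call this instead of env.execute() to inspect the plan
            // without actually running the job.
            System.out.println(env.getExecutionPlan());
        }
    }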

2018-08-08 10:26 GMT+02:00 Chesnay Schepler <ches...@apache.org>:

> What have you tried so far to increase performance? (Did you try different
> combinations of -yn and -ys?)
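>
> For example (purely hypothetical numbers, just to illustrate the kind of
> variation meant), one large TaskManager per node instead of many small
> containers:
>
>     flink run -m yarn-cluster -yn 100 -ys 16 -ytm 50000 -yjm 8192 ...
>
> can behave very differently for a batch job of this size.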
>
> Can you provide us with your application? What source/sink are you using?
>
>
> On 08.08.2018 07:59, Ravi Bhushan Ratnakar wrote:
>
> Hi Everybody,
>
> Currently I am working on a project where I need to write a Flink Batch
> application which has to process around 400GB of compressed sequence files
> per hour. After processing, it has to write the result as compressed
> Parquet files to S3.
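>
> To make it concrete, here is a simplified sketch of the job shape (class
> names, S3 paths and the toy schema are placeholders; the real job is more
> involved). It reads the SequenceFiles through flink-hadoop-compatibility
> and writes Snappy-compressed Parquet through parquet-avro (1.8+) wrapped in
> Flink's HadoopOutputFormat:
>
>     import org.apache.avro.Schema;
>     import org.apache.avro.SchemaBuilder;
>     import org.apache.avro.generic.GenericData;
>     import org.apache.avro.generic.GenericRecord;
>     import org.apache.flink.api.common.functions.RichMapFunction;
>     import org.apache.flink.api.java.DataSet;
>     import org.apache.flink.api.java.ExecutionEnvironment;
>     import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
>     import org.apache.flink.api.java.tuple.Tuple2;
>     import org.apache.flink.configuration.Configuration;
>     import org.apache.flink.hadoopcompatibility.HadoopInputs;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.io.BytesWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>     import org.apache.parquet.avro.AvroParquetOutputFormat;
>     import org.apache.parquet.hadoop.ParquetOutputFormat;
>     import org.apache.parquet.hadoop.metadata.CompressionCodecName;
>
>     public class HourlyBatchJob {
>
>         public static void main(String[] args) throws Exception {
>             ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
>             // Read the compressed SequenceFiles (key/value types are placeholders).
>             DataSet<Tuple2<BytesWritable, Text>> input = env.createInput(
>                     HadoopInputs.readSequenceFile(BytesWritable.class, Text.class,
>                             "s3://my-bucket/input/some-hour/"));
>
>             // Turn every record into an Avro GenericRecord (toy schema).
>             DataSet<Tuple2<Void, GenericRecord>> records = input.map(new ToRecord());
>
>             // Write Snappy-compressed Parquet via parquet-avro's OutputFormat.
>             Job job = Job.getInstance();
>             HadoopOutputFormat<Void, GenericRecord> parquetOut =
>                     new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);
>             AvroParquetOutputFormat.setSchema(job, eventSchema());
>             ParquetOutputFormat.setCompression(job, CompressionCodecName.SNAPPY);
>             FileOutputFormat.setOutputPath(job, new Path("s3://my-bucket/output/some-hour/"));
>
>             records.output(parquetOut);
>             env.execute("hourly sequencefile to parquet");
>         }
>
>         private static Schema eventSchema() {
>             return SchemaBuilder.record("Event").fields()
>                     .requiredString("raw").endRecord();
>         }
>
>         // Builds the schema in open() so no Schema instance has to be
>         // captured in the serialized closure of the function.
>         private static class ToRecord
>                 extends RichMapFunction<Tuple2<BytesWritable, Text>, Tuple2<Void, GenericRecord>> {
>
>             private transient Schema schema;
>
>             @Override
>             public void open(Configuration parameters) {
>                 schema = eventSchema();
>             }
>
>             @Override
>             public Tuple2<Void, GenericRecord> map(Tuple2<BytesWritable, Text> in) {
>                 GenericRecord r = new GenericData.Record(schema);
>                 r.put("raw", in.f1.toString());
>                 return new Tuple2<>(null, r);
>             }
>         }
>     }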
>
> I have managed to write the application in Flink, and it successfully
> processes the whole hour of data and writes it in Parquet format to S3.
> The problem is that it is not able to match the performance of the existing
> application, which is written as a Spark batch job (running in production).
>
> Current Spark Batch
> Cluster size - AWS EMR - 1 master + 100 worker nodes of m4.4xlarge
> (16 vCPU, 64GB RAM), each instance with a 160GB disk volume
> Input data - Around 400GB
> Time taken to process - Around 36 mins
>
> ------------------------------------------------------------
>
> Flink Batch
> Cluster size - AWS EMR - 1 master + 100 worker nodes of r4.4xlarge
> (16 vCPU, 122GB RAM), each instance with a 630GB disk volume
> Transient Job - flink run -m yarn-cluster -yn 792 -ys 2 -ytm 14000 -yjm
> 114736
> Input data - Around 400GB
> Time taken to process - Around 1 hour
>
>
> I have given all of a node's memory to the jobmanager, just to make sure
> that the jobmanager has a dedicated node and does not run into any
> resource-related issues.
>
>
> We are already running the Flink batch job with double the RAM compared to
> the Spark batch job, but we are still not able to get the same performance.
>
> Kindly suggest how we can achieve the same performance as we are getting
> from Spark Batch.
>
>
> Thanks,
> Ravi
>
>
>
