I've found this post on StackOverflow, where someone complains about a
performance drop with keyBy:
https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance
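
For reference, here is a minimal Java sketch of the keyBy/window shape that
post and the mail below are talking about. The types, the key selector and
the reduce are made up; the reduce only stands in for whatever window
function the real job applies before the sink:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyByWindowSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder elements standing in for the Kafka source.
        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L))
                // keyBy breaks operator chaining and hash-partitions the stream:
                // every record is serialized and shipped to the slot that owns its
                // key, so some throughput drop at this point is expected.
                .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    public String getKey(Tuple2<String, Long> value) {
                        return value.f0;
                    }
                })
                // Tumbling processing-time window with a reduce as the window function.
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .print(); // stand-in for addSink(...)

        env.execute("keyBy window sketch");
    }
}

The relevant point is that keyBy is the first operator that forces a
serialize-and-shuffle over the network, so some slowdown compared to a fully
chained flatmap/map pipeline is normal; the question is whether what you see
is far worse than that.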

On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan <
dharani.sudhar...@outlook.in> wrote:

> Hi All,
>
> Currently, I’m running a Flink streaming application with the configuration
> below.
>
> Task slots: 45
> Task Managers: 3
> Job Manager: 1
> CPUs: 20 per machine
>
> My sample code is below:
>
> Process Stream: datastream.flatmap().map().process().addsink
>
> Data size: approx. 330 GB
>
> Raw Stream: datastream.keyby.window.addsink
>
> When I run the raw stream, the Kafka source reads data at GB rates and gets
> through the full 330 GB in about 15 minutes.
>
> But when I run the process stream, backpressure appears, the source drops
> to MB rates, and performance takes a huge hit.
>
> I’m using the file state backend with checkpointing enabled.
>
> I tried debugging the issue and made some changes to the code, shown below:
>
>
> Datastream.keyby.timewindow.reduce.flatmap.keyby.timewindow.reduce.map.keyby.process.addsink
>
> This time the performance improved slightly, but it is still not good, and
> I noticed memory leaks which cause the TaskManagers to go down and the job
> to get terminated.
>
>
> Any help would be much appreciated.
>
> Thanks,
> Dharani.
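
To make the reworked chain a bit more concrete, here is a minimal sketch of
its shape (keyBy -> window -> reduce -> keyBy -> process -> sink). All types,
the key selector, the reduce and the process function are placeholders, and
the flatmap/map stages plus the second window are left out:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class ReKeyedChainSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source standing in for the Kafka source.
        DataStream<Tuple2<String, Long>> source =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));

        // One key selector reused for both shuffles.
        KeySelector<Tuple2<String, Long>, String> byKey =
                new KeySelector<Tuple2<String, Long>, String>() {
                    public String getKey(Tuple2<String, Long> value) {
                        return value.f0;
                    }
                };

        source
                // First shuffle: hash-partition by key.
                .keyBy(byKey)
                // Pre-aggregate per key and window; the reduce keeps a single element
                // per key/window in state instead of buffering every record.
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                // (The flatmap/map stages and the second window of the original chain
                // are omitted here.)
                // Second shuffle before the keyed process function.
                .keyBy(byKey)
                .process(new KeyedProcessFunction<String, Tuple2<String, Long>, String>() {
                    @Override
                    public void processElement(Tuple2<String, Long> value, Context ctx,
                                               Collector<String> out) {
                        out.collect(value.f0 + "=" + value.f1);
                    }
                })
                .print(); // stand-in for addSink(...)

        env.execute("re-keyed chain sketch");
    }
}

One thing to keep in mind with the file state backend is that the working
state of the windows and the process function lives on the TaskManager heap
(only checkpoints go to the filesystem), so every additional keyed window
stage adds heap pressure, which might be related to the TaskManagers going
down.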
