Re: Setting source vs sink vs window parallelism with data increase

2019-03-27 Thread Piotr Nowojski
No problem and it’s good to hear that you managed to solve the problem.

Piotrek

> On 23 Mar 2019, at 12:49, Padarn Wilson wrote:
>
> Well.. it turned out I was registering millions of timers by accident, which
> was why garbage collection was blowing up. Oops. Thanks for your help again.

Re: Setting source vs sink vs window parallelism with data increase

2019-03-23 Thread Padarn Wilson
Well.. it turned out I was registering millions of timers by accident, which was why garbage collection was blowing up. Oops. Thanks for your help again.

On Wed, Mar 6, 2019 at 9:44 PM Padarn Wilson wrote:
> Thanks a lot for your suggestion. I’ll dig into it and update for the mailing list if
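The message above does not say how the timers ended up being registered. One common way to get millions of them is to register a timer per element with a distinct timestamp, and Flink's documentation suggests coalescing timers by rounding the target time down to a coarser granularity, since only one timer is kept per key and timestamp. The plain-Python sketch below only simulates that bookkeeping with a set; `registered_timers`, `timeout_ms`, and `granularity_ms` are hypothetical names for illustration and are not Flink API.

```python
from typing import Iterable, Set, Tuple


def registered_timers(
    events: Iterable[Tuple[str, int]],
    timeout_ms: int,
    granularity_ms: int = 1,
) -> Set[Tuple[str, int]]:
    """Simulate which (key, timestamp) timers would exist after all events.

    Assumption: a timer is identified by (key, timestamp), so registering the
    same pair twice does not create a second timer. A set models that.
    """
    timers: Set[Tuple[str, int]] = set()
    for key, event_time_ms in events:
        target = event_time_ms + timeout_ms
        # Round down to the chosen granularity so nearby events share a timer.
        target = (target // granularity_ms) * granularity_ms
        timers.add((key, target))
    return timers


# 10,000 events for a single key, one millisecond apart.
events = [("k1", t) for t in range(10_000)]

per_event = registered_timers(events, timeout_ms=60_000)
coalesced = registered_timers(events, timeout_ms=60_000, granularity_ms=1_000)

print(len(per_event), len(coalesced))
```

With one-second granularity the same events share a handful of timers instead of one each, at the cost of timers firing up to a second earlier than the exact timeout.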

Re: Setting source vs sink vs window parallelism with data increase

2019-03-06 Thread Padarn Wilson
Thanks a lot for your suggestion. I’ll dig into it and update for the mailing list if I find anything useful.

Padarn

On Wed, 6 Mar 2019 at 6:03 PM, Piotr Nowojski wrote:
> Re-adding user mailing list.
>
> Hi,
>
> If it is a GC issue, only GC logs or some JVM memory profilers (like Oracle’s

Re: Setting source vs sink vs window parallelism with data increase

2019-03-06 Thread Piotr Nowojski
Re-adding user mailing list.

Hi,

If it is a GC issue, only GC logs or some JVM memory profilers (like Oracle’s Mission Control) can lead you to the solution. Once you confirm that it’s a GC issue, there are numerous resources online on how to analyse the cause of the problem. For that, it is

Re: Setting source vs sink vs window parallelism with data increase

2019-03-04 Thread Piotr Nowojski
Hi,

What Flink version are you using? Generally speaking, Flink might not be the best if your records fan out, as this may significantly increase checkpointing time. However, you might want to first identify what’s causing the long GC times. If there are long GC pauses, this should be the first thing

Re: Setting source vs sink vs window parallelism with data increase

2019-03-01 Thread Padarn Wilson
Hi all again - following up on this, I think I've identified my problem as being something else, but would appreciate it if anyone can offer advice. After running my stream for some time, I see that my garbage collector for old generation starts to take a very long time: [image: Screen Shot

Setting source vs sink vs window parallelism with data increase

2019-02-28 Thread Padarn Wilson
Hi all,

I'm trying to process many records, and I have an expensive operation I'm trying to optimize. Simplified, it is something like:

Data: (key1, count, time)

Source -> Map(x -> (x, newKeyList(x.key1))) -> Flatmap(x -> x._2.map(y => (y, x._1.count, x._1.time))) ->
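The Map and Flatmap steps above can be sketched in plain Python to show the fan-out: each input record becomes one output record per derived key. This is only an illustration, not Flink code. The original does not show what `newKeyList` returns, so `new_key_list` below is a hypothetical stand-in that derives three keys per input key, and the `Record` type is an assumed shape for `(key1, count, time)`.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Record(NamedTuple):
    key1: str
    count: int
    time: int


def new_key_list(key1: str) -> List[str]:
    # Hypothetical stand-in for newKeyList: three derived keys per input key.
    return [f"{key1}-a", f"{key1}-b", f"{key1}-c"]


def fan_out(records: Iterable[Record]) -> Iterator[Tuple[str, int, int]]:
    """Map each record to its derived keys, then flatten to one tuple per key."""
    for x in records:
        for y in new_key_list(x.key1):
            yield (y, x.count, x.time)


records = [Record("k1", 2, 100), Record("k2", 5, 101)]
out = list(fan_out(records))

print(len(records), "records in,", len(out), "records out")
```

Everything downstream of the Flatmap sees the multiplied record count, which is why the fan-out factor matters when sizing the parallelism of later operators.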