Re: Backpressure tuning/failure

2019-10-10 Thread Owen Rees-Hayward
Hey Piotr, I think we are broadly in agreement, hopefully. So out of the three scenarios you describe, the code is simulating scenario 2). The only additional comment I would make is that the extra load on a node could come from an independent service or job. I am guessing we can agree,

Re: Backpressure tuning/failure

2019-10-10 Thread Piotr Nowojski
Hi Owen, Thanks for the quick response. No, I hadn't seen the previous blog post; yes, it clears things up a bit. > To clarify, the code is attempting to simulate a straggler node due to high > load, which therefore processes data at a slower rate - not a failing node. > Some degree of

Re: Backpressure tuning/failure

2019-10-10 Thread Owen Rees-Hayward
Hi Piotr, Thanks for getting back to me and for the info. I tried to describe the motivation behind the scenarios in the original post in the series - see the 'Backpressure - why you might care' section on http://owenrh.me.uk/blog/2019/09/30/. Maybe it could have been clearer. As you note, this

Re: Backpressure tuning/failure

2019-10-10 Thread Piotr Nowojski
Hi, I’m not entirely sure what you are testing. I have looked at your code (only the constant straggler scenario) and, please correct me if I'm wrong, in your job you are basically measuring the throughput of `Thread.sleep(straggler.waitMillis)`. In the first RichMap task (`subTaskId == 0`), per
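
A minimal sketch of the pattern being described here, assuming the straggler is simulated by a per-record `Thread.sleep` in the first parallel subtask of a RichMap; the class name, generic type and the `waitMillis` parameter are assumptions, not taken from the actual benchmark code:

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical reconstruction: the first parallel subtask (subTaskId == 0)
// sleeps for waitMillis per record, making it a constant straggler while the
// other subtasks run at full speed. Once its input buffers fill up,
// backpressure propagates upstream and caps the throughput of the whole job.
class ConstantStragglerMap[T](waitMillis: Long) extends RichMapFunction[T, T] {

  private var subTaskId: Int = _

  override def open(parameters: Configuration): Unit = {
    subTaskId = getRuntimeContext.getIndexOfThisSubtask
  }

  override def map(value: T): T = {
    if (subTaskId == 0) {
      Thread.sleep(waitMillis) // simulate a subtask slowed by external load
    }
    value
  }
}
```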

Backpressure tuning/failure

2019-10-08 Thread Owen Rees-Hayward
Hi, I am having a few issues with the Flink (v1.8.1) backpressure default settings, which lead to poor throughput in a comparison I am doing between Storm, Spark and Flink. I have a setup that simulates a progressively worse straggling task, which Storm and Spark cope with relatively well.
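
The post does not say which defaults are in play, but in a straggler scenario the network buffer counts are a natural candidate, since they bound how much data can queue in front of the slow subtask before upstream tasks block. A minimal sketch, assuming a local benchmark environment; the config keys are real Flink 1.8 options, but the values and parallelism are purely illustrative, not the settings used in the comparison:

```scala
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

object BackpressureTuningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Flink 1.8 defaults: 2 exclusive buffers per channel and 8 floating
    // buffers per input gate. Raising them gives a straggling subtask more
    // slack before backpressure throttles the upstream producers.
    conf.setInteger("taskmanager.network.memory.buffers-per-channel", 4)
    conf.setInteger("taskmanager.network.memory.floating-buffers-per-gate", 16)

    val env = StreamExecutionEnvironment.createLocalEnvironment(4, conf)
    // ... build and execute the straggler benchmark job against `env` ...
  }
}
```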