Hi Sergey,

As Andrey noted, it’s a known issue with (currently) no good solution.

I talk a bit about how we worked around it on slide 26 of my Flink Forward talk 
on a Flink-based web crawler:
<https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-ken-krugler-building-a-scalable-focused-web-crawler-with-flink>

Basically we do some cheesy approximate monitoring of in-flight data, and 
throttle the key producer so that (hopefully) network buffers don’t fill up to 
the point of deadlock.
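
A minimal sketch of that kind of throttle, in case it helps. This is not the 
crawler's actual code and not a Flink API: the class name InFlightThrottle, its 
methods, and the limit are all hypothetical. It also assumes the producer and 
the point where records leave the loop can share one counter (same JVM), which 
is part of why the monitoring is only approximate in a real distributed job.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Flink): keeps an approximate count of
// records that have entered the iteration loop and not yet left it.
final class InFlightThrottle {
    private final long maxInFlight;                  // assumed, tuned by hand
    private final AtomicLong inFlight = new AtomicLong();

    InFlightThrottle(long maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    // The producer calls this before emitting a record into the loop.
    // Returns false when the loop is considered full; the caller should
    // hold the record back instead of emitting it.
    boolean tryAcquire() {
        while (true) {
            long current = inFlight.get();
            if (current >= maxInFlight) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Called when a record leaves the loop (or is dropped).
    // Never goes below zero, since the count is only approximate.
    void release() {
        inFlight.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    long inFlight() {
        return inFlight.get();
    }
}
```

The producer checks tryAcquire() before each emit and backs off when it 
returns false (for example, waits briefly and retries), so pressure builds up 
at the source rather than in the network buffers inside the loop.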

— Ken


> On Dec 24, 2018, at 8:46 AM, Andrey Zagrebin <and...@da-platform.com> wrote:
> 
> Hi Sergey,
> 
> It seems to be a known issue. The community will hopefully work on this, but 
> I have not seen any updates since the last answer to a similar question [1]; 
> see also [2] and [3].
> 
> Best,
> Andrey
> 
> [1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
> On Mon, Dec 24, 2018 at 7:16 PM Sergei Poganshev <s.pogans...@slice.com> wrote:
> We've tried using the iterations feature, and under significant load the job 
> sometimes stalls and stops processing events due to high back pressure, both 
> in the task that produces records for the iteration and in all the other 
> inputs to this task. It looks like the task can't handle all the incoming 
> records, the iteration sink loops back into this task, and it also gets back 
> pressured. This is basically a "back pressure loop" which causes a complete 
> job stoppage.
> 
> Is there a way to mitigate this (to guarantee that such an issue does not occur)?

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
