Thank you.

On Tue, Jan 2, 2018 at 1:31 PM, Nico Kruber <n...@data-artisans.com> wrote:

> Hi Vishal,
> let me already point you towards the JIRA issue for the credit-based
> flow control: https://issues.apache.org/jira/browse/FLINK-7282
>
> I'll have a look at the rest of this email thread tomorrow...
>
> Regards,
> Nico
>
> On 02/01/18 17:52, Vishal Santoshi wrote:
> > Could you please point me to any documentation on the "credit-based
> > flow control" approach....
> >
> > On Tue, Jan 2, 2018 at 10:35 AM, Timo Walther <twal...@apache.org> wrote:
> >
> > > Hi Vishal,
> > >
> > > your assumptions sound reasonable to me. The community is currently
> > > working on more fine-grained back pressure with credit-based flow
> > > control. It is on the roadmap for 1.5 [1], [2]. I will loop in Nico,
> > > who may be able to tell you more about the details. Until then, I
> > > guess you have to implement a custom source (or adapt an existing
> > > one) to let the data flow in more realistically.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1] http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html
> > > [2] https://www.youtube.com/watch?v=scStdhz9FHc
> > >
> > > Am 1/2/18 um 4:02 PM schrieb Vishal Santoshi:
> > > > I ran a simulation with session windows in two modes and let it
> > > > rip for about 12 hours:
> > > >
> > > > 1. Replay: a Kafka topic with a retention of 7 days as the
> > > > source (earliest).
> > > > 2. Starting the pipeline from the Kafka source (latest).
> > > >
> > > > The results differed dramatically. On replay, the pipeline
> > > > stalled after a good ramp-up, while in the second case the
> > > > pipeline hummed along without issues. Over the same time period,
> > > > significantly more data was consumed in the second case, while in
> > > > the first case the watermark progression stalled with no hint of
> > > > resolution (the incoming data on the source topic far outstrips
> > > > the WM progression). I think I know the reason; this is my
> > > > hypothesis.
> > > >
> > > > In replay mode, the number of open windows has no upper bound.
> > > > While buffer exhaustion (with data in flight ahead of the
> > > > watermark) is what eventually throttles consumption, it does not
> > > > really limit the number of open windows, and in fact creates
> > > > windows for "futuristic" data (future relative to the current
> > > > WM). So if partition x has data at watermark time t(x) and
> > > > partition y at watermark time t(y), with t(x) << t(y), the
> > > > overall watermark is t(x), yet nothing significantly throttles
> > > > consumption from partition y (or from x, for that matter). AFAIK
> > > > the bounded-buffer approach does not give the fine-grained
> > > > control one would hope for, which means far more windows stay
> > > > open than the system can handle. That leads to the pathological
> > > > case where the buffers fill up (I believe this happens way too
> > > > late) and throttling kicks in, but the WM does not advance, and
> > > > the windows whose firing could ease the glut cannot proceed. In
> > > > replay mode, the sheer amount of data means the fetchers keep
> > > > pulling at the maximum rate the open-ended buffer approach
> > > > allows.
> > > >
> > > > My question thus is: is there any way to get finer control over
> > > > back pressure, where consumption from a source is throttled
> > > > pre-emptively (for example by decreasing the number or size of
> > > > the buffers associated with a pipe), or via sleeps in the Fetcher
> > > > code, so that replay aligns with real-time consumption
> > > > characteristics?
> > > >
> > > > Regards,
> > > >
> > > > Vishal.
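Vishal's claim that the overall watermark is t(x) follows from the min-propagation rule: a downstream operator's event-time clock is the minimum of the watermarks of all its inputs, so one lagging partition pins the whole pipeline's watermark no matter how far ahead the other partitions run. A toy model of that rule (plain Java, not Flink internals; the class and variable names are illustrative):

```java
// Toy model of watermark propagation across two Kafka partitions:
// the operator-level watermark is the minimum over per-partition
// watermarks, so a lagging partition x pins the overall watermark
// while windows fed by the fast partition y pile up unfired.
public class WatermarkMinModel {

    // Overall watermark = min over all input partition watermarks.
    static long overallWatermark(long... partitionWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : partitionWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    public static void main(String[] args) {
        long tX = 1_000L;      // partition x: far behind in event time
        long tY = 9_000_000L;  // partition y: far ahead, t(x) << t(y)

        // Windows holding y's data between t(x) and t(y) stay open:
        // they cannot fire until the overall watermark passes their end.
        System.out.println("overall watermark = " + overallWatermark(tX, tY));
    }
}
```

This is why consuming partition y at full speed during replay only widens the gap: every record from y opens or extends windows that cannot fire until x catches up.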
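The "sleeps in the Fetcher code" idea in the question can be sketched as a pre-emptive rate limiter wrapped around each fetch, so consumption is capped before the network buffers ever fill. This is a hypothetical standalone sketch, not a Flink API; the class name and the rate are assumptions, and in a real job one would wrap the source's poll/emit loop instead of a bare counter:

```java
// Hypothetical pre-emptive throttle for a source thread: cap the
// records/second consumed during replay instead of relying on
// buffer exhaustion to apply back pressure far too late.
public class ThrottledFetch {

    static final class RateLimiter {
        private final long nanosPerPermit;
        private long nextFreeNanos = System.nanoTime();

        RateLimiter(double permitsPerSecond) {
            this.nanosPerPermit = (long) (1e9 / permitsPerSecond);
        }

        // Block until the next permit is available, keeping the
        // long-run rate at or below permitsPerSecond.
        synchronized void acquire() throws InterruptedException {
            long now = System.nanoTime();
            long waitNanos = nextFreeNanos - now;
            nextFreeNanos = Math.max(now, nextFreeNanos) + nanosPerPermit;
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000L,
                             (int) (waitNanos % 1_000_000L));
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RateLimiter limiter = new RateLimiter(100.0); // ~100 records/sec
        long start = System.nanoTime();
        for (int i = 0; i < 11; i++) {
            limiter.acquire(); // would wrap each fetch/emit in the source
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        // 11 permits at 100/sec need at least ~100 ms in total.
        System.out.println(elapsedMs >= 90 ? "throttled" : "too fast");
    }
}
```

Throttling the source this way trades replay throughput for a bounded number of in-flight event-time windows; the credit-based flow control Nico points to (FLINK-7282) attacks the same problem at the network layer rather than in user code.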