Thank you.

On Tue, Jan 2, 2018 at 1:31 PM, Nico Kruber <n...@data-artisans.com> wrote:

> Hi Vishal,
> let me already point you towards the JIRA issue for the credit-based
> flow control: https://issues.apache.org/jira/browse/FLINK-7282
>
> I'll have a look at the rest of this email thread tomorrow...
>
>
> Regards,
> Nico
>
> On 02/01/18 17:52, Vishal Santoshi wrote:
> > Could you please point me to any documentation on the "credit-based
> > flow control" approach?
> >
> > On Tue, Jan 2, 2018 at 10:35 AM, Timo Walther <twal...@apache.org
> > <mailto:twal...@apache.org>> wrote:
> >
> >     Hi Vishal,
> >
> >     your assumptions sound reasonable to me. The community is currently
> >     working on more fine-grained back pressure with credit-based
> >     flow control. It is on the roadmap for 1.5 [1]/[2]. I will loop in
> >     Nico, who can tell you more about the details. Until then, I guess
> >     you have to implement a custom source (or adapt an existing one) to
> >     let the data flow in more realistically.
> >
> >     Regards,
> >     Timo
> >
> >     [1]
> >     http://flink.apache.org/news/2017/11/22/release-1.4-and-1.
> 5-timeline.html
> >     <http://flink.apache.org/news/2017/11/22/release-1.4-and-1.
> 5-timeline.html>
> >     [2] https://www.youtube.com/watch?v=scStdhz9FHc
> >     <https://www.youtube.com/watch?v=scStdhz9FHc>
> >
> >
> >     Am 1/2/18 um 4:02 PM schrieb Vishal Santoshi:
> >
> >         I did a simulation on session windows (in 2 modes) and let it
> >         rip for about 12 hours:
> >
> >         1. Replay, where a Kafka topic with a retention of 7 days was
> >         the source (earliest)
> >         2. Start the pipe with the Kafka source (latest)
> >
> >         I saw results that differed dramatically.
> >
> >         On replay the pipeline stalled after a good ramp-up, while in
> >         the second case the pipeline hummed on without issues. For the
> >         same time period, significantly more data was consumed in the
> >         second case; in the first case the WM progression stalled with
> >         no hint of resolution (the incoming data on the source topic
> >         far outstrips the WM progression). I think I know the reasons,
> >         and this is my hypothesis.
> >
> >         In replay mode, the number of open windows has no upper bound.
> >         Buffer exhaustion (together with the data in flight relative
> >         to the watermark) is what eventually throttles consumption,
> >         but it does not really limit the number of open windows; in
> >         fact it creates windows for futuristic data (future relative
> >         to the current WM). So if partition x has data at watermark
> >         time t(x) and partition y at watermark time t(y), with
> >         t(x) << t(y), the overall watermark is t(x), yet nothing
> >         significantly throttles consumption from partition y (or from
> >         x, for that matter). The bounded-buffer approach does not give
> >         the fine-grained control one would hope for, AFAIK, which
> >         means there are far more open windows than the system can
> >         handle. That leads to the pathological case where the buffers
> >         fill up (I believe that happens way too late) and throttling
> >         kicks in, but the WM does not advance, so the windows that
> >         could ease the glut cannot fire. In replay mode, the amount of
> >         data means the Fetchers keep pulling at the maximum rate the
> >         open-ended buffer approach allows.
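[The min-over-partitions watermark behavior described above can be sketched outside Flink. This is an illustrative model, not Flink API; the event times and session gap are hypothetical numbers chosen to show the skew.]

```python
# Illustrative sketch, not Flink API: the operator watermark advances to
# the minimum across its inputs, so one lagging partition pins it.

def overall_watermark(partition_watermarks):
    # Flink takes the minimum watermark across all input partitions.
    return min(partition_watermarks.values())

# Hypothetical event times reached by each partition during replay.
t_x = 1_000    # partition x is far behind
t_y = 50_000   # partition y has been consumed at full speed

wm = overall_watermark({"x": t_x, "y": t_y})

# Sessions whose data lies between wm and t_y stay open: they cannot
# fire until partition x (and hence the watermark) catches up.
session_gap = 100  # hypothetical session gap, in event-time units
open_windows_bound = (t_y - wm) // session_gap
```

With these numbers the watermark stays pinned at t(x) = 1,000 while up to 490 sessions accumulate in state, which is the glut described above.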
> >
> >         My question thus is: is there any way to get finer control of
> >         back pressure, in which consumption from a source is throttled
> >         preemptively (for example, by decreasing the number or size of
> >         the buffers associated with a pipe), or by sleeps in the
> >         Fetcher code, so that replay performance matches real-time
> >         consumption characteristics?
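[A source-side throttle along the lines asked about here could be sketched as follows. This is not Flink's actual Kafka fetcher; `throttled_fetch`, `max_ahead`, the `(event_time, payload)` record format, and `watermark_fn` are all hypothetical names for the sketch.]

```python
import time

def throttled_fetch(records, watermark_fn, max_ahead=5_000, backoff_s=0.05):
    """Hypothetical fetcher loop: yield (event_time, payload) records, but
    sleep whenever this partition runs more than max_ahead event-time units
    past the overall pipeline watermark. Fast partitions then self-throttle
    before the network buffers ever fill up."""
    for event_time, payload in records:
        while event_time - watermark_fn() > max_ahead:
            time.sleep(backoff_s)  # preemptive back pressure at the source
        yield event_time, payload
```

The point of the sketch is the distinction drawn above: the consumer backs off based on event-time skew instead of waiting for buffer exhaustion. Credit-based flow control (FLINK-7282) tackles the related problem at the network layer rather than in the source.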
> >
> >         Regards,
> >
> >         Vishal.
> >
> >
>
>
