Could you please point me to any documentation on the "credit-based flow
control" approach?

On Tue, Jan 2, 2018 at 10:35 AM, Timo Walther <twal...@apache.org> wrote:

> Hi Vishal,
>
> your assumptions sound reasonable to me. The community is currently
> working on more fine-grained back pressure with credit-based flow
> control. It is on the roadmap for 1.5 [1]/[2]. I will loop in Nico, who
> might be able to tell you more about the details. Until then, I guess you
> have to implement a custom source or adapt an existing one to let the data
> flow in more realistically.
>
> Regards,
> Timo
>
> [1] http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html
> [2] https://www.youtube.com/watch?v=scStdhz9FHc
>
>
> On 1/2/18 at 4:02 PM, Vishal Santoshi wrote:
>
>> I did a simulation on session windows (in 2 modes) and let it rip for
>> about 12 hours:
>>
>> 1. Replay, where a Kafka topic with a retention of 7 days was the source
>> (earliest)
>> 2. Start the pipe with the Kafka source (latest)
>>
>> I saw results that differed dramatically.
>>
>> On replay the pipeline stalled after a good ramp-up, while in the second
>> case the pipeline hummed on without issues. For the same time period,
>> significantly more data is consumed in the second case, while WM
>> progression stalled in the first case with no hint of resolution (the
>> incoming data on the source topic far outstrips the WM progression). I
>> think I know the reasons, and this is my hypothesis.
>>
>> In replay mode the number of open windows does not have an upper bound.
>> While buffer exhaustion (and data in flight with the watermark) is the
>> reason for throttling, it does not really limit the open windows, and in
>> fact creates windows that reflect futuristic data (future relative to the
>> current WM). So if partition x has data for watermark time t(x) and
>> partition y for watermark time t(y), with t(x) << t(y), the overall
>> watermark is t(x), and nothing significantly throttles consumption from
>> the y partition (or, in fact, from x either). The bounded-buffer approach
>> does not give the minute control one would hope for, AFAIK. That implies
>> there are far more open windows than the system can handle, which leads
>> to the pathological case where the buffers fill up (I believe that
>> happens way late) and throttling occurs, but the WM does not proceed, so
>> the windows that could ease the glut cannot proceed either. In replay
>> mode the amount of data implies that the Fetchers keep pulling data at
>> the maximum consumption allowed by the open-ended buffer approach.
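A minimal sketch of the hypothesis above, in plain Java with no Flink
dependency. It assumes, purely for illustration, fixed-size windows of
windowSizeMs instead of session windows, and it treats each partition's
current event-time position as its watermark. The class and method names are
hypothetical and are not part of Flink.

```java
import java.util.Arrays;

class WatermarkSkewSketch {

    /** Overall watermark: the minimum of the per-partition watermarks. */
    static long overallWatermark(long[] partitionWatermarks) {
        return Arrays.stream(partitionWatermarks).min().orElse(Long.MIN_VALUE);
    }

    /**
     * Rough count of windows that are open but cannot fire yet.
     * Assumption: one window per full windowSizeMs slice of event time that
     * a partition has read ahead of the overall watermark.
     */
    static long pendingWindows(long[] partitionWatermarks, long windowSizeMs) {
        long wm = overallWatermark(partitionWatermarks);
        long pending = 0;
        for (long t : partitionWatermarks) {
            pending += (t - wm) / windowSizeMs;
        }
        return pending;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        // "latest" mode: both partitions are close together in event time.
        long[] live = {100 * hour, 100 * hour + 60_000L};
        // "earliest" (replay) mode: partition y is 90 hours ahead of x.
        long[] replay = {10 * hour, 100 * hour};

        System.out.println("live pending windows:   " + pendingWindows(live, hour));
        System.out.println("replay pending windows: " + pendingWindows(replay, hour));
    }
}
```

With one-hour windows, the live case leaves 0 windows pending and the replay
case leaves 90, because the overall watermark is pinned to the slowest
partition while the fast one keeps opening windows.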
>>
>> My question thus is: is there any way to have finer control of back
>> pressure, wherein consumption from a source is throttled preemptively
>> (for example, by decreasing the number of buffers associated with a pipe
>> or the size allocated), or sleeps in the Fetcher code that can help align
>> the performance with real-time consumption characteristics?
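One way the preemptive throttling asked about here could look, as a rough
sketch only: a gate that a custom source could consult before reading more
from a partition, pausing any partition whose event time runs more than
maxLeadMs ahead of the slowest one. EventTimeGate, observe, shouldPause and
maxLeadMs are hypothetical names, not an existing Flink or Kafka API, and
wiring this into a Fetcher is left open.

```java
import java.util.HashMap;
import java.util.Map;

class EventTimeGate {
    private final long maxLeadMs;
    // Latest event time seen per partition. Partitions that have not been
    // observed yet are ignored when computing the slowest one.
    private final Map<Integer, Long> latestEventTime = new HashMap<>();

    EventTimeGate(long maxLeadMs) {
        this.maxLeadMs = maxLeadMs;
    }

    /** Record the event time of the last record read from a partition. */
    void observe(int partition, long eventTimeMs) {
        latestEventTime.merge(partition, eventTimeMs, Math::max);
    }

    /** Event time of the slowest partition observed so far. */
    long slowest() {
        return latestEventTime.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    /** True if this partition is too far ahead and should stop reading for now. */
    boolean shouldPause(int partition) {
        Long t = latestEventTime.get(partition);
        if (t == null) {
            return false; // nothing read yet, so nothing to hold back
        }
        return t - slowest() > maxLeadMs;
    }

    public static void main(String[] args) {
        EventTimeGate gate = new EventTimeGate(60_000L); // allow a 1 minute lead
        gate.observe(0, 1_000L);
        gate.observe(1, 500_000L);
        System.out.println("pause partition 0? " + gate.shouldPause(0));
        System.out.println("pause partition 1? " + gate.shouldPause(1));

        gate.observe(0, 450_000L); // the slow partition catches up
        System.out.println("pause partition 1? " + gate.shouldPause(1));
    }
}
```

The reading loop would sleep or skip a partition while shouldPause returns
true, which bounds how far ahead of the overall watermark any partition can
open windows.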
>>
>> Regards,
>>
>> Vishal.
