Re: How to determine average utilization before backpressure kicks in?

Khachatryan Roman Tue, 25 Feb 2020 07:19:33 -0800

Hi Morgan,

Thanks for your reply.


I think the only reliable way to determine this limit is load testing. In
the end, this is what load testing is all about.
I can only suggest testing parts of the system separately to find their
individual limits (e.g. I/O, CPU). Ideally, this should be done on a regular
basis.
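
To make the arithmetic concrete, here is a minimal Python sketch of how the
result of such a load test could be turned into a utilization figure. It
assumes you ramp the input rate in steps and record (offered rate, processed
rate) pairs in messages/sec. The function names and the sample numbers are
illustrative only and are not part of Flink.

```python
from typing import List, Tuple


def estimate_capacity(samples: List[Tuple[float, float]]) -> float:
    """Estimate maximum throughput as the highest processed rate observed.

    samples: (offered_rate, processed_rate) pairs in messages/sec, one per
    step of a load-test ramp. Once the job saturates, the processed rate
    stops following the offered rate and plateaus near the capacity.
    """
    if not samples:
        raise ValueError("need at least one load-test sample")
    return max(processed for _, processed in samples)


def utilization(current_rate: float, capacity: float) -> float:
    """Fraction of the estimated capacity used at the current input rate."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return current_rate / capacity


# Hypothetical ramp: the job keeps up until about 130,000 messages/sec.
ramp = [
    (100_000, 100_000),
    (120_000, 120_000),
    (140_000, 130_000),
    (160_000, 129_000),
]

capacity = estimate_capacity(ramp)
print(f"estimated capacity: {capacity} messages/sec")
print(f"utilization at 100,000 messages/sec: {utilization(100_000, capacity):.2f}")
```

With these made-up numbers the plateau sits at 130,000 messages/sec, so a job
running at 100,000 messages/sec would be at roughly 77% utilization. The
estimate is only as good as the ramp: if the test never reaches saturation,
the maximum is just a lower bound.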

Hope this helps.

Regards,
Roman


On Tue, Feb 25, 2020 at 2:47 PM Morgan Geldenhuys <
morgan.geldenh...@tu-berlin.de> wrote:

> Hi Roman,
>
> Thank you for the reply.
>
> Yes, I am aware that backpressure can be the result of many factors and
> yes, this is an oversimplification of something very complex; please bear
> with me. Let's assume that this has been taken into account and has lowered
> the threshold for when this status permanently comes into effect, i.e. HIGH.
>
> Example: The system is running along perfectly fine under normal
> conditions, accessing external sources, and processing at an average of
> 100,000 messages/sec. Let's assume the maximum capacity is around 130,000
> messages/sec before backpressure starts propagating back up the
> stream. Therefore, utilization is at roughly 0.77 (100K/130K). Great, but at
> present we don't know that 130,000 is the limit.
>
> For this example or for any job, is there a way of finding this maximum
> capacity (and hence the utilization) from the current throughput, without
> pushing the system to its limit? Possibly by measuring (as you say)
> the saturation of certain buffers (looking into this now; however, I am not
> too familiar with Flink internals)? It doesn't have to be extremely
> precise. Any hints would be greatly appreciated.
>
> Regards,
> M.
>
> On 25.02.20 13:34, Khachatryan Roman wrote:
>
> Hi Morgan,
>
> Regarding backpressure, it can be caused by a number of factors, e.g.
> writing to an external system or slow input partitions.
>
> However, if you know that a particular resource is a bottleneck then it
> makes sense to monitor its saturation.
> It can be done by using Flink metrics. Please see the documentation for
> more details:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html
>
> Regards,
> Roman
>
>
> On Tue, Feb 25, 2020 at 12:33 PM Morgan Geldenhuys <
> morgan.geldenh...@tu-berlin.de> wrote:
>
>> Hello community,
>>
>> I am fairly new to Flink and have a question concerning utilization. I
>> was hoping someone could help.
>>
>> As I understand it, backpressure is essentially the point at which
>> utilization has reached 100% for a particular streaming pipeline, meaning
>> that the application cannot "keep up" with the messages coming into the system.
>>
>> I was wondering, assuming a fairly stable input throughput, is there a
>> way of determining the average utilization as a percentage? Can we
>> determine how much more capacity each operator has before backpressure
>> kicks in from metrics alone, e.g. 60% of capacity? Knowing
>> that the maximum throughput of the DSP application is dictated by the
>> slowest part of the pipeline, we would need to identify the slowest
>> operator and then average horizontally.
>>
>> The only method that I can see of determining the point at which the
>> system cannot keep up any longer is by scaling the input throughput
>> slowly until the backpressure HIGH alarm is shown and hence the number
>> of messages/sec is known.
>>
>> Yes, I know this is a gross oversimplification and there are many, many
>> factors that need to be taken into account when dealing with
>> backpressure, but it would be nice to have a general indicator; a rough
>> estimate is fine.
>>
>> Thank you in advance.
>>
>> Regards,
>> M.
>>
>>
>>
>>
>
