Community,

I am interested in the recommended way of doing capacity planning for a particular Flink application given its current resource allocation. According to the Flink documentation (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning), extra resources need to be provisioned on top of what is required for normal operations so that the job can recover when failures occur. The amount of extra headroom determines how quickly the application can catch up to the head of the input stream, e.g. Kafka, when processing in event time.
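
To make the numbers concrete (purely hypothetical figures): if the normal ingestion rate is 100k msg/s and the job's maximum capacity turns out to be 150k msg/s, then the backlog accumulated during a 30-minute outage (~180M messages) would drain at roughly 50k msg/s, i.e. about an hour of catch-up, if my arithmetic is right.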

So, as far as I know, the recommended way of testing the maximum capacity of the system is to slowly increase the ingestion rate and find the point just before backpressure kicks in.
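
For the load side, I was planning to temporarily swap the Kafka source for a throwaway generator along the lines of the sketch below, which steps up its emit rate every minute (class name, rates and payload are just placeholders, not our real pipeline):

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    /** Throwaway load generator: emits synthetic records, stepping up the rate every minute. */
    public class RampingSource implements SourceFunction<String> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            long recordsPerSecond = 10_000;      // starting rate (placeholder)
            final long stepIncrease = 10_000;    // how much to add per step (placeholder)
            final long stepDurationMs = 60_000;  // hold each rate for one minute

            while (running) {
                long stepEnd = System.currentTimeMillis() + stepDurationMs;
                while (running && System.currentTimeMillis() < stepEnd) {
                    long batchStart = System.currentTimeMillis();
                    for (long i = 0; i < recordsPerSecond && running; i++) {
                        synchronized (ctx.getCheckpointLock()) {
                            ctx.collect("payload-" + i);
                        }
                    }
                    // sleep the remainder of the second if the batch finished early
                    long elapsed = System.currentTimeMillis() - batchStart;
                    if (elapsed < 1_000) {
                        Thread.sleep(1_000 - elapsed);
                    }
                }
                recordsPerSecond += stepIncrease;  // next step: push a bit harder
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

I would then watch the backpressure tab in the web UI to see at which step the job starts to lag, unless there is a better way to spot that point.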

Alternatively, would the following strategy be sufficient for determining the maximum number of messages per second the job can process: start the job from a timestamp far enough in the past that it is forced to catch up for a few minutes, and then measure the average ingress rate over that period?
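
Concretely, something along these lines is what I had in mind (topic, servers and group id are placeholders, and I am assuming FlinkKafkaConsumer#setStartFromTimestamp is the right knob for replaying from the past):

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class CatchUpCapacityTest {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder
            props.setProperty("group.id", "capacity-test");        // placeholder

            FlinkKafkaConsumer<String> consumer =
                    new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);

            // Start far enough in the past (here: 30 minutes) that the job
            // has to catch up to the head of the topic for a few minutes.
            consumer.setStartFromTimestamp(System.currentTimeMillis() - 30 * 60 * 1000L);

            // The real pipeline would be attached here instead of print().
            env.addSource(consumer).print();

            env.execute("catch-up capacity test");
        }
    }

While it is catching up I would read the source's numRecordsOutPerSecond (or the Kafka consumer's records-consumed-rate) from the web UI and average it over the catch-up window, unless there is a better metric for this.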

Thank you in advance! Have a great day!

Regards,
M.
