[ https://issues.apache.org/jira/browse/FLINK-25414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-25414.
----------------------------------
    Fix Version/s: 1.15.0
       Resolution: Fixed

Merged to master as bc22d2b90cb..1166d11f61a

> Provide metrics to measure how long task has been blocked
> ---------------------------------------------------------
>
>                 Key: FLINK-25414
>                 URL: https://issues.apache.org/jira/browse/FLINK-25414
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Metrics, Runtime / Task
>    Affects Versions: 1.14.2
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> Currently the back pressured/busy metrics tell the user whether a task is
> blocked/busy and what percentage of the time it is blocked/busy. But they do
> not tell how long a single block event lasts. It can be 1ms or 1h, and
> back pressure/busy would still report 100%.
> In order to improve this, we could provide two new metrics:
> # maxSoftBackPressureTime
> # maxHardBackPressureTime
> The max would be reset to 0 periodically or on every access to the metric
> (via the metric reporter). Soft back pressure would be when a task is back
> pressured in a non-blocking fashion (StreamTask detected unavailability of
> the output). Hard back pressure would measure the time a task is actually
> blocked.
> In order to calculate those metrics I'm proposing to split the already
> existing backPressuredTimeMsPerSecond into soft and hard versions as well.
> Unfortunately I don't know how to efficiently provide a similar metric for
> busy time without impacting max throughput.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
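
For illustration, the "max that is reset to 0 on every access" behaviour proposed in the description could be sketched roughly as below. This is only a sketch under stated assumptions: the class name MaxSinceLastRead and its methods are hypothetical, and it is not the code that was merged for this ticket. It only shows why a single 1h block event would stay visible in such a metric even when the percentage-based metric reports 100% either way.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not Flink's actual implementation): tracks the longest
 * single block event observed since the value was last read.
 */
final class MaxSinceLastRead {
    private final AtomicLong maxMillis = new AtomicLong(0);

    /** Record the duration of one finished block event, in milliseconds. */
    void record(long durationMillis) {
        // Keep the larger of the stored maximum and the new duration.
        maxMillis.accumulateAndGet(durationMillis, Math::max);
    }

    /** Return the maximum seen since the previous read and reset it to 0. */
    long getAndReset() {
        return maxMillis.getAndSet(0);
    }
}
```

A metric reporter would call getAndReset() on each reporting interval, so each reported value covers only the events recorded since the previous report.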