[ 
https://issues.apache.org/jira/browse/FLINK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski reassigned FLINK-14712:
--------------------------------------

    Assignee: Piotr Nowojski  (was: lining)

> Improve back-pressure reporting mechanism
> -----------------------------------------
>
>                 Key: FLINK-14712
>                 URL: https://issues.apache.org/jira/browse/FLINK-14712
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / Network, Runtime / REST
>            Reporter: lining
>            Assignee: Piotr Nowojski
>            Priority: Major
>         Attachments: image-2019-11-12-14-30-16-130.png
>
>
> h4. (1) The current monitor is heavy-weight. 
>  *   Backpressure monitoring works by repeatedly taking stack trace samples 
> of your running tasks.
> h4. (2) It is difficult to find out which vertex is the source of 
> backpressure.
>  * Users need to compare the network metrics of a vertex with those of its 
> upstream vertices to judge whether that vertex is the source of backpressure. 
> Currently they have to collect and record this information themselves.
> h3. Proposed Changes
> 1. Expose the new mechanism implemented in FLINK-14472 as an "is 
> back-pressured" metric.
> 2. Show the vertex that is the source of backpressure for the job.
> 3. Expose network metrics in IOMetricsInfo:
>  * SubTask
>  ** Pool usage: outPoolUsage, inputExclusiveBuffersUsage, 
> inputFloatingBuffersUsage.
>  *** These show whether a subtask is not back-pressured itself but is causing 
> backpressure (full input, empty output).
>  *** Comparing exclusive and floating buffer usage shows whether all channels 
> are back-pressured or only some of them.
>  ** back-pressured: shows whether the subtask is back-pressured.
>  * Vertex
>  ** Pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg.
>  ** back-pressured: shows whether the vertex is back-pressured (merged over 
> all of its subtasks).
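
A rough sketch of how the proposed pool-usage values could be read, in case it
helps the discussion. This is not Flink code. The class, the enum and the
thresholds (0.9 for "full", 0.1 for "empty") are made up for illustration, and
so is the rule that a vertex counts as back-pressured as soon as any one of its
subtasks does. Only the three metric names come from the issue text.

```java
import java.util.List;

/** Illustrative only: interprets the pool-usage metrics named in the proposal. */
public class BackPressureSketch {

    /** Hypothetical holder for one subtask's pool usage, each value in [0, 1]. */
    public static final class SubtaskUsage {
        public final double outPoolUsage;
        public final double inputExclusiveBuffersUsage;
        public final double inputFloatingBuffersUsage;

        public SubtaskUsage(double outPoolUsage,
                            double inputExclusiveBuffersUsage,
                            double inputFloatingBuffersUsage) {
            this.outPoolUsage = outPoolUsage;
            this.inputExclusiveBuffersUsage = inputExclusiveBuffersUsage;
            this.inputFloatingBuffersUsage = inputFloatingBuffersUsage;
        }
    }

    public enum Status { OK, BACK_PRESSURED, LIKELY_SOURCE }

    // Assumed thresholds, not taken from the issue or from Flink.
    public static final double FULL = 0.9;
    public static final double EMPTY = 0.1;

    /** Full output pool: back-pressured. Full input with empty output: likely source. */
    public static Status classify(SubtaskUsage u) {
        if (u.outPoolUsage >= FULL) {
            return Status.BACK_PRESSURED;
        }
        double inputUsage = Math.max(u.inputExclusiveBuffersUsage, u.inputFloatingBuffersUsage);
        if (inputUsage >= FULL && u.outPoolUsage <= EMPTY) {
            return Status.LIKELY_SOURCE;
        }
        return Status.OK;
    }

    /** Vertex-level outPoolUsageAvg: plain mean over the subtasks (0 if there are none). */
    public static double outPoolUsageAvg(List<SubtaskUsage> subtasks) {
        if (subtasks.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (SubtaskUsage u : subtasks) {
            sum += u.outPoolUsage;
        }
        return sum / subtasks.size();
    }

    /** Assumed merge rule: the vertex is back-pressured if any subtask is. */
    public static boolean vertexBackPressured(List<SubtaskUsage> subtasks) {
        for (SubtaskUsage u : subtasks) {
            if (classify(u) == Status.BACK_PRESSURED) {
                return true;
            }
        }
        return false;
    }
}
```

With these assumptions, a subtask reporting a full input and an empty output
would be flagged as the likely source, which is the "full input, empty output"
case from the description.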



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
