[ 
https://issues.apache.org/jira/browse/FLINK-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liu updated FLINK-19009:
------------------------------
    Comment: was deleted

(was: I think I can fix this bug, but first we need to reach a consensus on 
what the Flink docs mean by 'a failing/recovering situation'. There are 10 
JobStatus values. 'FAILING' clearly qualifies, but what about the others, for 
example 'RECONCILING'? ([~jark] What do you think, or do you know someone 
familiar with this part?))

> wrong way to calculate the "downtime" metric
> --------------------------------------------
>
>                 Key: FLINK-19009
>                 URL: https://issues.apache.org/jira/browse/FLINK-19009
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Metrics
>    Affects Versions: 1.7.2, 1.8.0
>            Reporter: Zhinan Cheng
>            Priority: Trivial
>             Fix For: 1.12.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Currently, the way the Flink system metric "downtime" is calculated is not 
> consistent with its description in the docs: the reported downtime is 
> actually the current timestamp minus the timestamp when the job started.
>
> But the Flink docs (https://flink.apache.org/gettinghelp.html) describe 
> it as the current timestamp minus the timestamp when the job failed.
>
> I believe we should update the code for this metric to match the docs. An 
> easy way to do this is to subtract the latest uptime timestamp from the 
> current timestamp.
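
The difference between the two calculations described above can be sketched
as follows. This is only an illustration, not Flink's actual gauge code: the
class name, method names, and timestamps are hypothetical, and clamping
negative results to zero is an assumption.

```java
// Hypothetical sketch; not Flink's actual downtime gauge implementation.
final class DowntimeSketch {

    /** Behavior reported in the issue: time elapsed since the job started. */
    static long elapsedSinceStart(long nowMillis, long jobStartMillis) {
        // Clamp to zero in case of clock skew (an assumption of this sketch).
        return Math.max(nowMillis - jobStartMillis, 0L);
    }

    /** Behavior the issue asks for: time elapsed since the job failed. */
    static long elapsedSinceFailure(long nowMillis, long jobFailedMillis) {
        return Math.max(nowMillis - jobFailedMillis, 0L);
    }

    public static void main(String[] args) {
        long jobStart = 1_000L;   // job started at t = 1 s
        long jobFailed = 61_000L; // job failed at t = 61 s
        long now = 66_000L;       // metric is read at t = 66 s

        // Measures the whole lifetime of the job, not the outage.
        System.out.println(elapsedSinceStart(now, jobStart));
        // Measures only the outage.
        System.out.println(elapsedSinceFailure(now, jobFailed));
    }
}
```

With these sample values the first calculation covers the job's whole
lifetime, while the second covers only the 5 seconds since the failure.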



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
