[ 
https://issues.apache.org/jira/browse/HIVE-25737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451147#comment-17451147
 ] 

Viktor Csomor commented on HIVE-25737:
--------------------------------------

PR: https://github.com/apache/hive/pull/2827

> Compaction Observability: Initiator/Worker/Cleaner cycle measurement 
> improvements
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-25737
>                 URL: https://issues.apache.org/jira/browse/HIVE-25737
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: Viktor Csomor
>            Assignee: Viktor Csomor
>            Priority: Major
>
> In the Compaction Observability the Initiator/Worker/Cleaner cycle is 
> measured with a [Dropwizard 
> Timer|https://metrics.dropwizard.io/4.2.0/getting-started.html] metrics.
> {noformat}
> Timers
> A timer measures both the rate that a particular piece of code is called and 
> the distribution of its duration.
> {noformat}
> However this is not good to measure simply a duration. Furthermore, one HMS 
> can run multiple Worker threads and the duration of the last finished worker 
> is not really informative if a Worker thread got stuck.
> Timers do not carry enough information because they only bump the counter if 
> a Worker has finished a loop.
> If Initiator/Worker/Cleaner gets stuck, then the metrics is not provided 
> hence it didn't bump the counter.
> It'd better to implement the followings:
> - Time passed since Initiator start (single threaded) -> Gauge metric
> - Oldest Working compaction -> Gauge Metric
> - Oldest Working Cleaner -> Gauge metric



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to