[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4712:
------------------------------------
    Attachment: YARN-4712-YARN-2928.v1.004.patch

Thanks for the comments, [~sjlee0].
I have reverted the changes to the trunk code and limited the scope to YARN-2928.
bq. My position is that we should skip reporting the value rather than 
reporting 0.
IIUC the existing patches already take care of this: I set 
*cpuUsageTotalCoresPercentage* to -1 when *cpuUsagePercentPerCore* is -1, and 
in *NMTimelinePublisher* I skip reporting if *cpuUsageTotalCoresPercentage* is -1.
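
A minimal sketch of the check described above, assuming -1 is the UNAVAILABLE sentinel as stated in the issue description. The class and method names here are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch; names are illustrative, not the actual NodeManager code.
final class CpuUsageSentinelSketch {
    // Stands in for ResourceCalculatorProcessTree.UNAVAILABLE (given as -1 in the issue).
    static final int UNAVAILABLE = -1;

    /** Divides per-core usage by the processor count, passing the sentinel through unchanged. */
    static float totalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE) {
            // Without this guard, -1 / 8 would become -0.125 and slip past an "== -1" check.
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }

    /** Publisher-side check: report only when a real value is present. */
    static boolean shouldPublish(float cpuUsageTotalCoresPercentage) {
        return cpuUsageTotalCoresPercentage != UNAVAILABLE;
    }
}
```

With the guard in place, an unavailable reading stays at -1 and the publisher skips it instead of reporting a small negative value.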

bq. Most of YARN's CPU accounting is based on cores rather than nodes/machines. 
IMO cpuUsagePercentPerCore would be a better value to emit. Thoughts?
IMO *cpuUsageTotalCoresPercentage* is important for gauging how much of the 
cluster's CPU is being utilized. If we emit only *cpuUsagePercentPerCore*, I believe 
aggregating it across all containers does not give the cluster's CPU usage. In fact, we 
need to report both. Also, IMO *cpuUsageTotalCoresPercentage* is not 
calculated properly; it should be 
{code}
cpuUsageTotalCoresPercentage = (cpuUsagePercentPerCore 
/resourceCalculatorPlugin.getNumProcessors())/ (nodeCpuPercentageForYARN/100);
{code}
This way we will be able to identify what percentage of the cluster's CPU is being 
utilized.
Also, should we broaden the scope of this jira further, or shall we discuss this 
in a different jira?
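
To make the proposed formula concrete, here is a small sketch with hypothetical names. It assumes *nodeCpuPercentageForYARN* is an integer percentage between 1 and 100, which is an assumption and not something this comment confirms.

```java
// Hypothetical sketch of the proposed formula; names are illustrative only.
final class YarnCpuShareSketch {
    /**
     * @param cpuUsagePercentPerCore   usage where 100 means one fully busy core
     * @param numProcessors            number of processors on the node
     * @param nodeCpuPercentageForYarn percentage of node CPU available to YARN (assumed 1..100)
     */
    static float totalCoresPercentage(float cpuUsagePercentPerCore,
                                      int numProcessors,
                                      int nodeCpuPercentageForYarn) {
        // Divide by 100f rather than 100: integer division would truncate 50 / 100 to 0.
        float yarnShare = nodeCpuPercentageForYarn / 100f;
        return (cpuUsagePercentPerCore / numProcessors) / yarnShare;
    }
}
```

For instance, 200% per-core usage on an 8-processor node where YARN may use 50% of the CPU gives (200 / 8) / 0.5 = 50, i.e. half of the CPU available to YARN on that node.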

bq. Why are we appending the process id to the metric id? Doesn't this cause 
issues when we do the aggregation? 
Agreed, I have handled this. I believe YARN-3816 was also trying to address it, but 
as that might take a little more time I am addressing it as part of this jira.

> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>
>                 Key: YARN-4712
>                 URL: https://issues.apache.org/jira/browse/YARN-4712
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, 
> YARN-4712-YARN-2928.v1.004.patch
>
>
> There are 2 issues with CPU usage collection:
> * I observed that the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is often 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still does 
> the calculation, i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> /resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never 
> hit. Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor is publishing decimal values for the CPU usage.
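
One possible way to reconcile a decimal CPU value with a long-only metric column is to round before publishing. The sketch below is only an illustration with hypothetical names; it is not necessarily what the attached patch does.

```java
// Hypothetical sketch; not the patch's actual code.
final class CpuMetricRoundingSketch {
    /** Rounds a decimal CPU percentage to the nearest long so a long-only converter can store it. */
    static long toLongMetric(float cpuUsageTotalCoresPercentage) {
        // Math.round(double) returns a long; a plain (long) cast would truncate instead.
        return Math.round((double) cpuUsageTotalCoresPercentage);
    }
}
```

Rounding loses the fractional part, so this trades precision for compatibility with the existing column type.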



