[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562373#comment-16562373 ]
genericqa commented on YARN-6966:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 20m 9s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933626/YARN-6966-branch-2.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21440/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> NodeManager metrics may return wrong negative values when NM restart
> --------------------------------------------------------------------
>
>                 Key: YARN-6966
>                 URL: https://issues.apache.org/jira/browse/YARN-6966
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.2.0, 3.0.4, 3.1.2
>
>         Attachments: YARN-6966-branch-2.001.patch,
> YARN-6966-branch-2.002.patch, YARN-6966-branch-2.002.patch,
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch,
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch,
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch,
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> This is the same problem as YARN-6212. However, I do not think it is a
> duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics are not
> recovered properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB,
> AllocatedVCores, and AvailableVCores in the metrics also need to be
> recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
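
To make the failure mode described above concrete, here is a minimal, self-contained Java sketch. It is not Hadoop code: `MetricsRecoverySketch`, `Metrics`, and `restartThenFinish` are hypothetical names, and the model only mirrors the idea that gauges reset to zero on restart while later container completions still decrement them. The node size of 3600 vcores and the two containers of 1 and 10 vcores are assumptions chosen for illustration.

```java
/** Simplified stand-in for NodeManager metrics; NOT the Hadoop NodeManagerMetrics class. */
public class MetricsRecoverySketch {

    /** Hypothetical gauge holder, loosely modelled on the metric names in the report. */
    static final class Metrics {
        int allocatedContainers;
        int allocatedVCores;
        int availableVCores;

        Metrics(int totalVCores) {
            // A freshly started process knows only the node capacity.
            this.availableVCores = totalVCores;
        }

        /** Accounts for one container being allocated. */
        void allocate(int vcores) {
            allocatedContainers++;
            allocatedVCores += vcores;
            availableVCores -= vcores;
        }

        /** Accounts for one container being released. */
        void release(int vcores) {
            allocatedContainers--;
            allocatedVCores -= vcores;
            availableVCores += vcores;
        }
    }

    /**
     * Simulates a restart while containers are still running, followed by
     * those containers finishing.
     *
     * @param recoverMetrics whether the recovery step re-applies the
     *                       allocations of the recovered containers
     */
    static Metrics restartThenFinish(int totalVCores, int[] runningVCores,
                                     boolean recoverMetrics) {
        Metrics metrics = new Metrics(totalVCores); // restart: gauges start from zero
        if (recoverMetrics) {
            // Proposed behaviour: re-account for every recovered container.
            for (int vcores : runningVCores) {
                metrics.allocate(vcores);
            }
        }
        // The application stops, so every recovered container is released.
        for (int vcores : runningVCores) {
            metrics.release(vcores);
        }
        return metrics;
    }

    public static void main(String[] args) {
        int[] running = {1, 10}; // assumed container sizes in vcores
        Metrics withoutRecovery = restartThenFinish(3600, running, false);
        Metrics withRecovery = restartThenFinish(3600, running, true);
        System.out.println("without recovery: AllocatedContainers="
                + withoutRecovery.allocatedContainers
                + ", AllocatedVCores=" + withoutRecovery.allocatedVCores
                + ", AvailableVCores=" + withoutRecovery.availableVCores);
        System.out.println("with recovery:    AllocatedContainers="
                + withRecovery.allocatedContainers
                + ", AllocatedVCores=" + withRecovery.allocatedVCores
                + ", AvailableVCores=" + withRecovery.availableVCores);
    }
}
```

In this model, skipping the recovery step leaves the allocation gauges negative and the available vcores above the node capacity once the containers finish, while re-applying the allocations during recovery brings every gauge back to its idle value.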
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop1111.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org