[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560486#comment-16560486 ]
Haibo Chen commented on YARN-6966:
----------------------------------

[~snemeth] We should not create a patch for branch-3.0.0, as it has already been released. Instead, we should create a patch for branch-3.0, which will become the next 3.0.x maintenance release.

> NodeManager metrics may return wrong negative values when NM restart
> --------------------------------------------------------------------
>
>                 Key: YARN-6966
>                 URL: https://issues.apache.org/jira/browse/YARN-6966
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yang Wang
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-6966-branch-2.001.patch,
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch,
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch,
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just like YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics are not
> recovered properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB,
> AllocatedVCores, and AvailableVCores in the metrics also need to be
> recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # The metrics now show negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
>   name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
>   modelerType: "NodeManagerMetrics",
>   tag.Context: "yarn",
>   tag.Hostname: "hadoop1111.com",
>   ContainersLaunched: 0,
>   ContainersCompleted: 0,
>   ContainersFailed: 2,
>   ContainersKilled: 0,
>   ContainersIniting: 0,
>   ContainersRunning: 0,
>   AllocatedGB: 0,
>   AllocatedContainers: -2,
>   AvailableGB: 160,
>   AllocatedVCores: -11,
>   AvailableVCores: 3611,
>   ContainerLaunchDurationNumOps: 2,
>   ContainerLaunchDurationAvgTime: 6,
>   BadLocalDirs: 0,
>   BadLogDirs: 0,
>   GoodLocalDirsDiskUtilizationPerc: 2,
>   GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
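
The failure mode described in the quoted issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Hadoop code or the actual patch: the class NmMetricsRecoverySketch, its nested Metrics class, and the simulate method are hypothetical names, the node total of 3600 vcores is inferred from the reported AvailableVCores of 3611, and the split of the 11 vcores into containers of 5 and 6 is made up for illustration. The idea is that a restart leaves the gauges at their initial values, so releasing containers that were never re-counted pushes them below zero, while re-applying the allocations during recovery keeps them consistent.

```java
/** Hypothetical, simplified model of the NM gauges; not the Hadoop NodeManagerMetrics class. */
final class NmMetricsRecoverySketch {

    /** Only the container count and vcore gauges are modeled. */
    static final class Metrics {
        int allocatedContainers;
        int allocatedVCores;
        int availableVCores;

        Metrics(int totalVCores) {
            this.availableVCores = totalVCores;
        }

        void allocateContainer(int vcores) {
            allocatedContainers++;
            allocatedVCores += vcores;
            availableVCores -= vcores;
        }

        void releaseContainer(int vcores) {
            allocatedContainers--;
            allocatedVCores -= vcores;
            availableVCores += vcores;
        }
    }

    /**
     * Models: containers start, the NM restarts, then the application stops.
     * Returns the gauges as they look after the last container is released.
     */
    static Metrics simulate(int totalVCores, int[] containerVCores, boolean recoverOnRestart) {
        // Before the restart the gauges track the running containers correctly.
        Metrics beforeRestart = new Metrics(totalVCores);
        for (int vcores : containerVCores) {
            beforeRestart.allocateContainer(vcores);
        }

        // The restart creates fresh gauges; the in-memory values above are lost.
        Metrics afterRestart = new Metrics(totalVCores);
        if (recoverOnRestart) {
            // Assumed fix: re-count every recovered container during recovery.
            for (int vcores : containerVCores) {
                afterRestart.allocateContainer(vcores);
            }
        }

        // Stopping the application releases each recovered container.
        for (int vcores : containerVCores) {
            afterRestart.releaseContainer(vcores);
        }
        return afterRestart;
    }

    public static void main(String[] args) {
        int[] containerVCores = {5, 6}; // illustrative split of 11 vcores
        for (boolean recover : new boolean[] {false, true}) {
            Metrics m = simulate(3600, containerVCores, recover);
            System.out.println("recoverOnRestart=" + recover
                + " AllocatedContainers=" + m.allocatedContainers
                + " AllocatedVCores=" + m.allocatedVCores
                + " AvailableVCores=" + m.availableVCores);
        }
    }
}
```

With recovery disabled, the model ends at AllocatedContainers=-2, AllocatedVCores=-11, and AvailableVCores=3611, matching the values in the JMX output above; with recovery enabled, the gauges return to 0, 0, and 3600.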