Thank you so much, I will try that!

________________________________
From: Harsh J <ha...@cloudera.com>
Sent: 2018-10-16 17:27:07
To: ims...@outlook.com
Cc: <user@hadoop.apache.org>
Subject: Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?
I don't think it includes nodes that are entirely inactive. Use the CLI or
the RM REST API directly:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Nodes_API

On Mon, Oct 15, 2018 at 12:20 PM Huang Meilong <ims...@outlook.com> wrote:
>
> Thank you Harsh,
>
> What are the possible values for the state in the LiveNodeManagers bean?
> Will LOST, ACTIVE, REBOOTED and DECOMMISSIONED show up in the state field?
>
> ________________________________
> From: Harsh J <ha...@cloudera.com>
> Sent: 2018-10-15 12:46:49
> To: ims...@outlook.com
> Cc: <user@hadoop.apache.org>
> Subject: Re: How can I find out which nodemanagers are unhealthy and which
> nodemanagers are lost?
>
> The JMX servlet query for 'RMNMInfo' done via
> /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo returns a
> LiveNodeManagers bean whose value is a JSON-parseable string of all
> currently-tracked NodeManagers and their actual states (UNHEALTHY,
> RUNNING, etc.).
>
> You can also use the 'yarn node -list' command to retrieve similar
> information from a CLI.
>
> On Mon, Oct 15, 2018 at 8:48 AM Huang Meilong <ims...@outlook.com> wrote:
> >
> > Hi,
> >
> > I'm building a system to monitor my Hadoop cluster. I can get metrics
> > about the cluster via Hadoop metrics
> > (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html?spm=5176.2020520111.111.1.278ad103oLtdlm#NodeManagerMetrics):
> >
> > ClusterMetrics
> >
> > ClusterMetrics shows the metrics of the YARN cluster from the
> > ResourceManager's perspective. Each metrics record contains Hostname tag as
> > additional information along with metrics.
> >
> > Name                  Description
> > NumActiveNMs          Current number of active NodeManagers
> > NumDecommissionedNMs  Current number of decommissioned NodeManagers
> > NumLostNMs            Current number of lost NodeManagers for not sending heartbeats
> > NumUnhealthyNMs       Current number of unhealthy NodeManagers
> > NumRebootedNMs        Current number of rebooted NodeManagers
> >
> > How can I find out which nodemanagers are unhealthy and which are lost?
> > Ideally this could be done by calling the JMX REST API or a Hadoop command.
> >
> > Any suggestions are appreciated, thank you.
> >
> > HUANG
>
> --
> Harsh J

--
Harsh J
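A minimal Python sketch of the two approaches discussed in this thread: the RM REST Cluster Nodes API (which, per Harsh's last reply, also covers inactive nodes) and the RMNMInfo JMX query (whose LiveNodeManagers value may omit them). It assumes the ResourceManager web UI is reachable without Kerberos/SPNEGO authentication at a hypothetical address (RM_WEB_ADDRESS). The JSON field names it reads ("state" and "nodeHostName" from the REST API, "State" and "HostName" inside LiveNodeManagers) and the `states` filter parameter are assumptions based on the Hadoop 2.x/3.x responses, so check them against your version. The helper names are hypothetical, and parsing is kept separate from fetching so it can be tried on a saved response.

```python
import json
import urllib.request
from collections import defaultdict

# Hypothetical ResourceManager web address; adjust host and port for your cluster.
RM_WEB_ADDRESS = "http://resourcemanager.example.com:8088"

# JMX query from the thread that returns the RMNMInfo bean.
RMNMINFO_JMX_PATH = "/jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo"


def cluster_nodes_url(rm_address, states):
    """Build a Cluster Nodes API URL filtered on a list of node states."""
    return f"{rm_address}/ws/v1/cluster/nodes?states={','.join(states)}"


def fetch_text(url, timeout=10):
    """Fetch a URL and return the decoded body (assumes no SPNEGO auth)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def hosts_by_state_from_rest(response_text):
    """Group hostnames by state from a /ws/v1/cluster/nodes JSON body."""
    payload = json.loads(response_text)
    # When nothing matches, the body may carry "nodes": null instead of a list.
    nodes = (payload.get("nodes") or {}).get("node") or []
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node["state"]].append(node["nodeHostName"])
    return dict(grouped)


def hosts_by_state_from_jmx(response_text):
    """Group hostnames by state from the RMNMInfo JMX JSON body."""
    payload = json.loads(response_text)
    grouped = defaultdict(list)
    for bean in payload.get("beans", []):
        # LiveNodeManagers is a JSON array serialized into a string attribute,
        # so it needs a second json.loads.
        for node in json.loads(bean.get("LiveNodeManagers", "[]")):
            grouped[node["State"]].append(node["HostName"])
    return dict(grouped)
```

For example, `hosts_by_state_from_rest(fetch_text(cluster_nodes_url(RM_WEB_ADDRESS, ["UNHEALTHY", "LOST"])))` would return a dict mapping each requested state to its hostnames. From the CLI, `yarn node -list -states UNHEALTHY,LOST` (or `yarn node -list -all`) should give the same information.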