Thank you so much, I will try that!

________________________________
From: Harsh J <ha...@cloudera.com>
Sent: October 16, 2018 17:27:07
To: ims...@outlook.com
Cc: <user@hadoop.apache.org>
Subject: Re: How can I find out which nodemanagers are unhealthy and which 
nodemangers are lost?

I don't think the LiveNodeManagers bean includes entirely inactive nodes.
Use the CLI or query the RM REST API directly:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Nodes_API
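
As a rough illustration of the REST route, here is a minimal Python sketch that asks the Cluster Nodes API for nodes in given states and groups the node ids by state. The RM address `rm.example.com:8088` and the helper names `fetch_nodes` and `nodes_by_state` are hypothetical; the `/ws/v1/cluster/nodes` path, the `states` query parameter, and the `id` and `state` fields are taken from the API page linked above, so check them against your Hadoop version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical ResourceManager web address; replace with your own.
RM_URL = "http://rm.example.com:8088"


def nodes_by_state(payload):
    """Group node ids by state from a Cluster Nodes API response body."""
    # Guard against an empty or null "nodes" value when nothing matches.
    nodes = (payload.get("nodes") or {}).get("node", [])
    grouped = {}
    for node in nodes:
        grouped.setdefault(node["state"], []).append(node["id"])
    return grouped


def fetch_nodes(rm_url=RM_URL, states=("LOST", "UNHEALTHY")):
    """Fetch nodes in the given states from the RM REST API."""
    query = urlencode({"states": ",".join(states)})
    with urlopen(f"{rm_url}/ws/v1/cluster/nodes?{query}", timeout=10) as resp:
        return json.load(resp)
```

Calling `nodes_by_state(fetch_nodes())` against a reachable RM should yield a dict keyed by state name, with a list of node ids under each key.
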
On Mon, Oct 15, 2018 at 12:20 PM Huang Meilong <ims...@outlook.com> wrote:
>
> Thank you Harsh,
>
>
> What are the possible values for the state in the LiveNodeManagers bean? Will 
> LOST, ACTIVE, REBOOTED and DECOMMISSIONED show up in the state field?
>
> ________________________________
> From: Harsh J <ha...@cloudera.com>
> Sent: October 15, 2018 12:46:49
> To: ims...@outlook.com
> Cc: <user@hadoop.apache.org>
> Subject: Re: How can I find out which nodemanagers are unhealthy and which 
> nodemangers are lost?
>
> The JMX servlet query for 'RMNMInfo' done via
> /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo returns a
> LiveNodeManagers bean whose value is a JSON-parseable string of all
> currently-tracked NodeManagers and their actual states (UNHEALTHY,
> RUNNING, etc.).
>
> You can also use the 'yarn node -list' command to retrieve similar
> information from a CLI.
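
The JMX approach described above could be sketched in Python roughly as follows. The query path comes from the message itself; the RM address `rm.example.com:8088`, the helper names, and the `HostName` and `State` keys inside the LiveNodeManagers string are assumptions to verify against your own cluster's JMX output.

```python
import json
from urllib.request import urlopen

# Hypothetical ResourceManager web address; replace with your own.
RM_URL = "http://rm.example.com:8088"
JMX_QUERY = "/jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo"


def parse_live_node_managers(jmx_payload):
    """Return {hostname: state} from an RMNMInfo JMX response body."""
    states = {}
    for bean in jmx_payload.get("beans", []):
        raw = bean.get("LiveNodeManagers")
        if raw is None:
            continue
        # The bean value is itself a JSON string, so decode it a second time.
        for nm in json.loads(raw):
            # "HostName" and "State" are assumed key names.
            states[nm["HostName"]] = nm["State"]
    return states


def fetch_jmx(rm_url=RM_URL):
    """Fetch the RMNMInfo bean from the RM JMX servlet."""
    with urlopen(rm_url + JMX_QUERY, timeout=10) as resp:
        return json.load(resp)
```

Filtering the result of `parse_live_node_managers(fetch_jmx())` for `"UNHEALTHY"` gives the unhealthy hosts among the NodeManagers the bean reports.
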
> On Mon, Oct 15, 2018 at 8:48 AM Huang Meilong <ims...@outlook.com> wrote:
> >
> > Hi,
> >
> >
> > I'm building a system to monitor my Hadoop cluster. I can get metrics about 
> > the cluster via Hadoop 
> > metrics (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html?spm=5176.2020520111.111.1.278ad103oLtdlm#NodeManagerMetrics):
> >
> >
> > ClusterMetrics
> >
> > ClusterMetrics shows the metrics of the YARN cluster from the 
> > ResourceManager’s perspective. Each metrics record contains Hostname tag as 
> > additional information along with metrics.
> >
> > Name                  Description
> > NumActiveNMs          Current number of active NodeManagers
> > NumDecommissionedNMs  Current number of decommissioned NodeManagers
> > NumLostNMs            Current number of lost NodeManagers for not sending heartbeats
> > NumUnhealthyNMs       Current number of unhealthy NodeManagers
> > NumRebootedNMs        Current number of rebooted NodeManagers
> >
> >
> > How can I find out which NodeManagers are unhealthy and which are lost? 
> > Ideally this could be done by calling the JMX REST API or a Hadoop command.
> >
> >
> > Any suggestions are appreciated, thank you.
> >
> >
> >
> > HUANG
> >
> >
> >
> >
>
>
> --
> Harsh J



--
Harsh J
