Hi Ling!

Nagios is going to check all hosts that inherit from the linux-server
template with whatever check is defined in that linux-server host template
(often it is "check_command check-host-alive", and that check is normally a
ping). This will not work for compute nodes that are not reachable from the
Nagios server. For those nodes you should define a new host template with a
check command that routes through their service nodes (for example, by
having NRPE on the service node run the ping).
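
Here is a rough, untested sketch of one way to do that: let NRPE on the
service node run the ping on behalf of the Nagios server. The names
compute-node, check-host-alive-via-sn, check_ping_node and _SERVICENODE are
made up for illustration, and the plugin path depends on how your plugins
were installed. It also assumes NRPE on the service node was built with
command arguments enabled and has dont_blame_nrpe=1 set, which has security
implications you should weigh first.

On sn1, in nrpe.cfg:

    # Hypothetical NRPE command: ping whatever address the Nagios server passes in.
    # The plugin path is an assumption; adjust it to your install.
    command[check_ping_node]=/usr/local/nagios/libexec/check_ping -H $ARG1$ -w 3000.0,80% -c 5000.0,100% -p 5

On the Nagios server:

    ; Hypothetical command: ask NRPE on the service node to ping the compute node.
    ; $_HOSTSERVICENODE$ expands the _SERVICENODE custom variable set on the host.
    define command{
        command_name    check-host-alive-via-sn
        command_line    $USER1$/check_nrpe -H $_HOSTSERVICENODE$ -c check_ping_node -a $HOSTADDRESS$
    }

    ; Hypothetical template for nodes that are only reachable via a service node.
    define host{
        name            compute-node
        use             linux-server
        check_command   check-host-alive-via-sn
        register        0
    }

    define host{
        use             compute-node
        host_name       cn11
        alias           cn11.cluster.com
        address         20.2.1.1
        parents         sn1
        _SERVICENODE    10.2.0.101      ; address of sn1
        max_check_attempts  10
        contact_groups  admins
    }

Service checks for the compute nodes could be routed the same way, but then
the service node has to relay each check to the compute node itself.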

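The parent-child relationship in Nagios does not change how a host is
checked; it only tells Nagios how to interpret a failure. When the parent is
down, Nagios marks the child nodes as unreachable rather than down, so
alerts for them can be suppressed (since it assumes they are dependent on
the parent). Often the parent designation is used for the switch that is in
between the Nagios server and the compute node (since if the switch is not
responding, obviously the node is unreachable, and does not need a separate
alert). That is why cn11 still shows as "down": it is still being pinged
directly from the Nagios server.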

Hope this helps!

--Jennifer
PS Long time no see!

On Tue, Nov 29, 2011 at 11:16 AM, Ling Gao <ling...@us.ibm.com> wrote:

> I know this may not be the correct group for Nagios questions, but I know
> you guys there are excellent admins and hope some of you have experience
> with Nagios.
>
> I am setting up an xCAT hierarchical cluster that uses Nagios to monitor
> the nodes and the services on the nodes. For example,
> I have a management node (mn) that connects to two service nodes (sn1 and
> sn2), and each service node connects to some compute nodes: c11,c12.... on
> sn1, c21,c22..... on sn2
>
> I can define hosts sn1 and sn2 and services like check_users for the
> service nodes. It works fine. Of course, I have NRPE (a Nagios plugin)
> installed on sn1.
> define host{
>         use                 linux-server
>         host_name      sn1
>         alias               sn1.cluster.com
>         address         10.2.0.101
>         max_check_attempts  10
>         contact_groups admins
> }
>
> define service{
>     use                                    generic-service
>     contact_groups                  admins
>     hostgroups                        servicenode
>     service_description            Users
>     check_command                 check_nrpe!check_users
> }
>
> However, I do not know how to define hosts and services for the compute
> nodes because they do not have a connection to the management node. I
> searched the web and found that you can define a parent-child relationship
> between hosts and services. So I defined cn11 as follows on the mn:
> define host{
>     use             linux-server
>     host_name   cn11
>     alias            cn11.cluster.com
>     address       20.2.1.1
>     parents       sn1
>     max_check_attempts  10
>     contact_groups admins
> }
>
> But, the Nagios Web GUI shows that cn11 is "down" although it is up.
>
> My questions are:
> 1. How do I define hosts for an xCAT hierarchical cluster?
> 2. How do I define services for an xCAT hierarchical cluster?
> 3. What should be installed on the service nodes and compute nodes in
> order for this to work?
>
>
> Thanks,
>
> Ling
>
> Ling Gao
> Poughkeepsie Unix Development Lab
> IBM Systems and Technology Group
> Internal: T/L 293-5692
> External: ling...@us.ibm.com, 845-433-5692
>
> "I never worry about the future. It comes soon enough." --- Albert
> Einstein
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
>