Yep, I'd recommend having the event handler that fires on an overheat
condition correlate _several_ sources before shutting down large numbers of
systems. If you look hard, you'll surely find a number of good sources for
temperature correlation (netbotz, switch/router SNMP, management processors,
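A minimal Python sketch of that correlation step follows. It assumes each source has already been polled into a single temperature reading. The source names, the threshold and the two-source quorum are hypothetical values for illustration, not anything Nagios defines:

```python
from typing import Mapping, Optional


def should_shutdown(
    readings: Mapping[str, Optional[float]],
    threshold_c: float,
    min_agreeing: int = 2,
) -> bool:
    """Return True only when enough independent sources report overheating.

    `readings` maps a source name (hypothetical examples: "netbotz",
    "switch_snmp", "mgmt_proc") to a temperature in Celsius, or to None
    when that source could not be polled. A source that fails to answer
    never counts as a vote for shutting down.
    """
    hot_sources = [
        name
        for name, temp in readings.items()
        if temp is not None and temp >= threshold_c
    ]
    return len(hot_sources) >= min_agreeing


if __name__ == "__main__":
    sample = {"netbotz": 41.5, "switch_snmp": 39.0, "mgmt_proc": None}
    # Two of the three sources are at or above 38 C, so this prints True.
    print(should_shutdown(sample, threshold_c=38.0))
```

Requiring agreement means a single flaky probe can't take a whole rack down, and an unreachable source simply doesn't vote.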
What messages, exactly, do you see in syslog? Have you compared them with
the log line for a successful TX/RX of a mail; are they the same? Do you
expect continued alerting on a failure that lasts several hours, or is
this for separate events?
I'm guessing offhand that it's your
Title: RE: [Nagios-users] status.dat on /tmp causes status.cgi issues
Search the recent archives: within the last three weeks this has been addressed and resolved by Ethan. Track down those posts for details; there's a fix pending confirmation in CVS as well as a workaround (move your
I'd suggest NOT taking this action at the host level, because all
service checks are halted for the duration of this non-parallelized
action... You want to avoid running a host check until absolutely necessary.
I'd suggest increasing max_check_attempts on the SERVICE to a larger number,
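The following is a deliberately simplified Python model of why a larger max_check_attempts helps. Consecutive non-OK service results only become a HARD state once the attempt counter reaches max_check_attempts. The model ignores soft recoveries, retry intervals and notification logic, so treat it as an illustration rather than Nagios's actual state machine:

```python
from typing import Iterable, List, Tuple


def classify_results(
    results: Iterable[str], max_check_attempts: int
) -> List[Tuple[str, str, int]]:
    """Label each check result as (state, state_type, attempt).

    Simplified model: any OK result resets the attempt counter and is
    reported as HARD. Each consecutive non-OK result bumps the counter,
    and the state only becomes HARD once the counter reaches
    max_check_attempts.
    """
    attempt = 0
    labelled = []
    for state in results:
        if state == "OK":
            attempt = 0
            labelled.append((state, "HARD", 1))
        else:
            attempt = min(attempt + 1, max_check_attempts)
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
            labelled.append((state, state_type, attempt))
    return labelled


if __name__ == "__main__":
    run = ["OK", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
    for row in classify_results(run, max_check_attempts=3):
        print(row)
```

With max_check_attempts=3, a single transient CRITICAL stays SOFT. Anything you only act on in a HARD state, such as notifications, therefore isn't triggered by one bad poll.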
Title: RE: [Nagios-users] Memory Leak on Nagios 2.3 ?
At the very least, you'd need to look at how much memory Nagios itself is using, not just the system totals (check the free manpage for how to interpret its results). Try 'top -n1 -p $nagiospid' or 'ps -F -p $nagiospid'. I've gone through several
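If you'd rather log the numbers over time than eyeball top, a small Python helper can pull the per-process figures straight from /proc. This sketch assumes a Linux /proc filesystem. How you obtain the Nagios PID (lock file, pidof, etc.) is left to you:

```python
from typing import Dict


def parse_vm_fields(status_text: str) -> Dict[str, int]:
    """Extract the Vm* fields (values in kB) from /proc/<pid>/status text."""
    fields: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.startswith("Vm"):
            continue
        parts = value.split()
        if parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def process_memory_kb(pid: int) -> Dict[str, int]:
    """Read the memory counters for one process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_fields(fh.read())
```

Sample process_memory_kb(nagios_pid)["VmRSS"] every few minutes and watch whether it keeps climbing. That is a much better leak indicator than the system-wide totals, which also count kernel buffers and page cache.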
Bringing up SNMP is a valid point (I'm currently handling ~25% of my active
service checks this way). However, there are a number of scenarios where the
load on the server, network and client is significantly greater when pulling
down a tree that needs processing (the process table, for instance), as the
Title: RE: [Nagios-users] How service and host checks work together?
After turning off active host checks, assign a service check of whatever type and frequency you want to use to determine whether the host is up. Next, assign a parent in the host configuration of devices _behind_
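Parent definitions let Nagios distinguish a host that is DOWN from one that is merely UNREACHABLE because something in front of it failed. The following is a rough Python model of that rule. `check_ok` stands in for the result of whichever service check you use as the host's up indicator, and the host names are hypothetical:

```python
from typing import Dict, List


def host_state(
    host: str, check_ok: Dict[str, bool], parents: Dict[str, List[str]]
) -> str:
    """Return UP, DOWN or UNREACHABLE for `host`.

    A failing host with no parents, or with at least one parent that is
    UP, is DOWN. If every parent is itself DOWN or UNREACHABLE, the host
    is UNREACHABLE. Assumes the parent graph has no cycles.
    """
    if check_ok[host]:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"
    if any(host_state(p, check_ok, parents) == "UP" for p in host_parents):
        return "DOWN"
    return "UNREACHABLE"


if __name__ == "__main__":
    ok = {"gateway": True, "router": False, "webserver": False}
    topology = {"router": ["gateway"], "webserver": ["router"]}
    for name in ok:
        print(name, host_state(name, ok, topology))
```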
Title: RE: [Nagios-users] which dns check to use?
I don't recall what (if any) issues I had with check_dig, but check_dns doesn't handle nslookup responses correctly when matching an A record while checking a PTR (it leaves the trailing . in the match string). While I was at it, I added DNS
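The trailing-dot mismatch is easy to sidestep if the comparison normalizes both names first. The following Python sketch shows only that normalization step; it is not the check_dns code. `reverse_pointer` is the standard-library attribute that builds the in-addr.arpa name for a PTR lookup:

```python
import ipaddress


def normalize_name(name: str) -> str:
    """Lower-case a DNS name and drop one trailing root dot, if present."""
    name = name.strip().lower()
    return name[:-1] if name.endswith(".") else name


def names_match(answer: str, expected: str) -> bool:
    """Compare a resolver answer with the expected name, ignoring the root dot."""
    return normalize_name(answer) == normalize_name(expected)


def ptr_query_name(ip: str) -> str:
    """Name to query for a PTR lookup, e.g. 1.2.0.192.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer
```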
I haven't had a chance yet to play with the free Monarch versions, so
this may be off-base. I run my Nagios (and Ganglia, Cacti, et al.)
through mod_auth_kerb under Apache. I prefer K5 for its ticket-passing
and audit features, but you can go the mod_authz_ldap (or whatever) route
with just
FAQ. No. See host vs. service checks in the docs. Look at what you're
using for a host check, and run that same thing as a regularly scheduled
service check... I'm guessing that's just a check_ping anyway.
/eli
Danila Kutepkin wrote:
Hi there!
Is there any secret way to parallelize
I've historically used a mailing list for this anyway... that way people
subscribe/unsubscribe themselves when they go on/off call, and you can
have as many people as appropriate on it. Not to mention it doesn't
require modifying the Nagios configs and reloading.
/eli
Philip Hallstrom
Interesting, I missed the post you referenced altogether... I never saw
that this had been handled in the past :)
Having this readily available in mainline would be great; IMO the
current behaviour severely limits the situations in which you can use
servicedependencies AND inherited
, the status.log is being continuously
updated as normal but when checks stop, the nagios.log stops gathering
entries as well.
On 3/17/06, Eli Stair [EMAIL PROTECTED] wrote:
I've been seeing this continuously in the 2.0 betas, RCs and releases. For details
on my situation, check my posts in the devel/users archives; I'm curious
as you where only certain hosts/hostgroups
are being checked and then all of a sudden everything stops BUT pings
based on above but those checks are not being updated in nagios.log.
Very weird.
On 3/17/06, Eli Stair [EMAIL PROTECTED] wrote:
So you're seeing the scenario where nagios stops _all_
hosts in 1 hostgroup. Weird.
On 3/17/06, Eli Stair [EMAIL PROTECTED] wrote:
Are you in a position to stop the service for a minute and try starting
it up again with the retention.dat file moved out of the way? If you're
hesitant, you may want to start up another instance of Nagios in parallel
To be fair, one situation where these wildcard template dependencies
don't work is when you want to define a number of dependent services
that rely on a service on the same host (i.e., not a number of separate
services on different hosts that rely on a single, or wildcardable,
host/service).
check_log works by comparing the current log against a copy of the log
saved when the last check ran, so that behaviour is normal... I don't
recall whether I changed this or not, but I have the output set to summarize
the number of matching instances and alert based on that value, so the
check
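The following is a Python sketch of that count-and-threshold approach. Note that it swaps check_log's old-copy-and-diff technique for a saved byte offset. The state-file path, pattern and thresholds are all hypothetical parameters. Exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL):

```python
import json
import os
import re
from typing import Tuple


def count_new_matches(log_path: str, state_path: str, pattern: str) -> int:
    """Count lines matching `pattern` written since the previous call.

    Remembers how far it read in a small JSON state file. If the log is
    now shorter than the saved offset (rotated or truncated), it starts
    again from the beginning.
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            offset = json.load(fh).get("offset", 0)
    if os.path.getsize(log_path) < offset:
        offset = 0

    regex = re.compile(pattern)
    count = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        for raw_line in fh:
            if regex.search(raw_line.decode("utf-8", "replace")):
                count += 1
        offset = fh.tell()

    with open(state_path, "w") as fh:
        json.dump({"offset": offset}, fh)
    return count


def evaluate(count: int, warn: int, crit: int) -> Tuple[int, str]:
    """Map a match count to a plugin exit code and status line."""
    if count >= crit:
        return 2, f"CRITICAL - {count} new matches"
    if count >= warn:
        return 1, f"WARNING - {count} new matches"
    return 0, f"OK - {count} new matches"
```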
Check Google and/or nagiosplug.sourceforge.net; you'll find plugins that
at least let you check the SSH banner (if that's all you want to do)
and run remote commands and send their output back (check_ssh and
check_by_ssh, respectively). That's assuming you meant SSH and not telnet; if you
need telnet, I've
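If the banner is all you need, the check itself is small. The following Python sketch is based on the RFC 4253 identification string (`SSH-protoversion-softwareversion`). The function names and the 10-second timeout are hypothetical choices, and check_ssh itself does more than this:

```python
import socket
from typing import Optional, Tuple


def parse_ssh_banner(line: str) -> Optional[Tuple[str, str]]:
    """Split an 'SSH-2.0-OpenSSH_x.y comments' line into (proto, software)."""
    line = line.rstrip("\r\n")
    if not line.startswith("SSH-"):
        return None
    parts = line.split(" ", 1)[0].split("-", 2)
    if len(parts) != 3 or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


def check_banner_text(text: str) -> Tuple[int, str]:
    """Return a plugin-style (exit_code, message) for the server's greeting.

    Servers may send other lines before the identification string, so
    every line is scanned.
    """
    for line in text.splitlines():
        parsed = parse_ssh_banner(line)
        if parsed:
            proto, software = parsed
            return 0, f"SSH OK - {software} (protocol {proto})"
    return 2, "CRITICAL - no SSH identification string received"


def fetch_greeting(host: str, port: int = 22, timeout: float = 10.0) -> str:
    """Connect and read whatever the server sends first (up to 1 KiB)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(1024).decode("ascii", "replace")
```

Combined, that's `check_banner_text(fetch_greeting("server.example.com"))`, where the host name is a placeholder.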
ACK
I think those all came through OK.
Check the docs for {net|ucd}-snmp and the help output for check_snmp*. For
instance, find out which plugin and version you're using, and what
mechanism it uses to read the MIB and access SNMP (bash/perl exec of the
binaries, Perl module, direct binary
OT:
Nope, other than that YaST is scary when it breaks things ;)
That should actually be quite easy; I've been using vmplayer
successfully for Windows-on-Linux compute images, etc. That would also be
one approach to deploying nodes for a distributed/remote Nagios setup.
/eli
jon.johnston wrote:
I've got several different extended host/service info links set up to
various other pages. That works great, and I'm using $HOSTNOTESURL$ in
alert emails. However, $HOSTNOTESURL$ seems to be stripping out the
ampersand during expansion:
extended host info defined in config as:
notes_url
I've attached an email I saved a while back about this. I see
the same issue when compiling 64-bit on any x86* distro.
I haven't had a chance to patch and test this yet (or even read it...);
let us know if it resolves the issue.
/eli
Tom Brown wrote:
Hi
Just installed
For
mapping == graphical
Whatever the equivalent of Visio is on Linux (GPL of course).
mapping == device discovery
Netdisco
netdi
opennms
cacti plugin (mac?)
intermapper
The first two items are OSS network management tools for maintaining
and controlling device configs. They have
Title: RE: [Nagios-users] Check_snmp_int Plugin
You should just be able to add a few lines to the existing script to poll that. Since that OID exists as a leaf of the same ifIndex table containing all the metrics (or at worst another table whose interface numbering always matches the
The easiest thing would likely be to make a quick logrotate definition
for the log file. The log rotation tool will depend on your
distribution, but they're all quite simple. Your existing system logs
are likely already being handled this way; just find their
configs and use them as a
How about a quickie bash/perl wrapper around netcat?
/eli
Angel L. Mateo wrote:
Hello,
I am trying to monitor one of my servers with the check_tcp plugin. With
it, I can send a string to the server (on the specified port) and
analyze the answer in order to get an OK, warning or
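The same wrapper idea can be written in Python, using a plain socket in place of netcat. It sends a string, reads the reply and maps it to OK/WARNING/CRITICAL. The expected substrings, timeout and read limit are hypothetical parameters. The half-close after sending assumes the server replies once it sees end-of-input, which not every protocol does:

```python
import socket
from typing import Sequence

OK, WARNING, CRITICAL = 0, 1, 2


def send_and_receive(
    host: str, port: int, payload: bytes,
    timeout: float = 10.0, max_bytes: int = 4096,
) -> str:
    """Send `payload`, signal end-of-input, and collect the reply."""
    chunks = []
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Close our write side so the server sees end-of-input.
        sock.shutdown(socket.SHUT_WR)
        while received < max_bytes:
            chunk = sock.recv(max_bytes - received)
            if not chunk:
                break
            chunks.append(chunk)
            received += len(chunk)
    return b"".join(chunks).decode("utf-8", "replace")


def classify(response: str, ok_markers: Sequence[str],
             warn_markers: Sequence[str]) -> int:
    """OK if any OK marker appears, WARNING if a warning marker does, else CRITICAL."""
    if any(marker in response for marker in ok_markers):
        return OK
    if any(marker in response for marker in warn_markers):
        return WARNING
    return CRITICAL
```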
Why not just extend the command that sends your alerts a bit? Modify
your notify-by-email command (or equivalent) to fire off your own external
script that reproduces the default behaviour, and add a function to it
that does your custom work when $HOSTSTATE$ is DOWN (host states are
UP/DOWN/UNREACHABLE; CRITICAL is a service state, found in $SERVICESTATE$).
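The following is a Python sketch of such a wrapper. It assumes the notification command passes $CONTACTEMAIL$, $HOSTNAME$, $HOSTSTATE$ and $HOSTOUTPUT$ as arguments. It also assumes a `mail` command that accepts `-s subject recipient` is installed. `custom_action` is a hypothetical placeholder for your own work:

```python
import subprocess
from typing import List


def needs_custom_action(host_state: str, service_state: str = "") -> bool:
    """Host states are UP/DOWN/UNREACHABLE; CRITICAL only exists for services."""
    return host_state == "DOWN" or service_state == "CRITICAL"


def build_mail_command(recipient: str, subject: str) -> List[str]:
    """Argument list for a mailx-style `mail -s SUBJECT RECIPIENT` call."""
    return ["mail", "-s", subject, recipient]


def custom_action(host: str) -> None:
    """Hypothetical placeholder for the site-specific extra work."""
    print(f"running custom action for {host}")


def notify(recipient: str, host: str, host_state: str, output: str,
           service_state: str = "") -> None:
    """Send the usual e-mail, then do the extra work when warranted."""
    subject = f"** {host} is {host_state} **"
    body = f"Host: {host}\nState: {host_state}\nInfo: {output}\n"
    subprocess.run(build_mail_command(recipient, subject),
                   input=body.encode(), check=True)
    if needs_custom_action(host_state, service_state):
        custom_action(host)
```

Keeping the default mail step inside the wrapper means existing behaviour doesn't change. The extra branch only runs for the states you care about.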
/eli
Don Lewis wrote:
I've been seeing this (without problem) throughout the 2.0b series on
x86_64.
As of building 2.0rc2, though, I'm getting errors with the same config
that were not present earlier:
Premature end of script headers: cmd.cgi
I haven't tracked down the cause; it's only occurring now when
Title: RE: [Nagios-users] Nagios check a jsp file
I've just been looking into this for a similar need, except my app also requires frames. You may be able to get by more easily; there are a number of Perl, Python, and Ruby plugins available that make emulating a JS client/browser in a script
Just a suggestion: you might want to try polling that info with Cacti
and then deriving the service states in Nagios from Cacti's RRDs. I
assume that info is exposed in the F5 enterprise MIB, and it's probably
well documented. Search around and you'll find the check_rrd_* scripts
for pulling
I'm running 2.0b6 and find that these two macros don't evaluate as I
believe they should.
According to the 2.0 docs, $HOSTALIAS$ seems just as valid as $HOSTNAME$
for use in an extended host information macro. In this
case, however, $HOSTNAME$ works while $HOSTALIAS$ goes
Title: RE: [Nagios-users] Advice on Plugin Development
I have absolutely no idea about the actual setup this user has, but I assume they're interfacing with some OOB (out-of-band) tool rather than the IP of the box itself, e.g. a power strip, management processor, STONITH interrupt, etc.
And as far
results
may get lost or mangled! I get this when building the 2.0 betas on any system I
have available, and I haven't seen it addressed or resolved in any searches
of the archives I've done.
Cheers,
/eli
Eli Stair wrote:
Corroboration here; I'm actually compiling a mail on the same issue
as well. 2.0b6
Patrick Rutkowski wrote:
You say that you set nagios to not use retention files at all, and it
worked. I did the same, but it still doesn't work :-(
On 12/15/05, Eli Stair [EMAIL PROTECTED] wrote:
You're one of a bunch of people, myself included, who I think are
running into the same
I've had success with 'echo -en \007' (a.k.a. beep). Have an
event handler run that whenever you want to get annoyed.
It's fun to use with a remote ssh-type handler when you're watching
on Netbotz as interns try to find a node in a cluster ;)
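The same trick can be written as a Python event handler sketch. It assumes your command definition passes $SERVICESTATE$ and $SERVICESTATETYPE$ as the first two arguments. It only rings on a HARD CRITICAL, so it doesn't beep for every soft retry. Nagios doesn't run handlers on your terminal, so the bell has to be written to a stream you'll actually hear, e.g. over ssh as described above:

```python
import sys
from typing import List, TextIO

BELL = "\a"  # the same character as `echo -en \007`


def should_beep(state: str, state_type: str) -> bool:
    """Only ring on a confirmed (HARD) CRITICAL, not on soft retries."""
    return state == "CRITICAL" and state_type == "HARD"


def beep(stream: TextIO) -> None:
    """Write the BEL character and flush so it isn't stuck in a buffer."""
    stream.write(BELL)
    stream.flush()


def handle(argv: List[str], stream: TextIO = sys.stdout) -> None:
    """argv: [SERVICESTATE, SERVICESTATETYPE, ...] as passed by the command."""
    if len(argv) >= 2 and should_beep(argv[0], argv[1]):
        beep(stream)
```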
/eli
Marcio Merlone wrote:
On
at 11:58 -0800, Eli Stair wrote:
In message [EMAIL PROTECTED],
Eli Stair writes:
The question comes down to this:
Should a failed service check for a dependent trigger a check of its
parent before continuing?
IIRC from the code it does not force a check of the parent service. I
can see arguments for and against forcing a poll
I'm switching over to auth_kerb against AD (just because I've done
that before, rather than authz_ldap).
I'm curious whether you both have the username populated properly in
the Author field of the CGIs when you're adding comments or ack'ing
events.
I still haven't gotten any suggestions from
.
-FredC
*/Eli Stair [EMAIL PROTECTED]/* wrote:
I'm running a fresh build of 2.0b5 on x86_64. After an initial start of
nagios, it can take up to 10 minutes for the first host or service
checks to begin. The nagios process generates no CPU load during this
time. I have over 1000
Good point. I don't know how to implement
host-check-only-on-servicecheck-fail on 2.0. I don't see anything in the
config suggesting it's supported; from what I read, execute_host_checks is
a global on/off switch for host checks.
Is this correct?
/eli
Ludwig Pummer wrote:
I should point out I'm