Re: [Simple-evcorr-users] Trying to report extended NFS problems along with an OK.

Risto Vaarandi Tue, 18 Feb 2014 15:26:35 -0800

...and also, if you don't want to generate repeated warnings every 60
seconds while the remote file system stays down, you could add the 'context'
field to the first rule as well:
context=!ALERT_SENT_NODE_$1_FS_$2
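To illustrate why this extra context field suppresses the repeats, here is a
rough Python model of the first rule for a single host/filesystem pair that
never recovers. It assumes the client keeps logging "not responding" while the
server is down, and that a new PairWithWindow operation starts whenever a
matching line arrives and no operation is running. The function
count_warnings() and its parameters are hypothetical and only approximate
SEC's behavior; this is not SEC code.

```python
WINDOW = 60  # seconds, as in window=60 of the first rule


def count_warnings(times, guard, end):
    """Count warnings for one host/filesystem pair that never comes back.

    times: arrival times (in seconds) of "not responding" lines
    guard: True models context=!ALERT_SENT_NODE_$1_FS_$2 on the first rule
    end:   time at which the model stops (a final expiry check is made)
    """
    deadline = None     # end of the running PairWithWindow window, if any
    alert_sent = False  # stands in for the ALERT_SENT_NODE_..._FS_... context
    warnings = 0

    def expire(now):
        # A window that ran out without "alive again" writes a warning
        # and creates the context.
        nonlocal deadline, alert_sent, warnings
        if deadline is not None and deadline < now:
            warnings += 1
            alert_sent = True
            deadline = None

    for t in sorted(times):
        expire(t)
        if guard and alert_sent:
            continue  # context expression is false: the rule does not match
        if deadline is None:
            deadline = t + WINDOW  # start a new operation
        # otherwise the running operation absorbs the line
    expire(end)
    return warnings


times = [0, 90, 180, 270]  # made-up arrival times during one long outage
print(count_warnings(times, guard=False, end=400))  # 4
print(count_warnings(times, guard=True, end=400))   # 1
```

With the context check in place, only the first expired window produces a
warning; later "not responding" lines are ignored by the first rule until the
second rule deletes the context on "alive again".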


There are also other ways to address this kind of task; one approach
based on synthetic events, for Cisco linkDown and linkUp events, is
outlined in http://simple-evcorr.sourceforge.net/man.html#lbBD (see
the last 3 rules of that larger example).

kind regards,
risto


2014-02-19 1:15 GMT+02:00 Risto Vaarandi <[email protected]>:

> hi Douglas,
> you could set up a context after an alarm is sent, and issue an
> AllClear message for "hostA kernel: nfs server hostb:/filesystem: is alive
> again" only if the context exists. Here is an example:
>
> type=PairWithWindow
> ptype=RegExp
> pattern=(\S+) kernel: nfs server (\S+): not responding
> desc=$1: remote fs $2 not responding
> action=write - %s; create ALERT_SENT_NODE_$1_FS_$2
> ptype2=SubStr
> pattern2=$1 kernel: nfs server $2: is alive again
> continue2=TakeNext
> desc2=$1: remote fs $2 alive again
> action2=none
> window=60
>
> type=Single
> ptype=RegExp
> pattern=(\S+) kernel: nfs server (\S+): is alive again
> context=ALERT_SENT_NODE_$1_FS_$2
> desc=$1: remote fs $2 alive again
> action=write - %s; delete ALERT_SENT_NODE_$1_FS_$2
>
> If an error condition is detected at some host for some remote file system
> and is not cleared within 60 seconds, the operation started by the first
> rule writes a warning string to standard output and creates the
> ALERT_SENT_NODE_$1_FS_$2 context. The second rule is a Single rule which
> writes an AllClear message for a host and a file system only if a warning
> has previously been issued for that particular host-filesystem combination,
> and then deletes the context.
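The interplay of the two rules can be sketched in Python as below. This is a
simplified, hypothetical model (the Correlator class and its feed() and
expire() methods are illustrative names, not part of SEC): a dict of pending
windows stands in for the PairWithWindow operations, a set stands in for the
ALERT_SENT_NODE_..._FS_... contexts, and expired windows are only checked when
a new line arrives or expire() is called, whereas SEC fires them on its own
timer.

```python
import re
from dataclasses import dataclass, field

WINDOW = 60  # seconds; mirrors window=60 in the PairWithWindow rule

# Same shape as the RegExp patterns of both rules, with the state as group 3.
LINE_RE = re.compile(r"(\S+) kernel: nfs server (\S+): (not responding|is alive again)")


@dataclass
class Correlator:
    """Hypothetical, simplified model of the two rules; not SEC's implementation."""
    pending: dict = field(default_factory=dict)  # (host, fs) -> window deadline
    contexts: set = field(default_factory=set)   # stands in for ALERT_SENT_NODE_<host>_FS_<fs>
    output: list = field(default_factory=list)   # stands in for "write -" (stdout)

    def expire(self, now):
        # Windows that ran out without "alive again" run the first rule's action:
        # write the warning and create the context.
        for key, deadline in sorted(self.pending.items()):
            if deadline < now:
                host, fs = key
                self.output.append(f"{host}: remote fs {fs} not responding")
                self.contexts.add(key)
                del self.pending[key]

    def feed(self, now, line):
        self.expire(now)
        m = LINE_RE.search(line)
        if m is None:
            return
        host, fs, state = m.groups()
        key = (host, fs)
        if state == "not responding":
            # Rule 1, pattern: start an operation unless one is already running
            # for this host/filesystem (a running one absorbs the line).
            self.pending.setdefault(key, now + WINDOW)
        else:
            # Rule 1, pattern2: close a running operation silently (action2=none);
            # continue2=TakeNext then hands the same line to rule 2.
            self.pending.pop(key, None)
            # Rule 2 (Single): fires only if the context exists, then deletes it.
            if key in self.contexts:
                self.output.append(f"{host}: remote fs {fs} alive again")
                self.contexts.discard(key)


c = Correlator()
c.feed(0, "hostA kernel: nfs server hostb:/fs: not responding")
c.feed(30, "hostA kernel: nfs server hostb:/fs: is alive again")    # recovered in time
c.feed(100, "hostA kernel: nfs server hostb:/fs: not responding")
c.feed(3700, "hostA kernel: nfs server hostb:/fs: is alive again")  # an hour later
print(c.output)
# ['hostA: remote fs hostb:/fs not responding', 'hostA: remote fs hostb:/fs alive again']
```

In this run the first outage recovers within 60 seconds and produces no
output, while the second one triggers a warning and, an hour later, the
matching AllClear message.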
> Also, if your actual messages don't start with the host name but have a
> preceding timestamp, it is wise to write the pattern2 field of the first rule as
> pattern2=\s$1 kernel: nfs server $2: is alive again
> in order to avoid an accidental match by a longer hostname which has the same
> ending but extra leading characters (e.g., AhostA or BBhostA).
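A quick Python check shows the effect of the leading \s, assuming (as in SEC's
SubStr pattern type) that \s stands for a single space. Python's in operator
plays the role of the substring match here, and the sample log lines are made
up for illustration.

```python
# Made-up syslog lines with a leading timestamp; AhostA is a different machine.
other = "Feb 18 15:26:35 AhostA kernel: nfs server hostb:/filesystem: is alive again"
same = "Feb 18 15:26:35 hostA kernel: nfs server hostb:/filesystem: is alive again"

# Values that the first pattern would have captured as $1 and $2.
host, fs = "hostA", "hostb:/filesystem"

loose = f"{host} kernel: nfs server {fs}: is alive again"    # pattern2 as originally written
strict = f" {host} kernel: nfs server {fs}: is alive again"  # leading space, i.e. \s$1

print(loose in other)   # True: AhostA's line would be taken as hostA's recovery
print(strict in other)  # False
print(strict in same)   # True
```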
>
> hope this helps,
> risto
>
>
>
> 2014-02-19 0:31 GMT+02:00 Douglas K. Rand <[email protected]>:
>
> With the usual BSD syslog messages related to NFS problems:
>>
>> hostA kernel: nfs server hostb:/filesystem: not responding
>> ...
>> hostA kernel: nfs server hostb:/filesystem: is alive again
>>
>> What I'm trying to do is generate an alert email if we see a "not
>> responding" message without a corresponding "is alive again" within 60
>> seconds. But if we sent out the alert email, I also want to send an
>> all-clear email when we do eventually get the "alive again" message,
>> perhaps even hours later.
>>
>> I've figured out how to do one or the other: I can generate the alert
>> email if a "not responding" message is not followed by an "alive again"
>> message within 60 seconds.
>>
>> And I can generate the all-clear message for each and every "alive
>> again" message.
>>
>> But putting them together is stumping me. I only want to send the
>> all-clear message if we have already sent out the alert email; if we
>> don't send out the alert, there is no reason to send out the all-clear.
>>
>> I was thinking that when the "alive again" message came in, I could do
>> something like:
>>
>> if ($age > 60) report context mail -s "NFS all-clear" rand;
>> delete context
>>
>> I'm not even sure that is the right approach; it doesn't seem to fit SEC,
>> even if I could figure out how to do it.
>>
>> Advice anybody?
>>
>>
>> ------------------------------------------------------------------------------
>> Managing the Performance of Cloud-Based Applications
>> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
>> Read the Whitepaper.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Simple-evcorr-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
>>
>
>
