hi Eli,
it is indeed not very efficient to recreate the KEEPALIVE context on every
syslog message which comes in. Also, this approach has another drawback --
suppose sec needs to be shut down for short maintenance, and one of the
hosts stops sending messages while sec is down. When the maintenance ends and
sec is restarted, it will never see messages from the malfunctioning host, and
therefore no alert can be triggered for this host.

Let me suggest an alternative and more efficient way of attacking this
problem, which is outlined in the ruleset below. This ruleset assumes that a
keepalive check must be explicitly enabled for a host (in the example
below, checks have been enabled for two hosts, "host1" and "host2"). In
order to trigger a keepalive check for a host, the Calendar rule creates a
synthetic event KEEPALIVE_hostname once a minute (you can easily increase
the check interval if you prefer a larger window like 10 or 20 minutes).
The KEEPALIVE_hostname event is matched by the PairWithWindow rule which
starts a waiting operation for the given hostname. If the waiting operation
does not see any messages from the given host during 5 minutes, an alarm is
triggered. Note that the PairWithWindow rule uses the context
SUPPRESS_hostname in order to suppress repeated alerts for malfunctioning
hosts during 3 hours (10800 seconds).

type=Calendar
time=* * * * *
desc=trigger keepalive check
action=event KEEPALIVE_host1; event KEEPALIVE_host2

type=PairWithWindow
ptype=regexp
pattern=KEEPALIVE_(\S+)
context=!SUPPRESS_$1
desc=check if host $1 is alive
action=write - No syslog messages from $1; create SUPPRESS_$1 10800
ptype2=regexp
pattern2=^\S+ $1\s
desc2=syslog messages received from %1
action2=none
window=300
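
With this ruleset, if for example host1 goes silent, the PairWithWindow rule
writes the line "No syslog messages from host1" to standard output once the
5 minute window has expired, and the SUPPRESS_host1 context keeps further
alerts for host1 suppressed for the next 3 hours.
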
The above ruleset is more efficient than creating a context on each event,
since it only creates waiting operations once a minute, and these operations
terminate almost instantly (provided you have a constant message flow coming
in from the servers). Also, once a malfunctioning host has been detected, the
monitoring of this host will be disabled for 3 hours. Finally, you don't have
the downside of potentially missing some malfunctioning hosts if sec needs to
be taken down temporarily.

If you find it difficult to manually configure all hostnames in the
Calendar rule, you can replace the "event" actions with "spawn" in the
Calendar rule. The "spawn" action can run any script which automatically
obtains the list of hostnames and writes synthetic events to standard output
(standard output gets picked up by sec). For example, if the "action" field
of the Calendar rule is set to "action=spawn cat /tmp/events" and
/tmp/events contains the lines "KEEPALIVE_host1" and "KEEPALIVE_host2",
checks are conducted for host1 and host2 in the same way as in the previous
solution. If you have many hosts which are stored in a database, using
"spawn" with a script for extracting the relevant hostnames is probably a
lot more flexible than hardcoding hundreds of hostnames into the Calendar
rule manually. A rough sketch of this variant is given below.
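
For instance, an illustrative version of the Calendar rule (the script path
/usr/local/bin/gen-keepalive.sh below is just a made-up example) could look
like this:

type=Calendar
time=* * * * *
desc=trigger keepalive check
action=spawn /usr/local/bin/gen-keepalive.sh

and the script could be a trivial shell wrapper which prints one
KEEPALIVE_<hostname> line per monitored host, for example:

#!/bin/sh
# Emit one synthetic keepalive event per host. The hostname list below is
# hardcoded for illustration, but it could just as well be produced by a
# database query or an inventory tool.
for host in host1 host2; do
  echo "KEEPALIVE_${host}"
done
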
Finally, this task can also be approached from a different angle. If syslog
events from your hosts are stored into different sec input files on a
per-hostname basis, you could simply take advantage of the --input-timeout
and --timeout-script command line options. These command line options allow
you to run an external script if no new data have arrived into an input file
during a predefined number of seconds. If each host has its own log file
which is monitored by sec, this is probably the easiest solution for
getting an alert about a malfunctioning host.
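
As a rough sketch (the log file paths and the script name below are made up,
and the 300 second timeout matches the 5 minute window used in the earlier
ruleset), sec could be started along these lines:

sec --conf=/etc/sec/keepalive.sec \
    --input=/var/log/hosts/host1.log --input=/var/log/hosts/host2.log \
    --input-timeout=300 --timeout-script=/usr/local/bin/report-silent-host.sh

If I recall the option semantics correctly, the timeout script is executed
when an input file has produced no new data for 300 seconds, so the script
can raise an alert for the host whose log file went silent.
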
Hope this helps,
risto
2018-04-06 19:36 GMT+03:00 Kagan, Eli <eli.ka...@dxc.com>:
> Hi there,
>
>
>
> I am trying to make sure I keep receiving a constant stream of events
> coming in from syslog and alert me in case it stops. The trivial approach I
> think would be to create a context and keep recreating it for every event I
> get. Something like this in the beginning of a ruleset:
>
>
>
> type=Single
>
> ptype=RegExp
>
> pattern=^\S+ (?<host>\S+)
>
> continue=TakeNext
>
> desc=$0
>
> action=create KEEPALIVE_$+{host} 15 ( event 0 HOST STOPED REPORTING:
> $+{host} )
>
>
>
> Now, I am a little bit worried about the performance impact this might
> have. I have a couple of dozen hosts that report about 20 million events
> per day all together. Wouldn’t this negatively affect the overall
> performance, since I’ll be re-creating a context for each event.
>
>
>
> Is there a better approach to make sure syslog events keep flowing?
>
>
>
> Thanks,
>
> Eli
>