2014-07-25 21:49 GMT+03:00 Yuheng Du <[email protected]>:

> > From your previous mails, I got an impression that you just want to match
> > two consecutive events. Do you actually want to have a rule for detecting
> > if a machine fails to send a keepalive message after 10 seconds from
> > previous message?
>
> Yes. I want a rule for detecting if a machine fails to send a keepalive
> message after 10s from previous message. The machine is identified by its
> "deploymentId".
>
> > Do you want to get the notification when the keepalive is missing, or
> > also for *every* successfully received keepalive?
>
> I want to get notified only when keepalive is missing.
>

If you have many nodes and want the shortest and simplest solution, you
can try the following:

type=single
ptype=regexp
pattern=\"deploymentId\"\s+=>\s+(\S+)deployment#(\S+)\",
desc=match keepalive
action=create KEEPALIVE_$2 10 ( write - keepalive for $2 not received )

Each time a message comes in from some node, a context is set up which
exists for 10 seconds. If the context expires (this happens when more than
10 seconds have elapsed since its creation), a warning message is written
to standard output. The context can only expire for a node if no messages
have been received from that node for more than 10 seconds, since each
message recreates the context, resetting its lifetime to 10 seconds from
the current moment. If the interval between messages can occasionally be a
few seconds longer (due to message transmission lags, for example), you can
set the context lifetime to a somewhat larger value (like 15 seconds).
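
For illustration, the same rule with the longer lifetime would look roughly
like this. Only the lifetime value differs from the rule above, and 15 is
just the example figure mentioned, not a tuned recommendation:

```
# Sketch: same keepalive rule, context lifetime raised from 10 to 15 seconds
type=single
ptype=regexp
pattern=\"deploymentId\"\s+=>\s+(\S+)deployment#(\S+)\",
desc=match keepalive
action=create KEEPALIVE_$2 15 ( write - keepalive for $2 not received )
```

With this variant, messages from a node arriving 12 seconds apart would not
produce a warning. If the last message arrived at second 12 and nothing
followed, the warning would be written at about second 27.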

The rule above has a few drawbacks. For example, if an important node has
stopped transmitting messages before you start SEC, you will never know
about it. To fix this problem, you could trigger keepalive checks
explicitly from a Calendar rule, for example:

type=calendar
time=* * * * *
desc=trigger keepalive check for critical nodes
action=event keepalive_check_srb_2; \
       event 10 keepalive_check_srb_2; \
       event 20 keepalive_check_srb_2; \
       event 30 keepalive_check_srb_2; \
       event 40 keepalive_check_srb_2; \
       event 50 keepalive_check_srb_2

type=pairwithwindow
ptype=regexp
pattern=keepalive_check_(\S+)
desc=keepalive check for $1
action=write - keepalive for $1 not received
ptype2=regexp
pattern2=\"deploymentId\"\s+=>\s+\S+deployment#$1\",
desc2=keepalive received for %1
action2=none
window=10
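
If more than one node is critical, one possible extension (a sketch only,
not tested against a live setup) is to use a Calendar rule like the one
below in place of the Calendar rule above, so that it emits a check event
per node. The node name srb_4 is borrowed from the sample messages in the
original question purely for illustration. The PairWithWindow rule should
not need changes, since it picks up the node name through $1 and keeps a
separate operation for each node through its desc field:

```
# Sketch: trigger checks for two nodes (srb_2 and srb_4) every 10 seconds
type=calendar
time=* * * * *
desc=trigger keepalive check for critical nodes
action=event keepalive_check_srb_2; event keepalive_check_srb_4; \
       event 10 keepalive_check_srb_2; event 10 keepalive_check_srb_4; \
       event 20 keepalive_check_srb_2; event 20 keepalive_check_srb_4; \
       event 30 keepalive_check_srb_2; event 30 keepalive_check_srb_4; \
       event 40 keepalive_check_srb_2; event 40 keepalive_check_srb_4; \
       event 50 keepalive_check_srb_2; event 50 keepalive_check_srb_4
```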

Another question which is left open is how to keep the keepalive state
across SEC restarts. If you use the previous Single rule which sets up
contexts, you can save these contexts at shutdown and restore them at
restart. Here
is the relevant FAQ entry: http://simple-evcorr.sourceforge.net/FAQ.html#15

Hope this helps,
risto


> Thanks.
>
> best,
>
> Yuheng
>
>
> On Fri, Jul 25, 2014 at 2:20 PM, Risto Vaarandi <[email protected]>
> wrote:
>
>> 2014-07-24 19:13 GMT+03:00 Yuheng Du <[email protected]>:
>>
>>> Hi guys,
>>>
>>> I want to correlate events so that I get notified if I hear (or do not
>>> hear) a message coming from the same machine within 10s.
>>>
>>
>> From your previous mails, I got an impression that you just want to match
>> two consecutive events. Do you actually want to have a rule for detecting
>> if a machine fails to send a keepalive message after 10 seconds from
>> previous message? Do you want to get the notification when the keepalive is
>> missing, or also for *every* successfully received keepalive?
>> BR,
>> risto
>>
>>
>>> I am using an EventGroup rule to do this:
>>>
>>> type=EventGroup
>>> ptype=RegExp
>>> thresh=2
>>> window=10
>>> pattern=\"deploymentId\"\s+=>\s+(\S+)deployment#(\S+)\",
>>> desc=CHECK_INTERVAL_$2
>>> action=assign %deploymentId $2;\
>>>        create deploymentId_$2;\
>>>        create DEPLOYMENTID_CONTEXT;\
>>>        write - $2 heart beats heard within 10s.
>>> slide=reset 0 %s;
>>> end=write - $2 not heard for 10s since last receive event.;\
>>>     create $2_HEARTBEAT_TIMEOUT;\
>>>     event $2 not heard for 10s.
>>>
>>> However, the pattern can only identify messages coming from ANY
>>> deploymentId, while I want it to identify messages coming from a
>>> SPECIFIC deploymentId, as in:
>>>
>>> "deploymentId" => deployment#srb_2",
>>> "deploymentId" => deployment#srb_4",
>>> "deploymentId" => deployment#srb_2",
>>>
>>> I only want to correlate messages coming from srb_2 alone or srb_4 alone.
>>>
>>> Does anyone have a suggestion for how I can do this with an EventGroup
>>> rule?
>>>
>>> Or should I just switch to the single/singlewiththreshold method as John
>>> suggested on the list
>>> http://sourceforge.net/p/simple-evcorr/mailman/message/32640664/ ?
>>>
>>> Thanks!
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Want fast and easy access to all the code in your enterprise? Index and
>>> search up to 200,000 lines of code with a free copy of Black Duck
>>> Code Sight - the same software that powers the world's largest code
>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>> http://p.sf.net/sfu/bds
>>> _______________________________________________
>>> Simple-evcorr-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
>>>
>>>
>>
>