Hi Peter,
that is an interesting problem. Let me ask the following question: is the
restart done via a system init script? If so, the behavior you are observing
might be caused by the init script. It first sends a TERM signal to the sec
process and then follows up with KILL, because the process has not died as
quickly as the init script expected. To test this, I wrote a small test
ruleset:

type=single
ptype=substr
pattern=SEC_SHUTDOWN
context=SEC_INTERNAL_EVENT
continue=takenext
desc=test for TERM timeout
action=write - received SEC internal event SHUTDOWN

type=single
ptype=substr
pattern=SEC_SHUTDOWN
context=SEC_INTERNAL_EVENT
desc=test for TERM timeout
action=lcall %o -> ( sub { sleep(5) } )

As can be seen, the second rule blocks the execution of sec for 5 seconds
(since sec is single-threaded). When shutting down sec directly from the
command line by sending it the TERM signal, I got the following log
messages:

Mon Jul 17 16:45:46 2017: SIGTERM received: shutting down SEC
Mon Jul 17 16:45:46 2017: Creating SEC internal context 'SEC_INTERNAL_EVENT'
Mon Jul 17 16:45:46 2017: Creating SEC internal event 'SEC_SHUTDOWN'
Mon Jul 17 16:45:46 2017: Writing event 'received SEC internal event SHUTDOWN' to file '-'
Mon Jul 17 16:45:46 2017: Calling code 'CODE(0x1ce60d0)' and setting variable '%o'
Mon Jul 17 16:45:51 2017: Variable '%o' set to '5'
Mon Jul 17 16:45:51 2017: Deleting SEC internal context 'SEC_INTERNAL_EVENT'

From the log it can be seen that the SEC_SHUTDOWN event was processed normally
and the second rule waited for 5 seconds without interruption. When
restarting sec-2.7.11 on CentOS 7 with "systemctl restart sec", the
expected behavior was observed again (the second rule waits for 5 seconds,
and there is an additional 3 second window before the new process is started):

Mon Jul 17 16:47:30 2017: SIGTERM received: shutting down SEC
Mon Jul 17 16:47:30 2017: Creating SEC internal context 'SEC_INTERNAL_EVENT'
Mon Jul 17 16:47:30 2017: Creating SEC internal event 'SEC_SHUTDOWN'
Mon Jul 17 16:47:30 2017: Writing event 'received SEC internal event SHUTDOWN' to file '-'
Mon Jul 17 16:47:30 2017: Calling code 'CODE(0x13ea4a0)' and setting variable '%o'
Mon Jul 17 16:47:35 2017: Variable '%o' set to '5'
Mon Jul 17 16:47:35 2017: Deleting SEC internal context 'SEC_INTERNAL_EVENT'
Mon Jul 17 16:47:38 2017: SEC (Simple Event Correlator) 2.7.11

However, when testing the restart of sec on the CentOS 6 platform with
"/etc/init.d/sec restart", the second rule was not allowed to finish, and
the new process was started 3 seconds after the TERM signal was received.
It therefore seems that different platforms handle the restart of a daemon
differently, and on some platforms the KILL signal is used after a specific
timeout. Maybe you are experiencing a similar subtle caveat here?
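
To illustrate the TERM-then-KILL pattern I suspect here, below is a minimal
Python sketch for a POSIX system. It is not the code of any actual init
script: the function name run_demo, the child program and the timings are
all made up for the illustration. The child stands in for sec, and its TERM
handler blocks for a while, like the second rule above.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for sec: on SIGTERM it does blocking "shutdown work"
# (like the sleep(5) rule above) and only then exits.
CHILD = r"""
import signal, sys, time
def on_term(signum, frame):
    time.sleep(float(sys.argv[1]))  # blocking shutdown work
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)          # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""


def run_demo(shutdown_work_seconds, grace_seconds):
    """Send TERM, wait up to grace_seconds, then fall back to KILL."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(shutdown_work_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has set its TERM handler
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        outcome = "finished shutdown work"
    except subprocess.TimeoutExpired:
        # The assumed init script behavior: the process did not die in time.
        proc.kill()
        proc.wait()
        outcome = "killed before shutdown work finished"
    proc.stdout.close()
    return outcome


if __name__ == "__main__":
    # Shutdown work longer than the grace period (as suspected on CentOS 6).
    print(run_demo(shutdown_work_seconds=0.5, grace_seconds=0.3))
    # Grace period long enough for the shutdown work to complete.
    print(run_demo(shutdown_work_seconds=0.5, grace_seconds=3.0))
```

With a grace period shorter than the shutdown work, the child is killed
before its handler returns, which would match a rule that never gets to
run. The timings are scaled down so that the sketch finishes quickly.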

kind regards,
risto


2017-07-17 15:48 GMT+03:00 Peter Eckel <li...@eckel-edv.de>:

> Hi Risto,
>
> I think I found a bug (or a mistake in the documentation) in SEC.
>
> The manpage says:
>
> > SEC_SHUTDOWN - generated when SEC receives the SIGTERM signal, or
> > when SEC reaches all EOFs of input files after being started with the
> > --notail option. With the --childterm option, SEC sleeps for 3 seconds
> > after generating SEC_SHUTDOWN event, and then sends SIGTERM to its child
> > processes (if a child process was triggered by SEC_SHUTDOWN, this delay
> > leaves the process enough time for setting a signal handler for SIGTERM).
>
> However, when I restart sec (i.e. stop and immediately start the process),
> I get:
>
> > Mon Jul 17 12:36:36 2017: SIGTERM received: shutting down SEC
> > Mon Jul 17 12:36:36 2017: Creating SEC internal context 'SEC_INTERNAL_EVENT'
> > Mon Jul 17 12:36:36 2017: Creating SEC internal event 'SEC_SHUTDOWN'
> > [...]
> > Mon Jul 17 12:36:36 2017: SEC (Simple Event Correlator) 2.7.11
> > Mon Jul 17 12:36:36 2017: Changing working directory to /
>
> No sign of any 3 second delay ...
>
> I found this because I use two rules triggering on SEC_SHUTDOWN. Neither
> of them runs more than 3 seconds (far less, in fact), but depending on the
> order I have them processed, only one of them gets executed. The one that
> runs slightly longer, if run first, keeps the second from executing
> altogether. No log entry, no error, nothing - it isn't started at all.
>
> When I replace both actions with a logonly, both run fine irrespective
> of the order, so it's quite clearly a matter of timing. However, with the 3
> second delay it should work in any case; 3 seconds should be ample time
> for both to complete.
>
> I must add that both are calling perl functions, no shellcmd whatsoever.
> One is called via lcall, the other via call.
>
> Best regards,
>
>   Peter.
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
>
