Hi Peter,
I did more testing on CentOS 7 and was able to reproduce the same issue.
Unfortunately, the problem reappeared once even with the "SendSIGKILL=no"
setting :-( It appears that a similar-looking bug was reported
before for an earlier version of systemd, but that was a while ago:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1448259
Just wondering if the same thing has resurfaced...
kind regards,
risto


2017-07-17 19:28 GMT+03:00 Peter Eckel <li...@eckel-edv.de>:

> Hi Risto,
>
> > that is an interesting problem. Let me ask the following question -- is
> > the restart done via system init script?
>
> actually it's done via systemd as we are talking about a CentOS 7 system.
> However, the TERM signal should be followed by a KILL only after 60 seconds
> (defined by TimeoutStopSec), which should be more than enough for the task
> to finish:
>
> > [Unit]
> > Description=SEC Simple Event Correlator
> > After=syslog.target
> >
> > [Service]
> > Type=forking
> > User=sec
> > Group=sec
> > PIDFile=/opt/sec/var/run/sec.pid
> > EnvironmentFile=-/etc/sysconfig/sec
> > ExecStart=/opt/sec/local/bin/sec -detach -pid=/opt/sec/var/run/sec.pid $SEC_OPTIONS
> > ExecReload=/bin/kill -ABRT $MAINPID
> > ExecStop=/bin/kill -TERM $MAINPID
> > KillMode=process
> > TimeoutStopSec=60
> >
> > [Install]
> > WantedBy=multi-user.target
>
> I doubt (though I haven't investigated in depth yet) that there should be
> a problem.
>
> > As can be seen, the second rule blocks the execution of sec for 5
> > seconds (since sec is single-threaded). When trying to shut down sec
> > directly from command line by sending it the TERM signal, I got the
> > following log messages:
>
> [...]
>
> The interesting part is that in my case it isn't working on CentOS 7
> either.
>
> In fact, even if the task took more than three seconds to finish
> (which it doesn't; it's on the order of 100 ms or less), I would at
> least expect the desc to appear in the log, as I included a logonly as
> the first statement. But no, not even that.
>
> > However, when testing the restart of sec on the CentOS 6 platform with
> > "/etc/init.d/sec restart", the second rule was not allowed to finish, but
> > the new process was started 3 seconds after the TERM signal was received.
> > Therefore, it seems that different platforms handle the restart of a daemon
> > differently, and on some platforms the KILL signal is used after a specific
> > timeout. Maybe you are experiencing a similar subtle caveat here?
>
> If that was the case, it would be a problem in systemd as it would ignore
> the TimeoutStopSec setting. But in fact it's even worse :-)
>
> I just attached strace to the sec process and restarted using systemctl ...
>
> > +++ killed by SIGKILL +++
>
> immediately. Oops. That's not what 'ExecStop=/bin/kill -TERM $MAINPID' was
> supposed to mean :-) At least it explains why my child processes don't run,
> in fact they don't even have the time to start at all.
>
> From the systemd.kill manpage:
>
> > Processes will first be terminated via SIGTERM (unless the signal to
> > send is changed via KillSignal=). Optionally, this is immediately followed
> > by a SIGHUP (if enabled with SendSIGHUP=). If then, after a delay
> > (configured via the TimeoutStopSec= option), processes still remain, the
> > termination request is repeated with the SIGKILL signal (unless this is
> > disabled via the SendSIGKILL= option). See kill(2) for more information.
>
> So it's probably not what systemd is supposed to do either.
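
For reference, the stop sequence described in that manpage excerpt (TERM
first, a wait of up to TimeoutStopSec, then KILL unless SendSIGKILL is
disabled) can be modelled in a few lines of Python. This is only a sketch of
the documented behaviour on a POSIX system, not of systemd's implementation;
the function names, the child script and its "graceful"/"ignore" modes are
made up for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon: either handles TERM (with a short
# delay for shutdown work) or ignores it completely.
CHILD = """
import signal, sys, time
if sys.argv[1] == "ignore":
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
else:
    def on_term(signum, frame):
        time.sleep(0.2)  # stand-in for shutdown work
        sys.exit(0)
    signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def spawn(mode):
    """Start the stand-in child and wait until its TERM handling is set up."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child has printed "ready"
    return proc


def stop_with_escalation(proc, timeout_stop_sec=60.0, send_sigkill=True):
    """Send SIGTERM, wait up to timeout_stop_sec, then optionally SIGKILL.

    Returns the name of the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout_stop_sec)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        if not send_sigkill:
            return "SIGTERM"  # process is left running, as with SendSIGKILL=no
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"
```

With this model, a child started with spawn("graceful") should end with
return code 0 after the TERM alone, while one started with spawn("ignore")
should only go away after the escalation, with return code -9. A KILL that
arrives before the timeout has elapsed, as in the strace output above, does
not fit this sequence.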
>
> Obviously systemd sends SEC a KILL instead of trying a TERM first, which
> then causes the rules not to work as they were supposed to. On the other
> hand, sending the process a TERM works:
>
> > --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=10668, si_uid=0} ---
> > rt_sigreturn()                          = -1 EINTR (Interrupted system call)
> > rt_sigprocmask(SIG_BLOCK, [TERM], [], 8) = 0
> > rt_sigaction(SIGTERM, NULL, {0x7f6f2f45afa0, [], SA_RESTORER, 0x7f6f2e43b370}, 8) = 0
> > rt_sigprocmask(SIG_BLOCK, [TERM], [TERM], 8) = 0
> > rt_sigaction(SIGTERM, {0x7f6f2f45afa0, [], SA_RESTORER, 0x7f6f2e43b370}, {0x7f6f2f45afa0, [], SA_RESTORER, 0x7f6f2e43b370}, 8) = 0
> > rt_sigprocmask(SIG_SETMASK, [TERM], NULL, 8) = 0
> > rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
> > [...]
> > nanosleep({3, 0}, 0x7ffdc4091550)       = 0
> > [...]
> > +++ exited with 0 +++
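
The two endings seen under strace ("killed by SIGKILL" versus "exited with
0") can also be told apart without strace by whoever is the parent of the
process. A rough Python sketch, assuming a POSIX system; describe_exit and
run_and_signal are hypothetical helper names, and the sleeping child merely
stands in for a process with default signal handling:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Render a Popen return code roughly the way strace's last line does."""
    if returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        return "killed by " + signal.Signals(-returncode).name
    return "exited with " + str(returncode)


def run_and_signal(sig):
    """Start a sleeping child, send it sig, and report how it ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.send_signal(sig)
    child.wait()
    return describe_exit(child.returncode)
```

For example, run_and_signal(signal.SIGKILL) reports "killed by SIGKILL",
whereas a child that handles TERM and exits cleanly would be reported as
"exited with 0".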
>
> The good news is that it's not SEC's fault. I'll look into the systemd
> behaviour now ... maybe I can find a solution. I'll keep you updated.
>
> Thanks and best regards,
>
>   Peter.
>
>
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
